Rights Section

The rights section declares the legal framework for using your dataset, including licensing terms, commercial use permissions, and personal data status. This information helps users determine if they can legally use the dataset for their intended purposes.

Required Fields

FieldTypeDescription
licensestringLicense identifier (SPDX ID preferred, or human-readable string)
allows_commercial_usebooleanWhether the dataset may be used for commercial model training
contains_personal_dataenumHigh-level characterization of personal data presence

Optional Fields

FieldTypeDescription
license_urlstring (uri)Canonical URL for the license text
attribution_requiredbooleanWhether downstream users must provide attribution
child_databooleanWhether the dataset intentionally includes data about children
allowed_usesarraySpecific uses that are explicitly allowed
restricted_usesarraySpecific uses that are restricted or prohibited
consent_mechanismstringSummary of consent or legal basis for data collection

Enum Values

contains_personal_data

ValueDescription
noneNo personal data present
de_minimisMinimal personal data (e.g., incidental mentions of public figures)
pseudonymousPersonal data present but pseudonymized or anonymized
directDirect personal identifiers present

Conditional Rules

When…You must provide…
contains_personal_data is “de_minimis”, “pseudonymous”, or “direct”consent_mechanism
child_data is trueconsent_mechanism

Example

{
  "rights": {
    "license": "CC-BY-4.0",
    "license_url": "https://creativecommons.org/licenses/by/4.0/",
    "attribution_required": true,
    "allows_commercial_use": true,
    "contains_personal_data": "pseudonymous",
    "consent_mechanism": "explicit-opt-in",
    "child_data": false,
    "allowed_uses": ["training-foundation-models", "research", "evaluation"],
    "restricted_uses": ["biometric-identification", "surveillance"]
  }
}

Field Details

license

Use SPDX license identifiers when possible for interoperability. Common choices include:

  • CC-BY-4.0 - Creative Commons Attribution 4.0
  • CC-BY-SA-4.0 - Creative Commons Attribution-ShareAlike 4.0
  • CC-BY-NC-4.0 - Creative Commons Attribution-NonCommercial 4.0
  • Apache-2.0 - Apache License 2.0
  • MIT - MIT License
  • CC0-1.0 - Public Domain Dedication

If using a custom license, provide a descriptive name and include the license_url.

contains_personal_data

Be honest about personal data status. This affects:

  • Whether users need to implement additional safeguards
  • Whether additional consent documentation is required
  • Potential legal obligations for downstream users

Examples of valid consent mechanisms:

  • explicit-opt-in - Users explicitly agreed to data collection
  • terms-of-service - Data collected under accepted ToS
  • public-domain - Data is from public domain sources
  • licensed - Data licensed from content creators
  • community-consent - Community representatives approved collection

allowed_uses and restricted_uses

Be specific about what users can and cannot do with the data. This helps users self-assess compatibility before downloading.

{
  "allowed_uses": ["training-foundation-models", "research", "evaluation", "non-profit-applications"],
  "restricted_uses": ["biometric-identification", "surveillance", "generating-misinformation", "military-applications"]
}

See Also