Rights Section

The rights section declares the legal framework for using your dataset, including licensing terms, commercial use permissions, and personal data status. This information helps users determine if they can legally use the dataset for their intended purposes.

Required Fields

FieldTypeDescription
licensestringLicense identifier (SPDX ID preferred, or human-readable string)
allows_commercial_usebooleanWhether the dataset may be used for commercial model training
contains_personal_dataenumHigh-level characterization of personal data presence

Optional Fields

FieldTypeDescription
license_urlstring (uri)Canonical URL for the license text
attribution_requiredbooleanWhether downstream users must provide attribution
child_databooleanWhether the dataset intentionally includes data about children
allowed_usesarraySpecific uses that are explicitly allowed
restricted_usesarraySpecific uses that are restricted or prohibited
legal_basisenumLegal basis for collecting/processing personal data (required when personal or child data)
legal_basis_notesstringExplanation when legal_basis is other, or to add jurisdiction-specific detail

Enum Values

contains_personal_data

ValueDescription
noneNo personal data present
de_minimisMinimal personal data (e.g., incidental mentions of public figures)
pseudonymousPersonal data present but pseudonymized or anonymized
directDirect personal identifiers present
ValueDescription
explicit-consentData subjects explicitly consented
terms-of-serviceCollected under accepted terms of service
public-domainSourced from public domain materials
licensedRights obtained via license agreement
legitimate-interestProcessed under legitimate interest (jurisdictional)
research-exemptionCovered by research/academic exemption
contractual-necessityRequired to perform a contract
publicly-availablePublicly accessible with no expectation of privacy
otherAnother documented legal basis (requires legal_basis_notes)

Conditional Rules

When…You must provide…
contains_personal_data is “de_minimis”, “pseudonymous”, or “direct”legal_basis
child_data is truelegal_basis
legal_basis is “other”legal_basis_notes

Example

{
  "rights": {
    "license": "CC-BY-4.0",
    "license_url": "https://creativecommons.org/licenses/by/4.0/",
    "attribution_required": true,
    "allows_commercial_use": true,
    "contains_personal_data": "pseudonymous",
    "legal_basis": "explicit-consent",
    "legal_basis_notes": "Participants provided written consent for research and model training uses.",
    "child_data": false,
    "allowed_uses": ["training-foundation-models", "research", "evaluation"],
    "restricted_uses": ["biometric-identification", "surveillance"]
  }
}

Field Details

license

Use SPDX license identifiers when possible for interoperability. Common choices include:

  • CC-BY-4.0 - Creative Commons Attribution 4.0
  • CC-BY-SA-4.0 - Creative Commons Attribution-ShareAlike 4.0
  • CC-BY-NC-4.0 - Creative Commons Attribution-NonCommercial 4.0
  • Apache-2.0 - Apache License 2.0
  • MIT - MIT License
  • CC0-1.0 - Public Domain Dedication

If using a custom license, provide a descriptive name and include the license_url.

contains_personal_data

Be honest about personal data status. This affects:

  • Whether users need to implement additional safeguards
  • Whether additional legal basis documentation is required
  • Potential legal obligations for downstream users

Choose the lawful basis that covers your collection and use of personal data. Match the value to the strongest justification you can document. If none of the enumerated values apply, use other and explain in legal_basis_notes.

allowed_uses and restricted_uses

Be specific about what users can and cannot do with the data. This helps users self-assess compatibility before downloading.

{
  "allowed_uses": ["training-foundation-models", "research", "evaluation", "non-profit-applications"],
  "restricted_uses": ["biometric-identification", "surveillance", "generating-misinformation", "military-applications"]
}

See Also