Rights Section
The rights section declares the legal framework for using your dataset: licensing terms, commercial use permissions, and personal data status. Users rely on this information to determine whether they can legally use the dataset for their intended purposes.
Required Fields
| Field | Type | Description |
|---|---|---|
| license | string | License identifier (SPDX ID preferred, or human-readable string) |
| allows_commercial_use | boolean | Whether the dataset may be used for commercial model training |
| contains_personal_data | enum | High-level characterization of personal data presence |
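
As a rough illustration of the table above, the following Python sketch checks that the three required fields are present and have the expected JSON types. It assumes the rights section has already been parsed into a dict; `REQUIRED_FIELDS` and `missing_or_mistyped` are hypothetical names, not part of the specification.

```python
# Required fields and the Python type each JSON value is expected to parse to.
# Hypothetical helper data, derived from the Required Fields table.
REQUIRED_FIELDS = {
    "license": str,
    "allows_commercial_use": bool,
    "contains_personal_data": str,  # enum values are JSON strings
}


def missing_or_mistyped(rights: dict) -> list[str]:
    """Return one message per required field that is absent or has the wrong type."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in rights:
            problems.append(f"{name}: missing")
        elif not isinstance(rights[name], expected):
            problems.append(f"{name}: expected {expected.__name__}")
    return problems
```

This sketch checks only presence and basic type; it does not verify that contains_personal_data holds one of the enumerated values.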
Optional Fields
| Field | Type | Description |
|---|---|---|
| license_url | string (uri) | Canonical URL for the license text |
| attribution_required | boolean | Whether downstream users must provide attribution |
| child_data | boolean | Whether the dataset intentionally includes data about children |
| allowed_uses | array | Specific uses that are explicitly allowed |
| restricted_uses | array | Specific uses that are restricted or prohibited |
| legal_basis | enum | Legal basis for collecting/processing personal data (required when contains_personal_data is anything other than none, or when child_data is true) |
| legal_basis_notes | string | Explanation when legal_basis is other, or to add jurisdiction-specific detail |
Enum Values
contains_personal_data
| Value | Description |
|---|---|
| none | No personal data present |
| de_minimis | Minimal personal data (e.g., incidental mentions of public figures) |
| pseudonymous | Personal data present but pseudonymized or anonymized |
| direct | Direct personal identifiers present |
legal_basis
| Value | Description |
|---|---|
| explicit-consent | Data subjects explicitly consented |
| terms-of-service | Collected under accepted terms of service |
| public-domain | Sourced from public domain materials |
| licensed | Rights obtained via license agreement |
| legitimate-interest | Processed under legitimate interest (jurisdictional) |
| research-exemption | Covered by research/academic exemption |
| contractual-necessity | Required to perform a contract |
| publicly-available | Publicly accessible with no expectation of privacy |
| other | Another documented legal basis (requires legal_basis_notes) |
Conditional Rules
| When… | You must provide… |
|---|---|
| contains_personal_data is “de_minimis”, “pseudonymous”, or “direct” | legal_basis |
| child_data is true | legal_basis |
| legal_basis is “other” | legal_basis_notes |
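
The conditional rules above can be checked mechanically. The following is a minimal Python sketch, assuming the rights section has already been parsed into a dict. The function name `check_conditional_rules` and the wording of the messages are illustrative and not defined by the specification.

```python
def check_conditional_rules(rights: dict) -> list[str]:
    """Return a list of conditional-rule violations (empty if there are none).

    Hypothetical helper: the name and messages are illustrative only.
    """
    errors = []
    personal = rights.get("contains_personal_data")

    # Rules 1 and 2: any personal data level other than "none", or
    # child_data being true, requires a legal_basis.
    needs_basis = (
        personal in {"de_minimis", "pseudonymous", "direct"}
        or rights.get("child_data") is True
    )
    if needs_basis and not rights.get("legal_basis"):
        errors.append("legal_basis is required when personal or child data is present")

    # Rule 3: the catch-all "other" basis must be explained in the notes field.
    if rights.get("legal_basis") == "other" and not rights.get("legal_basis_notes"):
        errors.append("legal_basis_notes is required when legal_basis is 'other'")

    return errors
```

A rights object that declares pseudonymous data together with an explicit-consent legal basis, for instance, produces an empty list.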
Example
{
  "rights": {
    "license": "CC-BY-4.0",
    "license_url": "https://creativecommons.org/licenses/by/4.0/",
    "attribution_required": true,
    "allows_commercial_use": true,
    "contains_personal_data": "pseudonymous",
    "legal_basis": "explicit-consent",
    "legal_basis_notes": "Participants provided written consent for research and model training uses.",
    "child_data": false,
    "allowed_uses": ["training-foundation-models", "research", "evaluation"],
    "restricted_uses": ["biometric-identification", "surveillance"]
  }
}
Field Details
license
Use SPDX license identifiers when possible for interoperability. Common choices include:
- CC-BY-4.0 - Creative Commons Attribution 4.0
- CC-BY-SA-4.0 - Creative Commons Attribution-ShareAlike 4.0
- CC-BY-NC-4.0 - Creative Commons Attribution-NonCommercial 4.0
- Apache-2.0 - Apache License 2.0
- MIT - MIT License
- CC0-1.0 - Public Domain Dedication
If using a custom license, provide a descriptive name and include the license_url.
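One way to enforce this guidance is sketched below in Python. The set `KNOWN_SPDX_IDS` holds only the identifiers named in this section and is an assumption made for illustration; a real check would consult the full SPDX license list. The function name `check_license` is likewise hypothetical.

```python
# Illustrative subset only: the SPDX identifiers named in this section.
KNOWN_SPDX_IDS = {
    "CC-BY-4.0", "CC-BY-SA-4.0", "CC-BY-NC-4.0",
    "Apache-2.0", "MIT", "CC0-1.0",
}


def check_license(rights: dict) -> list[str]:
    """Flag a missing license, or a custom license that lacks license_url."""
    warnings = []
    license_id = rights.get("license", "")
    if not license_id:
        warnings.append("license is required")
    elif license_id not in KNOWN_SPDX_IDS and not rights.get("license_url"):
        # Anything outside the known set is treated as a custom license here.
        warnings.append("custom license should include license_url")
    return warnings
```

Treating every unrecognized string as a custom license is a simplification: a valid SPDX identifier missing from the subset would be flagged too.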
contains_personal_data
Report personal data status accurately, choosing the value that reflects the most identifying data present. This affects:
- Whether users need to implement additional safeguards
- Whether additional legal basis documentation is required
- Potential legal obligations for downstream users
legal_basis
Choose the lawful basis that covers your collection and use of personal data. Match the value to the strongest justification you can document. If none of the enumerated values apply, use other and explain in legal_basis_notes.
allowed_uses and restricted_uses
Be specific about what users can and cannot do with the data. This helps users self-assess compatibility before downloading.
{
  "allowed_uses": ["training-foundation-models", "research", "evaluation", "non-profit-applications"],
  "restricted_uses": ["biometric-identification", "surveillance", "generating-misinformation", "military-applications"]
}
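To illustrate how a consumer might self-assess compatibility, here is a minimal Python sketch. It assumes that use names are compared as exact strings, that a restriction takes precedence if a use appears in both lists, and that a use appearing in neither list is unspecified and needs manual review. None of these assumptions is stated by the specification, and `assess_use` is a hypothetical helper.

```python
def assess_use(rights: dict, intended_use: str) -> str:
    """Classify an intended use as 'restricted', 'allowed', or 'unspecified'."""
    # Restrictions are checked first so they take precedence over allowed_uses.
    if intended_use in rights.get("restricted_uses", []):
        return "restricted"
    if intended_use in rights.get("allowed_uses", []):
        return "allowed"
    # Not mentioned in either list: the rights section gives no answer.
    return "unspecified"


rights = {
    "allowed_uses": ["research", "evaluation"],
    "restricted_uses": ["surveillance"],
}
print(assess_use(rights, "research"))       # allowed
print(assess_use(rights, "surveillance"))   # restricted
print(assess_use(rights, "advertising"))    # unspecified
```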
See Also
- Access Section - How to obtain the dataset
- Provenance Section - Data origin and collection details
- Personal Data & Legal Basis Guide - Detailed guidance on personal data handling