Rights Section
The rights section declares the legal framework for using your dataset, including licensing terms, commercial use permissions, and personal data status. This information helps users determine if they can legally use the dataset for their intended purposes.
Required Fields
| Field | Type | Description |
|---|---|---|
| license | string | License identifier (SPDX ID preferred, or human-readable string) |
| allows_commercial_use | boolean | Whether the dataset may be used for commercial model training |
| contains_personal_data | enum | High-level characterization of personal data presence |
Optional Fields
| Field | Type | Description |
|---|---|---|
| license_url | string (uri) | Canonical URL for the license text |
| attribution_required | boolean | Whether downstream users must provide attribution |
| child_data | boolean | Whether the dataset intentionally includes data about children |
| allowed_uses | array | Specific uses that are explicitly allowed |
| restricted_uses | array | Specific uses that are restricted or prohibited |
| consent_mechanism | string | Summary of consent or legal basis for data collection |
Enum Values
contains_personal_data
| Value | Description |
|---|---|
| none | No personal data present |
| de_minimis | Minimal personal data (e.g., incidental mentions of public figures) |
| pseudonymous | Personal data present but pseudonymized or anonymized |
| direct | Direct personal identifiers present |
Conditional Rules
| When… | You must provide… |
|---|---|
| contains_personal_data is "de_minimis", "pseudonymous", or "direct" | consent_mechanism |
| child_data is true | consent_mechanism |
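The required fields, enum values, and conditional rules above can be checked mechanically. The following is a minimal sketch, not an official validator: the function name `validate_rights` and the error messages are hypothetical, and it assumes the rights object has already been parsed from JSON into a Python dict.

```python
from typing import Any

REQUIRED_FIELDS = ("license", "allows_commercial_use", "contains_personal_data")
PERSONAL_DATA_VALUES = {"none", "de_minimis", "pseudonymous", "direct"}
CONSENT_ERROR = "consent_mechanism is required when personal data or child data is present"


def validate_rights(rights: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper for illustration only.
    """
    errors = []

    # Every required field must be present.
    for field in REQUIRED_FIELDS:
        if field not in rights:
            errors.append(f"missing required field: {field}")

    # contains_personal_data must be one of the documented enum values.
    status = rights.get("contains_personal_data")
    valid_status = isinstance(status, str) and status in PERSONAL_DATA_VALUES
    if status is not None and not valid_status:
        errors.append(f"invalid contains_personal_data: {status!r}")

    # Conditional rules: consent_mechanism is needed whenever personal data
    # is present in any form, or when child_data is true.
    needs_consent = (valid_status and status != "none") or rights.get("child_data") is True
    if needs_consent and not rights.get("consent_mechanism"):
        errors.append(CONSENT_ERROR)

    return errors
```

Returning a list instead of raising on the first problem lets a dataset author see every issue in one pass.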
Example
{
  "rights": {
    "license": "CC-BY-4.0",
    "license_url": "https://creativecommons.org/licenses/by/4.0/",
    "attribution_required": true,
    "allows_commercial_use": true,
    "contains_personal_data": "pseudonymous",
    "consent_mechanism": "explicit-opt-in",
    "child_data": false,
    "allowed_uses": ["training-foundation-models", "research", "evaluation"],
    "restricted_uses": ["biometric-identification", "surveillance"]
  }
}
Field Details
license
Use SPDX license identifiers when possible for interoperability. Common choices include:
- CC-BY-4.0 - Creative Commons Attribution 4.0
- CC-BY-SA-4.0 - Creative Commons Attribution-ShareAlike 4.0
- CC-BY-NC-4.0 - Creative Commons Attribution-NonCommercial 4.0
- Apache-2.0 - Apache License 2.0
- MIT - MIT License
- CC0-1.0 - Public Domain Dedication
If using a custom license, provide a descriptive name and include the license_url.
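As a rough illustration of the custom-license guidance, the sketch below flags a rights object whose license is not one of the common identifiers listed above and that also lacks a license_url. The name `check_license` is hypothetical. The set covers only the identifiers named in this section, not the full SPDX list, so a license outside it may still be a valid SPDX ID; the check treats it as "possibly custom" and asks for a URL.

```python
# Only the identifiers listed in this section; NOT the complete SPDX list.
COMMON_SPDX_IDS = {
    "CC-BY-4.0",
    "CC-BY-SA-4.0",
    "CC-BY-NC-4.0",
    "Apache-2.0",
    "MIT",
    "CC0-1.0",
}


def check_license(rights: dict) -> list[str]:
    """Return warnings about the license fields (hypothetical helper)."""
    license_id = rights.get("license")
    if not isinstance(license_id, str) or not license_id.strip():
        return ["license must be a non-empty string"]

    warnings = []
    # A license outside the common set may be custom, so a URL to the
    # full license text is expected alongside the descriptive name.
    if license_id not in COMMON_SPDX_IDS and not rights.get("license_url"):
        warnings.append(
            f"license {license_id!r} is not a common SPDX ID; include license_url"
        )
    return warnings
```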
contains_personal_data
Be honest about personal data status. This affects:
- Whether users need to implement additional safeguards
- Whether additional consent documentation is required
- Potential legal obligations for downstream users
consent_mechanism
Examples of valid consent mechanisms:
- explicit-opt-in - Users explicitly agreed to data collection
- terms-of-service - Data collected under accepted ToS
- public-domain - Data is from public domain sources
- licensed - Data licensed from content creators
- community-consent - Community representatives approved collection
allowed_uses and restricted_uses
Be specific about what users can and cannot do with the data. This helps users self-assess compatibility before downloading.
{
  "allowed_uses": ["training-foundation-models", "research", "evaluation", "non-profit-applications"],
  "restricted_uses": ["biometric-identification", "surveillance", "generating-misinformation", "military-applications"]
}
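A user-side self-assessment might look like the sketch below. The helper names `check_use` and `conflicting_uses` are hypothetical, and giving restricted_uses precedence over allowed_uses is an assumption made here for safety; this section does not define how a use listed in both arrays should be treated.

```python
def check_use(rights: dict, intended_use: str) -> str:
    """Classify an intended use as 'restricted', 'allowed', or 'unspecified'.

    Assumption: a restriction wins if a use appears in both lists.
    """
    if intended_use in rights.get("restricted_uses", []):
        return "restricted"
    if intended_use in rights.get("allowed_uses", []):
        return "allowed"
    # Not mentioned either way; fall back to reading the license itself.
    return "unspecified"


def conflicting_uses(rights: dict) -> set[str]:
    """Return uses listed as both allowed and restricted (likely an authoring error)."""
    return set(rights.get("allowed_uses", [])) & set(rights.get("restricted_uses", []))
```

An "unspecified" result does not mean the use is permitted; it only means the rights section is silent on it, so the license text remains the deciding document.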
See Also
- Access Section - How to obtain the dataset
- Provenance Section - Data origin and collection details
- PII & Consent Guide - Detailed guidance on personal data handling