Use Section
The use section documents the intended uses for the dataset and explicitly calls out uses that are out of scope. This helps users determine whether the dataset is appropriate for their application.
Fields
All fields in the use section are optional.
| Field | Type | Description |
|---|---|---|
| intended_uses | array | Intended uses for the dataset |
| out_of_scope_uses | array | Uses that are explicitly out of scope |
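As an illustration of the table above, the following is a minimal sketch of a check for a use section. It assumes that each array holds non-empty strings (the table only says "array", so the item type is inferred from the examples in this document), and the function name `validate_use_section` and its messages are hypothetical, not part of any published validator.

```python
from typing import Any

# Field names come from the table above. Everything else here (function name,
# error wording, the "items are non-empty strings" rule) is an assumption.
USE_FIELDS = ("intended_uses", "out_of_scope_uses")


def validate_use_section(use: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a `use` section (empty if none)."""
    problems = []
    for name in USE_FIELDS:
        if name not in use:
            continue  # every field in the use section is optional
        value = use[name]
        if not isinstance(value, list):
            problems.append(f"{name}: expected an array, got {type(value).__name__}")
            continue
        for i, item in enumerate(value):
            if not isinstance(item, str) or not item.strip():
                problems.append(f"{name}[{i}]: expected a non-empty string")
    return problems


if __name__ == "__main__":
    print(validate_use_section({"intended_uses": ["Pretraining language models"]}))
    print(validate_use_section({"out_of_scope_uses": "Mass surveillance"}))
```

An empty section passes, since no field is required; a bare string where an array is expected is reported rather than silently accepted.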
Example
{
  "use": {
    "intended_uses": [
      "Pretraining multilingual language models",
      "Fine-tuning for Hausa language tasks",
      "Research on low-resource language NLP",
      "Development of Hausa language tools and applications",
      "Academic study of Hausa linguistics and media"
    ],
    "out_of_scope_uses": [
      "Training models for surveillance or tracking",
      "Generating misinformation or fake news",
      "Profiling individuals based on language patterns",
      "Commercial applications without proper attribution",
      "Training models to impersonate specific journalists or public figures"
    ]
  }
}
Field Details
intended_uses
List the primary purposes this dataset was created for:
{
  "intended_uses": [
    "Pretraining language models",
    "Supervised fine-tuning for text classification",
    "Evaluation of model performance on African languages",
    "Research on multilingual transfer learning"
  ]
}
Categories of intended uses:
- Training: Pretraining, fine-tuning, distillation
- Evaluation: Benchmarking, testing, validation
- Research: Academic study, linguistic analysis
- Applications: Translation, summarization, Q&A
- Tools: Spell checkers, grammar tools, keyboards
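The categories above are informal labels, not a schema field. As a rough sketch, a keyword lookup can group free-text entries by category when reviewing a long list. The keyword table below is an assumption built from the category names and the terms in the list, and `categorize` and `group_by_category` are hypothetical helper names.

```python
from collections import defaultdict

# Keywords are taken from the category list above, plus each category's own
# name. Simple substring matching is an assumption and will misfile some entries.
CATEGORY_KEYWORDS = {
    "Training": ("training", "pretraining", "fine-tuning", "distillation"),
    "Evaluation": ("evaluation", "benchmarking", "testing", "validation"),
    "Research": ("research", "academic study", "linguistic analysis"),
    "Applications": ("translation", "summarization", "q&a"),
    "Tools": ("spell checker", "grammar tool", "keyboard"),
}


def categorize(use: str) -> str:
    """Return the first category whose keyword appears in the text."""
    text = use.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "Uncategorized"


def group_by_category(uses: list[str]) -> dict[str, list[str]]:
    """Group intended-use strings under their matched category."""
    groups = defaultdict(list)
    for use in uses:
        groups[categorize(use)].append(use)
    return dict(groups)
```

Categories are checked in the order listed, so an entry that mentions both training and evaluation is filed under Training. Entries that match nothing land in "Uncategorized" for manual review.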
out_of_scope_uses
Explicitly document uses you don’t endorse:
{
  "out_of_scope_uses": [
    "High-stakes decision making without human oversight",
    "Generation of synthetic media impersonating real people",
    "Training systems for autonomous weapons",
    "Mass surveillance applications",
    "Creating misleading or deceptive content"
  ]
}
Use vs. Rights
The use section documents intent, while the rights section documents legal restrictions:
| Field | Purpose | Enforcement |
|---|---|---|
| use.intended_uses | What the dataset was designed for | Guidance only |
| use.out_of_scope_uses | What you don’t recommend | Guidance only |
| rights.allowed_uses | Explicitly permitted uses | May be legally binding |
| rights.restricted_uses | Prohibited uses | Legally binding (per license) |
Example showing both:
{
  "rights": {
    "license": "CC-BY-NC-4.0",
    "allows_commercial_use": false,
    "restricted_uses": ["commercial-model-training"]
  },
  "use": {
    "intended_uses": ["Academic research on low-resource NLP", "Educational language learning tools"],
    "out_of_scope_uses": ["Production systems without evaluation", "High-stakes medical or legal applications"]
  }
}
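To make the distinction between guidance and legal restriction concrete, here is a minimal sketch that flattens a metadata record into rows labeled with the enforcement level from the comparison table. The `ENFORCEMENT` mapping simply restates that table; the function name `summarize_constraints` is hypothetical, and the sketch assumes each of the four fields, when present, is an array of strings.

```python
# Enforcement levels copied from the Use vs. Rights table. The helper name and
# the tuple layout of the result are illustrative assumptions.
ENFORCEMENT = {
    ("use", "intended_uses"): "Guidance only",
    ("use", "out_of_scope_uses"): "Guidance only",
    ("rights", "allowed_uses"): "May be legally binding",
    ("rights", "restricted_uses"): "Legally binding (per license)",
}


def summarize_constraints(metadata: dict) -> list[tuple[str, str, str]]:
    """Return (field path, entry, enforcement level) for every listed use."""
    rows = []
    for (section, field), enforcement in ENFORCEMENT.items():
        # Missing sections or fields are treated as empty lists.
        for entry in metadata.get(section, {}).get(field, []):
            rows.append((f"{section}.{field}", entry, enforcement))
    return rows


if __name__ == "__main__":
    example = {
        "rights": {"restricted_uses": ["commercial-model-training"]},
        "use": {"out_of_scope_uses": ["Production systems without evaluation"]},
    }
    for path, entry, enforcement in summarize_constraints(example):
        print(f"{path}: {entry} [{enforcement}]")
```

A report like this can help readers see at a glance which entries are recommendations and which follow from the license.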
Common Intended Uses
Pretraining
{
  "intended_uses": [
    "Pretraining multilingual language models",
    "Continued pretraining for domain adaptation",
    "Training tokenizers and vocabularies"
  ]
}
Fine-tuning
{
  "intended_uses": [
    "Supervised fine-tuning for classification",
    "Instruction tuning for chat models",
    "Task-specific adaptation"
  ]
}
Evaluation
{
  "intended_uses": ["Benchmarking model performance", "Cross-lingual evaluation", "Bias and fairness testing"]
}
Research
{
  "intended_uses": ["Linguistic analysis", "Corpus linguistics research", "Computational social science"]
}
Common Out-of-Scope Uses
Safety Concerns
{
  "out_of_scope_uses": [
    "Generating harmful or illegal content",
    "Creating deepfakes or synthetic media",
    "Automated harassment or abuse"
  ]
}
Quality Limitations
{
  "out_of_scope_uses": [
    "Medical diagnosis or treatment recommendations",
    "Legal advice generation",
    "Critical infrastructure control"
  ]
}
Data Limitations
{
  "out_of_scope_uses": [
    "Tasks requiring 2024+ knowledge (data ends 2023)",
    "Formal/legal document generation (informal data)",
    "Cross-dialect applications (single dialect only)"
  ]
}
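In practice a dataset usually draws on more than one of the groups above (safety, quality, data). The following sketch shows one way to combine several lists into a single out_of_scope_uses array without duplicates. The helper name `merge_out_of_scope` is hypothetical, and treating entries that differ only in case or surrounding whitespace as duplicates is an assumption.

```python
import json


def merge_out_of_scope(*groups: list[str]) -> list[str]:
    """Concatenate lists in order, dropping blank and duplicate entries."""
    seen = set()
    merged = []
    for group in groups:
        for use in group:
            key = use.strip().casefold()  # compare case-insensitively
            if key and key not in seen:
                seen.add(key)
                merged.append(use.strip())
    return merged


if __name__ == "__main__":
    safety = ["Automated harassment or abuse"]
    quality = ["Legal advice generation"]
    data = ["Tasks requiring 2024+ knowledge (data ends 2023)"]
    section = {"out_of_scope_uses": merge_out_of_scope(safety, quality, data)}
    print(json.dumps(section, indent=2))
```

The first spelling of each entry is kept, so the order of the arguments determines both ordering and wording in the merged list.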
See Also
- Rights Section - Legal restrictions on use
- Safety Section - Content risk information
- Quality Section - Known limitations