Use Section

The use section documents the intended uses for the dataset and explicitly calls out uses that are out of scope. This helps users determine whether the dataset is appropriate for their application.

Fields

All fields in the use section are optional.

Field              Type   Description
intended_uses      array  Intended uses for the dataset
out_of_scope_uses  array  Uses that are explicitly out of scope

Example

{
  "use": {
    "intended_uses": [
      "Pretraining multilingual language models",
      "Fine-tuning for Hausa language tasks",
      "Research on low-resource language NLP",
      "Development of Hausa language tools and applications",
      "Academic study of Hausa linguistics and media"
    ],
    "out_of_scope_uses": [
      "Training models for surveillance or tracking",
      "Generating misinformation or fake news",
      "Profiling individuals based on language patterns",
      "Commercial applications without proper attribution",
      "Training models to impersonate specific journalists or public figures"
    ]
  }
}
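
As a rough illustration of how a consumer might check this section before relying on it, the sketch below validates the two fields. It assumes each field is an array of non-empty strings (the field table only gives the type as array) and that a missing use section is acceptable. The helper name validate_use_section is hypothetical and is not part of any published tooling.

```python
import json

USE_FIELDS = ("intended_uses", "out_of_scope_uses")


def validate_use_section(card: dict) -> list[str]:
    """Return a list of problems found in card["use"]; an empty list means none."""
    problems = []
    # Assumption: a card without a "use" section is treated as valid.
    use = card.get("use", {})
    if not isinstance(use, dict):
        return ["'use' must be an object"]
    for field in USE_FIELDS:
        if field not in use:
            continue  # every field in the use section is optional
        value = use[field]
        if not isinstance(value, list):
            problems.append(f"'{field}' must be an array")
        elif not all(isinstance(item, str) and item.strip() for item in value):
            # Assumption: entries are free-text strings, as in the examples.
            problems.append(f"'{field}' must contain only non-empty strings")
    return problems


card = json.loads("""
{
  "use": {
    "intended_uses": ["Fine-tuning for Hausa language tasks"],
    "out_of_scope_uses": ["Generating misinformation or fake news"]
  }
}
""")
print(validate_use_section(card))
```

An empty result means the section passed these checks. It says nothing about whether the listed uses are sensible, which still needs human review.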

Field Details

intended_uses

List the primary purposes this dataset was created for:

{
  "intended_uses": [
    "Pretraining language models",
    "Supervised fine-tuning for text classification",
    "Evaluation of model performance on African languages",
    "Research on multilingual transfer learning"
  ]
}

Categories of intended uses:

  • Training: Pretraining, fine-tuning, distillation
  • Evaluation: Benchmarking, testing, validation
  • Research: Academic study, linguistic analysis
  • Applications: Translation, summarization, Q&A
  • Tools: Spell checkers, grammar tools, keyboards

out_of_scope_uses

Explicitly document uses you don’t endorse:

{
  "out_of_scope_uses": [
    "High-stakes decision making without human oversight",
    "Generation of synthetic media impersonating real people",
    "Training systems for autonomous weapons",
    "Mass surveillance applications",
    "Creating misleading or deceptive content"
  ]
}

Use vs. Rights

The use section documents intent, while the rights section documents legal restrictions:

Field                   Purpose                            Enforcement
use.intended_uses       What the dataset was designed for  Guidance only
use.out_of_scope_uses   What you don’t recommend           Guidance only
rights.allowed_uses     Explicitly permitted uses          May be legally binding
rights.restricted_uses  Prohibited uses                    Legally binding (per license)

Example showing both:

{
  "rights": {
    "license": "CC-BY-NC-4.0",
    "allows_commercial_use": false,
    "restricted_uses": ["commercial-model-training"]
  },
  "use": {
    "intended_uses": ["Academic research on low-resource NLP", "Educational language learning tools"],
    "out_of_scope_uses": ["Production systems without evaluation", "High-stakes medical or legal applications"]
  }
}
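
To make the distinction concrete, here is a minimal sketch that separates guidance from license terms when reading a record like the one above. The function name split_by_enforcement and the output keys guidance and license_terms are hypothetical, chosen only for illustration. The input field names are the four described in this section.

```python
def split_by_enforcement(card: dict) -> dict:
    """Group use statements (guidance) apart from rights entries (license terms)."""
    use = card.get("use", {})
    rights = card.get("rights", {})
    return {
        # Documented intent: useful to readers, not enforceable on its own.
        "guidance": {
            "intended_uses": list(use.get("intended_uses", [])),
            "out_of_scope_uses": list(use.get("out_of_scope_uses", [])),
        },
        # Entries tied to the license, which may carry legal weight.
        "license_terms": {
            "allowed_uses": list(rights.get("allowed_uses", [])),
            "restricted_uses": list(rights.get("restricted_uses", [])),
        },
    }


card = {
    "rights": {
        "license": "CC-BY-NC-4.0",
        "allows_commercial_use": False,
        "restricted_uses": ["commercial-model-training"],
    },
    "use": {
        "intended_uses": ["Academic research on low-resource NLP"],
        "out_of_scope_uses": ["High-stakes medical or legal applications"],
    },
}

summary = split_by_enforcement(card)
print(summary["license_terms"]["restricted_uses"])
print(summary["guidance"]["out_of_scope_uses"])
```

Keeping the two groups apart lets downstream tooling treat license terms as hard constraints while surfacing guidance as advisory notes.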

Common Intended Uses

Pretraining

{
  "intended_uses": [
    "Pretraining multilingual language models",
    "Continued pretraining for domain adaptation",
    "Training tokenizers and vocabularies"
  ]
}

Fine-tuning

{
  "intended_uses": [
    "Supervised fine-tuning for classification",
    "Instruction tuning for chat models",
    "Task-specific adaptation"
  ]
}

Evaluation

{
  "intended_uses": ["Benchmarking model performance", "Cross-lingual evaluation", "Bias and fairness testing"]
}

Research

{
  "intended_uses": ["Linguistic analysis", "Corpus linguistics research", "Computational social science"]
}

Common Out-of-Scope Uses

Safety Concerns

{
  "out_of_scope_uses": [
    "Generating harmful or illegal content",
    "Creating deepfakes or synthetic media",
    "Automated harassment or abuse"
  ]
}

Quality Limitations

{
  "out_of_scope_uses": [
    "Medical diagnosis or treatment recommendations",
    "Legal advice generation",
    "Critical infrastructure control"
  ]
}

Data Limitations

{
  "out_of_scope_uses": [
    "Tasks requiring 2024+ knowledge (data ends 2023)",
    "Formal/legal document generation (informal data)",
    "Cross-dialect applications (single dialect only)"
  ]
}

See Also