Safety Section

The safety section provides an overview of content risks present in the dataset and any mitigations applied. This helps users assess whether the dataset is appropriate for their use case and what safeguards they may need.

Fields

All fields in the safety section are optional.

| Field | Type | Description |
| --- | --- | --- |
| content_risk_level | enum | Overall characterization of harmful content presence |
| known_risky_categories | array | Categories of potentially harmful content |
| mitigations | string | Description of filtering or mitigation steps |

Enum Values

content_risk_level

| Value | Description |
| --- | --- |
| low | Minimal harmful content, suitable for general use |
| medium | Some harmful content present, may need downstream filtering |
| high | Significant harmful content, requires careful handling |

known_risky_categories

| Value | Description |
| --- | --- |
| hate | Hate speech or discriminatory content |
| harassment | Harassment or bullying content |
| sexual | Sexual or adult content |
| violence | Violent or graphic content |
| self-harm | Self-harm or suicide-related content |
| illicit-drugs | Drug-related content |
| extremism | Extremist or radicalization content |
| other | Other risky content categories |
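
Where tooling consumes this section, the enum values above can be checked programmatically. Below is a minimal Python sketch, assuming a parsed safety object; the function name and error messages are illustrative and not part of the specification:

```python
# Allowed values copied from the tables above.
VALID_RISK_LEVELS = {"low", "medium", "high"}
VALID_RISK_CATEGORIES = {
    "hate", "harassment", "sexual", "violence",
    "self-harm", "illicit-drugs", "extremism", "other",
}

def validate_safety_section(safety: dict) -> list[str]:
    """Return a list of problems found; an empty list means the section is valid."""
    problems = []
    level = safety.get("content_risk_level")
    if level is not None and level not in VALID_RISK_LEVELS:
        problems.append(f"unknown content_risk_level: {level!r}")
    for category in safety.get("known_risky_categories", []):
        if category not in VALID_RISK_CATEGORIES:
            problems.append(f"unknown known_risky_categories value: {category!r}")
    if "mitigations" in safety and not isinstance(safety["mitigations"], str):
        problems.append("mitigations must be a string")
    return problems
```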

Example

{
  "safety": {
    "content_risk_level": "medium",
    "known_risky_categories": ["hate", "violence"],
    "mitigations": "Applied perspective API toxicity filter with threshold 0.7. Removed examples flagged as severely toxic. Some borderline content remains for research purposes. Recommend additional filtering for production use."
  }
}

Field Details

content_risk_level

Choose the level that best characterizes your dataset:

| Level | Typical Sources | Recommended Use |
| --- | --- | --- |
| low | Curated content, educational materials, professional text | General purpose, production |
| medium | Web scrapes, social media, news | Research, with downstream filtering |
| high | Unfiltered web, forums, comments | Research only, strong safeguards required |

known_risky_categories

List all categories present in the source data, even if they were later filtered:

{
  "known_risky_categories": ["hate", "harassment", "sexual", "violence"]
}

This helps users:

  • Understand what types of content may remain
  • Plan appropriate downstream filtering
  • Assess suitability for their application
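
Building on the second point, a dataset consumer might translate the declared categories into a downstream filtering plan. The mapping below is purely hypothetical and will differ per application:

```python
# Hypothetical mapping from declared risk categories to downstream steps;
# real pipelines will substitute their own classifiers and policies.
DOWNSTREAM_STEPS = {
    "hate": "run a hate-speech classifier over sampled records",
    "harassment": "run a harassment/toxicity classifier",
    "sexual": "apply an adult-content filter",
    "violence": "apply a graphic-violence filter",
    "self-harm": "route matches to manual review",
    "illicit-drugs": "apply keyword screening",
    "extremism": "combine classifier output with manual review",
    "other": "consult the dataset documentation for specifics",
}

def plan_downstream_filtering(safety: dict) -> list[str]:
    """Return the steps suggested by the dataset's declared risk categories."""
    return [
        DOWNSTREAM_STEPS[category]
        for category in safety.get("known_risky_categories", [])
        if category in DOWNSTREAM_STEPS
    ]
```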

mitigations

Document what you did to reduce risks:

{
  "mitigations": "Three-stage filtering: (1) Keyword blocklist for explicit content, (2) Perspective API with toxicity threshold 0.8 and severe_toxicity threshold 0.5, (3) Manual review of 1% sample. Estimated 99% of severely toxic content removed. Borderline cases retained with 'potentially_sensitive' flag in metadata."
}

Include:

  • Tools and thresholds used
  • Effectiveness estimates
  • What remains after filtering
  • Recommendations for users
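
As a concrete illustration of the pipeline described in the example above, here is a minimal sketch that assumes each record already carries Perspective API toxicity scores. The thresholds, score field names, and potentially_sensitive flag mirror the example text and reflect one possible interpretation, not a requirement of the schema:

```python
# Thresholds taken from the example mitigation text above.
TOXICITY_THRESHOLD = 0.8
SEVERE_TOXICITY_THRESHOLD = 0.5

def filter_record(record: dict) -> dict | None:
    """Drop severely toxic records, flag borderline ones, keep the rest."""
    if record.get("severe_toxicity", 0.0) >= SEVERE_TOXICITY_THRESHOLD:
        return None  # removed from the released dataset
    if record.get("toxicity", 0.0) >= TOXICITY_THRESHOLD:
        # Borderline case: retained but flagged for downstream consumers.
        record.setdefault("metadata", {})["potentially_sensitive"] = True
    return record
```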

Risk Assessment Checklist

When assessing safety, consider:

  1. Source risk - Where does the data come from?

    • Curated sources: lower risk
    • Open web: higher risk
    • User-generated content: highest risk
  2. Domain risk - What topics are covered?

    • News/politics: may contain conflict, hate speech
    • Social media: harassment, toxicity
    • Forums: variable risk by community
  3. Temporal risk - When was it collected?

    • During conflicts/elections: higher political toxicity
    • Recent events: may contain misinformation
  4. Language risk - Some languages have less tool support

    • Toxicity classifiers may underperform
    • Manual review may be necessary

Safety for Low-Resource Languages

{
  "safety": {
    "content_risk_level": "medium",
    "known_risky_categories": ["hate", "other"],
    "mitigations": "English content filtered with Perspective API. Hausa content reviewed manually by native speakers (5% sample). Toxicity classifier not available for Hausa - relying on keyword lists and manual review. Higher residual risk in Hausa subset."
  }
}
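
A sketch of how the per-language handling described in this example might be wired into preprocessing; the language codes and 5% sampling rate follow the example, while the helper names and routing labels are illustrative:

```python
import random

# Placeholder: keyword list maintained by native Hausa speakers.
HAUSA_BLOCKLIST: set[str] = set()

def route_record(record: dict, sample_rate: float = 0.05) -> str:
    """Decide how a record is screened, by language."""
    text = record["text"].lower()
    if record["language"] == "en":
        return "automated_toxicity_filter"  # e.g. Perspective API scoring
    if any(keyword in text for keyword in HAUSA_BLOCKLIST):
        return "manual_review"              # keyword hit: always reviewed
    if random.random() < sample_rate:
        return "manual_review"              # random 5% sample for native-speaker review
    return "keep"
```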

Downstream Recommendations

Include guidance for users in your documentation:

## Safety Recommendations

This dataset has been filtered but may contain:

- Implicit bias in news reporting
- Occasional offensive language in quotes
- References to violence in conflict reporting

For production use, we recommend:

1. Apply additional toxicity filtering
2. Use content moderation on model outputs
3. Test for demographic biases before deployment

See Also