Safety Section

The safety section provides an overview of content risks present in the dataset and any mitigations applied. This helps users assess whether the dataset is appropriate for their use case and what safeguards they may need.

Fields

All fields in the safety section are optional.

| Field | Type | Description |
| --- | --- | --- |
| content_risk_level | enum | Overall characterization of harmful content presence |
| known_risky_categories | array | Categories of potentially harmful content |
| mitigations | string | Description of filtering or mitigation steps |

Enum Values

content_risk_level

| Value | Description |
| --- | --- |
| low | Minimal harmful content, suitable for general use |
| medium | Some harmful content present, may need downstream filtering |
| high | Significant harmful content, requires careful handling |

known_risky_categories

| Value | Description |
| --- | --- |
| hate | Hate speech or discriminatory content |
| harassment | Harassment or bullying content |
| sexual | Sexual or adult content |
| violence | Violent or graphic content |
| self-harm | Self-harm or suicide-related content |
| illicit-drugs | Drug-related content |
| extremism | Extremist or radicalization content |
| other | Other risky content categories |
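
Where tooling consumes this section, the enum values above can be checked programmatically. Below is a minimal Python sketch, assuming a parsed safety object; the function name and error messages are illustrative and not part of the specification:

```python
# Allowed values copied from the tables above.
VALID_RISK_LEVELS = {"low", "medium", "high"}
VALID_RISK_CATEGORIES = {
    "hate", "harassment", "sexual", "violence",
    "self-harm", "illicit-drugs", "extremism", "other",
}

def validate_safety_section(safety: dict) -> list[str]:
    """Return a list of problems found; an empty list means the section is valid."""
    problems = []
    level = safety.get("content_risk_level")
    if level is not None and level not in VALID_RISK_LEVELS:
        problems.append(f"unknown content_risk_level: {level!r}")
    for category in safety.get("known_risky_categories", []):
        if category not in VALID_RISK_CATEGORIES:
            problems.append(f"unknown known_risky_categories value: {category!r}")
    if "mitigations" in safety and not isinstance(safety["mitigations"], str):
        problems.append("mitigations must be a string")
    return problems
```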

Example

{
  "safety": {
    "content_risk_level": "medium",
    "known_risky_categories": ["hate", "violence"],
    "mitigations": "Applied perspective API toxicity filter with threshold 0.7. Removed examples flagged as severely toxic. Some borderline content remains for research purposes. Recommend additional filtering for production use."
  }
}

Field Details

content_risk_level

Choose the level that best characterizes your dataset:

| Level | Typical Sources | Recommended Use |
| --- | --- | --- |
| low | Curated content, educational materials, professional text | General purpose, production |
| medium | Web scrapes, social media, news | Research, with downstream filtering |
| high | Unfiltered web, forums, comments | Research only, strong safeguards required |

known_risky_categories

List all categories present in the source data, even if they were later filtered:

{
  "known_risky_categories": ["hate", "harassment", "sexual", "violence"]
}

This helps users:

  • Understand what types of content may remain
  • Plan appropriate downstream filtering
  • Assess suitability for their application
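
Building on the second point, a dataset consumer might translate the declared categories into a downstream filtering plan. The mapping below is purely hypothetical and will differ per application:

```python
# Hypothetical mapping from declared risk categories to downstream steps;
# real pipelines will substitute their own classifiers and policies.
DOWNSTREAM_STEPS = {
    "hate": "run a hate-speech classifier over sampled records",
    "harassment": "run a harassment/toxicity classifier",
    "sexual": "apply an adult-content filter",
    "violence": "apply a graphic-violence filter",
    "self-harm": "route matches to manual review",
    "illicit-drugs": "apply keyword screening",
    "extremism": "combine classifier output with manual review",
    "other": "consult the dataset documentation for specifics",
}

def plan_downstream_filtering(safety: dict) -> list[str]:
    """Return the steps suggested by the dataset's declared risk categories."""
    return [
        DOWNSTREAM_STEPS[category]
        for category in safety.get("known_risky_categories", [])
        if category in DOWNSTREAM_STEPS
    ]
```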

mitigations

Document what you did to reduce risks:

{
  "mitigations": "Three-stage filtering: (1) Keyword blocklist for explicit content, (2) Perspective API with toxicity threshold 0.8 and severe_toxicity threshold 0.5, (3) Manual review of 1% sample. Estimated 99% of severely toxic content removed. Borderline cases retained with 'potentially_sensitive' flag in metadata."
}

Include:

  • Tools and thresholds used
  • Effectiveness estimates
  • What remains after filtering
  • Recommendations for users
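
As a concrete illustration of the pipeline described in the example above, here is a minimal sketch that assumes each record already carries Perspective API toxicity scores. The thresholds, score field names, and potentially_sensitive flag mirror the example text and reflect one possible interpretation, not a requirement of the schema:

```python
# Thresholds taken from the example mitigation text above.
TOXICITY_THRESHOLD = 0.8
SEVERE_TOXICITY_THRESHOLD = 0.5

def filter_record(record: dict) -> dict | None:
    """Drop severely toxic records, flag borderline ones, keep the rest."""
    if record.get("severe_toxicity", 0.0) >= SEVERE_TOXICITY_THRESHOLD:
        return None  # removed from the released dataset
    if record.get("toxicity", 0.0) >= TOXICITY_THRESHOLD:
        # Borderline case: retained but flagged for downstream consumers.
        record.setdefault("metadata", {})["potentially_sensitive"] = True
    return record
```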

Risk Assessment Checklist

When assessing safety, consider:

  1. Source risk - Where does the data come from?

    • Curated sources: lower risk
    • Open web: higher risk
    • User-generated content: highest risk
  2. Domain risk - What topics are covered?

    • News/politics: may contain conflict, hate speech
    • Social media: harassment, toxicity
    • Forums: variable risk by community
  3. Temporal risk - When was it collected?

    • During conflicts/elections: higher political toxicity
    • Recent events: may contain misinformation
  4. Language risk - Some languages have less tool support

    • Toxicity classifiers may underperform
    • Manual review may be necessary

Safety for Low-Resource Languages

{
  "safety": {
    "content_risk_level": "medium",
    "known_risky_categories": ["hate", "other"],
    "mitigations": "English content filtered with Perspective API. Hausa content reviewed manually by native speakers (5% sample). Toxicity classifier not available for Hausa - relying on keyword lists and manual review. Higher residual risk in Hausa subset."
  }
}
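
A sketch of how the per-language handling described in this example might be wired into preprocessing; the language codes and 5% sampling rate follow the example, while the helper names and routing labels are illustrative:

```python
import random

# Placeholder: keyword list maintained by native Hausa speakers.
HAUSA_BLOCKLIST: set[str] = set()

def route_record(record: dict, sample_rate: float = 0.05) -> str:
    """Decide how a record is screened, by language."""
    text = record["text"].lower()
    if record["language"] == "en":
        return "automated_toxicity_filter"  # e.g. Perspective API scoring
    if any(keyword in text for keyword in HAUSA_BLOCKLIST):
        return "manual_review"              # keyword hit: always reviewed
    if random.random() < sample_rate:
        return "manual_review"              # random 5% sample for native-speaker review
    return "keep"
```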

Downstream Recommendations

Include guidance for users in your documentation:

## Safety Recommendations

This dataset has been filtered but may contain:

- Implicit bias in news reporting
- Occasional offensive language in quotes
- References to violence in conflict reporting

For production use, we recommend:

1. Apply additional toxicity filtering
2. Use content moderation on model outputs
3. Test for demographic biases before deployment

See Also