# Safety Section
The safety section provides an overview of content risks present in the dataset and any mitigations applied. This helps users assess whether the dataset is appropriate for their use case and what safeguards they may need.
## Fields
All fields in the safety section are optional.
| Field | Type | Description |
|---|---|---|
| content_risk_level | enum | Overall characterization of harmful content presence |
| known_risky_categories | array | Categories of potentially harmful content |
| mitigations | string | Description of filtering or mitigation steps |
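For readers who consume the card programmatically, the table above maps naturally onto an optional-keyed structure. Below is a minimal Python sketch using a `TypedDict`; the class name is our own, and only the field names come from the schema.

```python
from typing import List, TypedDict


class SafetySection(TypedDict, total=False):
    """Illustrative typed view of the safety section; total=False makes every key optional."""

    content_risk_level: str            # one of "low", "medium", "high" (see Enum Values)
    known_risky_categories: List[str]  # e.g. ["hate", "violence"]
    mitigations: str                   # free-text description of filtering / mitigation steps
```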
## Enum Values
### content_risk_level
| Value | Description |
|---|---|
| low | Minimal harmful content, suitable for general use |
| medium | Some harmful content present, may need downstream filtering |
| high | Significant harmful content, requires careful handling |
### known_risky_categories
| Value | Description |
|---|---|
| hate | Hate speech or discriminatory content |
| harassment | Harassment or bullying content |
| sexual | Sexual or adult content |
| violence | Violent or graphic content |
| self-harm | Self-harm or suicide-related content |
| illicit-drugs | Drug-related content |
| extremism | Extremist or radicalization content |
| other | Other risky content categories |
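If you validate these values in code, the two tables translate directly into string enums. A minimal Python sketch (class names are our own; the values are the ones listed above):

```python
from enum import Enum


class ContentRiskLevel(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class RiskyCategory(str, Enum):
    HATE = "hate"
    HARASSMENT = "harassment"
    SEXUAL = "sexual"
    VIOLENCE = "violence"
    SELF_HARM = "self-harm"
    ILLICIT_DRUGS = "illicit-drugs"
    EXTREMISM = "extremism"
    OTHER = "other"


# Membership checks then become one-liners:
assert "medium" in {level.value for level in ContentRiskLevel}
```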
## Example
```json
{
  "safety": {
    "content_risk_level": "medium",
    "known_risky_categories": ["hate", "violence"],
    "mitigations": "Applied Perspective API toxicity filter with threshold 0.7. Removed examples flagged as severely toxic. Some borderline content remains for research purposes. Recommend additional filtering for production use."
  }
}
```
## Field Details
### content_risk_level
Choose the level that best characterizes your dataset:
| Level | Typical Sources | Recommended Use |
|---|---|---|
| low | Curated content, educational materials, professional text | General purpose, production |
| medium | Web scrapes, social media, news | Research, with downstream filtering |
| high | Unfiltered web, forums, comments | Research only, strong safeguards required |
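The "Recommended Use" column can also serve as a lookup when tooling decides how a dataset may be used. A tiny sketch, with the strings taken verbatim from the table; the helper name is illustrative:

```python
# Recommended use per risk level, copied from the table above.
RECOMMENDED_USE = {
    "low": "General purpose, production",
    "medium": "Research, with downstream filtering",
    "high": "Research only, strong safeguards required",
}


def recommended_use(level: str) -> str:
    """Return the recommended-use note for a declared content_risk_level."""
    return RECOMMENDED_USE.get(level, "Unknown risk level - review manually")


print(recommended_use("high"))  # -> "Research only, strong safeguards required"
```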
### known_risky_categories
List all categories present, even if filtered:
```json
{
  "known_risky_categories": ["hate", "harassment", "sexual", "violence"]
}
```
This helps users:
- Understand what types of content may remain
- Plan appropriate downstream filtering (see the sketch after this list)
- Assess suitability for their application
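A downstream consumer can compare the declared categories against what their application can tolerate. A minimal sketch, where the disallowed set is purely illustrative:

```python
# Categories this hypothetical application cannot tolerate.
DISALLOWED = {"hate", "harassment"}


def needs_extra_filtering(safety: dict) -> bool:
    """True if the dataset declares risky categories that the application must filter out."""
    declared = set(safety.get("known_risky_categories", []))
    return bool(declared & DISALLOWED)


safety = {"known_risky_categories": ["hate", "harassment", "sexual", "violence"]}
print(needs_extra_filtering(safety))  # -> True
```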
### mitigations
Document what you did to reduce risks:
```json
{
  "mitigations": "Three-stage filtering: (1) Keyword blocklist for explicit content, (2) Perspective API with toxicity threshold 0.8 and severe_toxicity threshold 0.5, (3) Manual review of 1% sample. Estimated 99% of severely toxic content removed. Borderline cases retained with 'potentially_sensitive' flag in metadata."
}
```
Include:
- Tools and thresholds used
- Effectiveness estimates
- What remains after filtering
- Recommendations for users
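For concreteness, the three-stage pipeline described in the example above might look roughly like the sketch below. The blocklist terms and the `score_toxicity` stub are placeholders for your own resources and classifier (the example uses the Perspective API); the thresholds mirror those in the example string.

```python
import random

# Stage 1: keyword blocklist (placeholder terms; substitute your own curated list).
BLOCKLIST = {"placeholder_term_1", "placeholder_term_2"}


def score_toxicity(text: str) -> dict:
    """Stub standing in for your toxicity classifier (e.g. a Perspective API call).

    Assumed to return attribute scores in [0, 1].
    """
    return {"toxicity": 0.0, "severe_toxicity": 0.0}  # replace with real scores


def keep_example(text: str) -> bool:
    # Stage 1: drop anything matching the explicit-content blocklist.
    lowered = text.lower()
    if any(term in lowered for term in BLOCKLIST):
        return False
    # Stage 2: classifier thresholds from the example (toxicity 0.8, severe_toxicity 0.5).
    scores = score_toxicity(text)
    return scores.get("toxicity", 0.0) <= 0.8 and scores.get("severe_toxicity", 0.0) <= 0.5


def manual_review_sample(examples: list, fraction: float = 0.01) -> list:
    # Stage 3: draw a 1% sample for human review.
    k = min(len(examples), max(1, int(len(examples) * fraction)))
    return random.sample(examples, k)
```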
## Risk Assessment Checklist
When assessing safety, consider the following factors (a structured way to record the answers is sketched after the checklist):
- Source risk - Where does the data come from?
  - Curated sources: lower risk
  - Open web: higher risk
  - User-generated content: highest risk
- Domain risk - What topics are covered?
  - News/politics: may contain conflict, hate speech
  - Social media: harassment, toxicity
  - Forums: variable risk by community
- Temporal risk - When was it collected?
  - During conflicts/elections: higher political toxicity
  - Recent events: may contain misinformation
- Language risk - Some languages have less tool support
  - Toxicity classifiers may underperform
  - Manual review may be necessary
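One lightweight way to keep these answers next to the safety section is a small structured record. The shape below is our own invention, not part of the schema, and the values are illustrative:

```python
# Not part of the schema - just a convenient place to keep checklist answers.
risk_assessment = {
    "source_risk": "open web",                        # curated / open web / user-generated
    "domain_risk": ["news", "politics"],              # topics covered
    "temporal_risk": "collected during an election",  # collection period
    "language_risk": "no toxicity classifier for 2 of 5 languages",
}
```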
## Safety for Low-Resource Languages

When tooling coverage differs by language, document the mitigation applied to each language subset and the residual risk:
```json
{
  "safety": {
    "content_risk_level": "medium",
    "known_risky_categories": ["hate", "other"],
    "mitigations": "English content filtered with Perspective API. Hausa content reviewed manually by native speakers (5% sample). Toxicity classifier not available for Hausa - relying on keyword lists and manual review. Higher residual risk in Hausa subset."
  }
}
```
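In code, this usually means routing each item through whatever tooling exists for its language. A rough sketch mirroring the example above, where `english_toxicity` is a stub for an English classifier (such as the Perspective API) and the Hausa keyword list is a placeholder for one curated by native speakers:

```python
# Placeholder for a keyword list curated by native speakers.
HAUSA_KEYWORD_BLOCKLIST = {"placeholder_term"}


def english_toxicity(text: str) -> float:
    """Stub for an English toxicity classifier; assumed to return a score in [0, 1]."""
    return 0.0  # replace with a real classifier call


def keep_item(text: str, lang: str) -> bool:
    if lang == "en":
        return english_toxicity(text) < 0.7  # classifier available for English; threshold is illustrative
    if lang == "ha":
        lowered = text.lower()
        return not any(term in lowered for term in HAUSA_KEYWORD_BLOCKLIST)
    return True  # no tooling: keep, but route to manual review and note the residual risk
```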
## Downstream Recommendations
Include guidance for users in your documentation:
```markdown
## Safety Recommendations

This dataset has been filtered but may contain:

- Implicit bias in news reporting
- Occasional offensive language in quotes
- References to violence in conflict reporting

For production use, we recommend:

1. Apply additional toxicity filtering
2. Use content moderation on model outputs
3. Test for demographic biases before deployment
```
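The second recommendation, content moderation on model outputs, might look roughly like the gate below. Here `classifier` is a placeholder for whatever moderation model or API you rely on, and the threshold is illustrative:

```python
def moderate_output(text: str, classifier, threshold: float = 0.7) -> str:
    """Withhold model outputs that a moderation classifier scores above the threshold.

    `classifier` is assumed to return a toxicity score in [0, 1].
    """
    if classifier(text) >= threshold:
        return "[withheld by content moderation]"
    return text


# Example with a trivial stand-in classifier that flags nothing:
print(moderate_output("Hello, world", classifier=lambda t: 0.0))
```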
## See Also
- Rights Section - Personal data and consent
- Processing Section - Filtering steps applied
- Quality Section - Known issues