Community Section
The community section documents local and community involvement in the dataset, which is especially important for low-resource languages. This information helps users understand the dataset’s cultural grounding and its relationships with the communities it represents.
Fields
All fields in the community section are optional.
| Field | Type | Description |
|---|---|---|
| local_stewards | string | Local organizations or communities that steward the dataset |
| has_community_review | boolean | Whether community representatives reviewed the dataset |
| community_review_notes | string | How communities were consulted or involved |
| benefit_sharing | string | How benefits are shared with contributors or communities |
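The types in the table above can be checked mechanically. The following is a minimal sketch, not part of any official tooling: the helper name `validate_community` is hypothetical, and treating unlisted field names as problems is an assumption, since the table does not say whether extra fields are allowed.

```python
from typing import Any

# Field names and types come from the table above. The validator itself is a
# hypothetical helper written for illustration only.
COMMUNITY_FIELDS: dict[str, type] = {
    "local_stewards": str,
    "has_community_review": bool,
    "community_review_notes": str,
    "benefit_sharing": str,
}


def validate_community(section: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means no problems were found."""
    problems = []
    for name, value in section.items():
        expected = COMMUNITY_FIELDS.get(name)
        if expected is None:
            # Assumption: fields not listed in the table are flagged.
            problems.append(f"unknown field: {name}")
        elif not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    # All fields are optional, so missing fields are never reported.
    return problems
```

Because every field is optional, an empty section passes this check; only wrongly typed values (for example the string "yes" for has_community_review) are reported.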
Example
{
  "community": {
    "local_stewards": "Hausa Language Development Association (HALDA), Bayero University Kano",
    "has_community_review": true,
    "community_review_notes": "Dataset reviewed by HALDA members for cultural appropriateness and language quality. Native speaker committee verified orthography and dialect representation. Community feedback incorporated over 3 revision cycles.",
    "benefit_sharing": "Contributors receive acknowledgment in publications. 20% of any licensing revenue directed to HALDA language preservation programs. Training workshops provided to local researchers."
  }
}
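Since all fields are optional, code that reads a data card should not assume any of them is present. The sketch below parses a shortened version of the example with Python's standard `json` module and reads the fields defensively. The variable names are illustrative, and the fallback for a missing "community" key is an assumption about how the section appears in a full data card.

```python
import json

# A shortened version of the example above, with two fields omitted.
card_json = """
{
  "community": {
    "local_stewards": "Hausa Language Development Association (HALDA), Bayero University Kano",
    "has_community_review": true
  }
}
"""

card = json.loads(card_json)

# Assumption: the section itself may be absent, so fall back to an empty dict.
community = card.get("community", {})

stewards = community.get("local_stewards")        # str, or None if omitted
reviewed = community.get("has_community_review")  # bool, or None if omitted
benefit = community.get("benefit_sharing")        # None here: field omitted

print(stewards, reviewed, benefit)
```

Using `.get()` keeps an omitted has_community_review (`None`, meaning not stated) distinct from an explicit `false` (stated: no review took place).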
Field Details
local_stewards
Identify organizations or communities with oversight:
{
  "local_stewards": "Masakhane NLP Community, African Language Technology Initiative"
}
Stewards may include:
- Language preservation organizations
- University departments
- Community groups
- Cultural institutions
- Indigenous organizations
has_community_review
Indicate whether affected communities were involved:
{
  "has_community_review": true
}
Community review can include:
- Quality assessment by native speakers
- Cultural sensitivity review
- Verification of dialect representation
- Feedback on data card accuracy
community_review_notes
Document the review process:
{
  "community_review_notes": "Three-phase community review: (1) Initial review by 5 native speaker linguists, (2) Public comment period in community forums, (3) Final approval by language committee. Key feedback: improved diacritic handling, added regional dialect tags, removed culturally sensitive religious content."
}
Include:
- Who reviewed the dataset
- Review methodology
- Feedback received
- Changes made
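A data card that sets has_community_review to true but leaves the notes empty gives readers no way to judge the review. One way to catch this is a small consistency check. This is a hedged sketch: the function `review_warnings` and the rule it enforces are hypothetical conventions, not requirements stated by the schema.

```python
from typing import Any


def review_warnings(section: dict[str, Any]) -> list[str]:
    """Warn when a review is claimed but not described (hypothetical rule)."""
    warnings = []
    notes = section.get("community_review_notes")
    # Treat missing, non-string, or whitespace-only notes as "no notes".
    has_notes = isinstance(notes, str) and notes.strip() != ""
    if section.get("has_community_review") is True and not has_notes:
        warnings.append(
            "has_community_review is true but community_review_notes is empty"
        )
    return warnings
```

The check deliberately does not warn when notes are present without a review flag, since the notes may describe consultation that fell short of a formal review.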
benefit_sharing
Describe how benefits flow to communities:
{
  "benefit_sharing": "Open access ensures community members can use their own language data. Contributors credited by name (with consent). Training materials developed with community input shared freely. Annual workshops for local ML practitioners."
}
Community Involvement Models
Advisory Model
The community provides guidance but does not control the data:
{
  "community_review_notes": "Advisory board of native speakers consulted on collection priorities and quality standards."
}
Partnership Model
The community co-creates and co-owns the dataset:
{
  "local_stewards": "Joint stewardship between University Research Lab and Community Language Council",
  "community_review_notes": "Dataset co-developed with Community Language Council. All decisions require community approval."
}
Community-Led Model
The community controls the dataset:
{
  "local_stewards": "Indigenous Language Authority (sole steward)",
  "community_review_notes": "Dataset created by and for the community. External use requires community consent."
}
Low-Resource Language Considerations
For datasets involving low-resource or endangered languages:
- Consult early - Involve communities before collection
- Respect sovereignty - Communities may have data governance requirements
- Consider sensitivity - Some content may be culturally restricted
- Plan for maintenance - Ensure long-term community access
- Share benefits - Make data useful to the community
For example, a community section for a dataset from which culturally restricted content was excluded:
{
  "community": {
    "local_stewards": "Traditional Language Keepers Council",
    "has_community_review": true,
    "community_review_notes": "Council approved release of non-ceremonial content. Sacred texts and ritual language excluded per community request. Orthography follows council standards.",
    "benefit_sharing": "Dataset freely available to community members. Language learning app developed using this data shared with community schools."
  }
}
See Also
- Provenance Section - Data collection details
- Rights Section - Consent and legal basis
- Governance Section - Review status