Community Section

The community section documents local and community involvement in the dataset, which is especially important for low-resource languages. This information helps users understand the dataset’s cultural grounding and community relationships.

Fields

All fields in the community section are optional.

| Field | Type | Description |
|---|---|---|
| local_stewards | string | Local organizations or communities that steward the dataset |
| has_community_review | boolean | Whether community representatives reviewed the dataset |
| community_review_notes | string | How communities were consulted or involved |
| benefit_sharing | string | How benefits are shared with contributors or communities |
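Because every field is optional, code that builds this section may want to leave unset fields out rather than write them as null. The following is a minimal sketch of that idea, not part of any official tooling: the `CommunitySection` class and its `to_dict` method are hypothetical names, and only the field names and types come from the table above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CommunitySection:
    # Hypothetical container; field names and types follow the fields table.
    local_stewards: Optional[str] = None
    has_community_review: Optional[bool] = None
    community_review_notes: Optional[str] = None
    benefit_sharing: Optional[str] = None

    def to_dict(self) -> dict:
        # Drop unset fields so optional entries are omitted rather than null.
        # An explicit False is kept because only None counts as "unset".
        return {k: v for k, v in asdict(self).items() if v is not None}


# Placeholder values for illustration only.
section = CommunitySection(
    local_stewards="Community Language Council",
    has_community_review=False,
)
print(json.dumps({"community": section.to_dict()}, indent=2))
```

Comparing against `None` instead of testing truthiness matters here: a dataset that was not reviewed should still record `"has_community_review": false`.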

Example

{
  "community": {
    "local_stewards": "Hausa Language Development Association (HALDA), Bayero University Kano",
    "has_community_review": true,
    "community_review_notes": "Dataset reviewed by HALDA members for cultural appropriateness and language quality. Native speaker committee verified orthography and dialect representation. Community feedback incorporated over 3 revision cycles.",
    "benefit_sharing": "Contributors receive acknowledgment in publications. 20% of any licensing revenue directed to HALDA language preservation programs. Training workshops provided to local researchers."
  }
}
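A section like the one above can be type-checked before publication. This sketch assumes the data card is plain JSON and checks only the four documented fields. `validate_community` is a hypothetical helper, and ignoring unknown keys is an assumption, not a documented rule.

```python
import json

# Field names and types taken from the fields table; everything else here
# is illustrative.
COMMUNITY_FIELDS = {
    "local_stewards": str,
    "has_community_review": bool,
    "community_review_notes": str,
    "benefit_sharing": str,
}


def validate_community(section: dict) -> list[str]:
    """Return a list of type problems; an empty list means none were found."""
    problems = []
    for name, expected in COMMUNITY_FIELDS.items():
        if name not in section:
            continue  # all fields are optional
        value = section[name]
        if not isinstance(value, expected):
            problems.append(
                f"{name} should be {expected.__name__}, got {type(value).__name__}"
            )
    return problems


card = json.loads("""
{
  "community": {
    "local_stewards": "Hausa Language Development Association (HALDA), Bayero University Kano",
    "has_community_review": true
  }
}
""")

# A missing "community" key is treated as an empty section.
print(validate_community(card.get("community", {})))
```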

Field Details

local_stewards

Identify organizations or communities with oversight:

{
  "local_stewards": "Masakhane NLP Community, African Language Technology Initiative"
}

Stewards may include:

  • Language preservation organizations
  • University departments
  • Community groups
  • Cultural institutions
  • Indigenous organizations

has_community_review

Indicate whether affected communities reviewed the dataset:

{
  "has_community_review": true
}

Community review can include:

  • Quality assessment by native speakers
  • Cultural sensitivity review
  • Verification of dialect representation
  • Feedback on data card accuracy

community_review_notes

Document the review process:

{
  "community_review_notes": "Three-phase community review: (1) Initial review by 5 native speaker linguists, (2) Public comment period in community forums, (3) Final approval by language committee. Key feedback: improved diacritic handling, added regional dialect tags, removed culturally sensitive religious content."
}

Include:

  • Who reviewed the dataset
  • Review methodology
  • Feedback received
  • Changes made
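Since `community_review_notes` is what explains a `true` value of `has_community_review`, a simple lint can flag cards that claim a review but do not describe it. The schema itself does not require this pairing. The check below is an assumed convention, and `review_warnings` is a hypothetical helper.

```python
def review_warnings(section: dict) -> list[str]:
    """Warn when a review is claimed but not described (assumed convention)."""
    warnings = []
    reviewed = section.get("has_community_review")
    # Treat a missing, null, or whitespace-only note as empty.
    notes = (section.get("community_review_notes") or "").strip()
    if reviewed is True and not notes:
        warnings.append(
            "has_community_review is true but community_review_notes is empty"
        )
    return warnings


print(review_warnings({"has_community_review": True}))
```

Notes without a `true` flag are not flagged, because the notes field can also describe consultation that stopped short of a formal review.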

benefit_sharing

Describe how benefits flow to communities:

{
  "benefit_sharing": "Open access ensures community members can use their own language data. Contributors credited by name (with consent). Training materials developed with community input shared freely. Annual workshops for local ML practitioners."
}

Community Involvement Models

Advisory Model

Community provides guidance but doesn’t control data:

{
  "community_review_notes": "Advisory board of native speakers consulted on collection priorities and quality standards."
}

Partnership Model

Community co-creates and co-owns the dataset:

{
  "local_stewards": "Joint stewardship between University Research Lab and Community Language Council",
  "community_review_notes": "Dataset co-developed with Community Language Council. All decisions require community approval."
}

Community-Led Model

Community controls the dataset:

{
  "local_stewards": "Indigenous Language Authority (sole steward)",
  "community_review_notes": "Dataset created by and for the community. External use requires community consent."
}

Low-Resource Language Considerations

For datasets involving low-resource or endangered languages:

  1. Consult early - Involve communities before collection
  2. Respect sovereignty - Communities may have data governance requirements
  3. Consider sensitivity - Some content may be culturally restricted
  4. Plan for maintenance - Ensure long-term community access
  5. Share benefits - Make data useful to the community

{
  "community": {
    "local_stewards": "Traditional Language Keepers Council",
    "has_community_review": true,
    "community_review_notes": "Council approved release of non-ceremonial content. Sacred texts and ritual language excluded per community request. Orthography follows council standards.",
    "benefit_sharing": "Dataset freely available to community members. Language learning app developed using this data shared with community schools."
  }
}

See Also