Multilingual QA Dataset | Dataset Registry
A hybrid question-answering dataset combining real-world questions with synthetic answers, covering 10 languages with emphasis on factual knowledge and reading comprehension.
- Examples
- 250,000
- Languages
- 10
- License
- CC-BY-NC-SA-4.0
- Availability
- restricted
Multilingual QA Dataset
hybrid | multilingual-qa v1.0.0 | Maintained by Global AI Research Consortium
Core Information
- ID
- multilingual-qa
- Version
- 1.0.0
- Maintainer
- Global AI Research Consortium
- Contact
- datasets@globalairc.org
Access Information
- Availability
- restricted
- Terms
- View Terms
- Request Instructions
- Submit a data access request form at the dataset URL. Academic researchers are typically approved within 48 hours. Commercial entities require a separate license agreement.
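The request instructions above can be written down as a small lookup for teams that track dataset requests internally. This is a minimal sketch, not part of the registry: `RequesterKind`, `AccessPath`, and `access_path` are hypothetical names chosen for illustration. Only the 48-hour figure and the academic/commercial split come from the instructions, and treating academic requests as needing no separate license is one reading of them rather than a stated rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RequesterKind(Enum):
    """Hypothetical categories mirroring the two cases the card mentions."""
    ACADEMIC = "academic"
    COMMERCIAL = "commercial"


@dataclass(frozen=True)
class AccessPath:
    needs_separate_license: bool
    # None means the card states no typical approval time for this case.
    typical_approval_hours: Optional[int]


def access_path(kind: RequesterKind) -> AccessPath:
    """Return the access path the card describes for a requester kind."""
    if kind is RequesterKind.ACADEMIC:
        # "Typically approved within 48 hours" is a typical figure, not a guarantee.
        return AccessPath(needs_separate_license=False, typical_approval_hours=48)
    # Commercial entities: the card only says a separate license agreement is required.
    return AccessPath(needs_separate_license=True, typical_approval_hours=None)


if __name__ == "__main__":
    for kind in RequesterKind:
        print(kind.value, access_path(kind))
```

The approval time is kept optional so that the commercial case does not imply a turnaround the card never states.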
Provenance
- Source Types
- crowdsourced, synthetic-generation
- Geography
- global
- Collection Period
- 2024-01-01 to 2024-10-15
- Notes
- Questions were crowdsourced from native speakers. A subset of the answers was generated using GPT-4 and validated by human annotators.
Intended Use
- Intended Uses
  - Multilingual QA system training
  - Reading comprehension research
  - Cross-lingual transfer learning
- Out of Scope
  - Medical or legal advice systems
  - Production systems without human oversight
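For pipelines that consume registry entries programmatically, the fields of this card could be held in a typed record. The sketch below is illustrative only: the `DatasetCard` class, its field names, and the `is_out_of_scope` helper are assumptions, not a schema published by the registry. The values are copied from the card.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DatasetCard:
    """Hypothetical record type; field names are not a registry schema."""
    dataset_id: str
    version: str
    examples: int
    languages: int
    license: str
    availability: str
    intended_uses: Tuple[str, ...]
    out_of_scope: Tuple[str, ...]


CARD = DatasetCard(
    dataset_id="multilingual-qa",
    version="1.0.0",
    examples=250_000,
    languages=10,
    license="CC-BY-NC-SA-4.0",
    availability="restricted",
    intended_uses=(
        "Multilingual QA system training",
        "Reading comprehension research",
        "Cross-lingual transfer learning",
    ),
    out_of_scope=(
        "Medical or legal advice systems",
        "Production systems without human oversight",
    ),
)


def is_out_of_scope(card: DatasetCard, use: str) -> bool:
    """Case-insensitive exact match against the card's out-of-scope list."""
    wanted = use.strip().lower()
    return any(wanted == item.lower() for item in card.out_of_scope)


if __name__ == "__main__":
    print(is_out_of_scope(CARD, "Medical or legal advice systems"))
    print(is_out_of_scope(CARD, "Reading comprehension research"))
```

An exact-match check like this only catches uses phrased the way the card phrases them; deciding whether a real project falls out of scope still takes human judgment.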