Indic Benchmarks

Indic Language Benchmarking for Large Language Models

India is linguistically diverse, with 22+ languages. This project aims to benchmark the performance of large language models on Indic languages across multiple datasets. The goal is to evaluate a model's abilities in understanding, generating, and processing text in these languages.

We currently cover 8 languages across 3 datasets, with more coming soon.

Languages

  • Bengali (bn)
  • Gujarati (gu)
  • Hindi (hi)
  • Kannada (kn)
  • Malayalam (ml)
  • Odia (or)
  • Tamil (ta)
  • Telugu (te)
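
For anyone scripting runs over these languages, the list above can be written as a small lookup table. This is only a sketch: the two-letter codes are the ones given in the list, while the helper `iter_language_tasks` and the dataset names are hypothetical placeholders, not part of the project's code.

```python
# Two-letter codes for the eight languages listed above.
LANGUAGES = {
    "bn": "Bengali",
    "gu": "Gujarati",
    "hi": "Hindi",
    "kn": "Kannada",
    "ml": "Malayalam",
    "or": "Odia",
    "ta": "Tamil",
    "te": "Telugu",
}


def iter_language_tasks(datasets):
    """Yield (dataset, language_code) pairs for every combination.

    Hypothetical helper for illustration; not taken from the project.
    """
    for dataset in datasets:
        for code in sorted(LANGUAGES):
            yield dataset, code


# Placeholder dataset names; the actual datasets are not named here.
pairs = list(iter_language_tasks(["dataset_a", "dataset_b", "dataset_c"]))
print(len(pairs))  # 24 (3 datasets x 8 languages)
```

Keeping the codes in one place means a newly added language becomes a single-line change.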

Datasets

Code

Eval Harness

We are also building an MMLU-style dataset focused on Indian knowledge. If you are interested in contributing, please reach out to Ram or Munish.
