AItool committed on
Commit
95249cc
1 Parent(s): 5b6a646

Upload content.py

Files changed (1)
  1. content.py +7 -32
content.py CHANGED
@@ -1,57 +1,32 @@
-TITLE = '<h1 align="center" id="space-title">Open Multilingual LLM Evaluation Leaderboard</h1>'
+TITLE = '<h1 align="center" id="space-title">Open Multilingual Basque LLM Evaluation Leaderboard</h1><img src="basque.JPG">'
 
 INTRO_TEXT = f"""
 ## About
-
 This leaderboard tracks the progress and ranks the performance of large language models (LLMs) developed for different languages,
 emphasizing non-English languages to democratize the benefits of LLMs for the broader society.
-Our current leaderboard provides evaluation data for 29 languages, i.e.,
-Arabic, Armenian, Basque, Bengali, Catalan, Chinese, Croatian, Danish, Dutch,
-French, German, Gujarati, Hindi, Hungarian, Indonesian, Italian, Kannada, Malayalam,
-Marathi, Nepali, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swedish,
-Tamil, Telugu, Ukrainian, and Vietnamese, a set that will be expanded over time.
-Both multilingual and language-specific LLMs are welcome on this leaderboard.
-We currently evaluate models on four benchmarks:
-
-- <a href="https://arxiv.org/abs/1803.05457" target="_blank">AI2 Reasoning Challenge</a> (25-shot)
-- <a href="https://arxiv.org/abs/1905.07830" target="_blank">HellaSwag</a> (0-shot)
-- <a href="https://arxiv.org/abs/2009.03300" target="_blank">MMLU</a> (25-shot)
-- <a href="https://arxiv.org/abs/2109.07958" target="_blank">TruthfulQA</a> (0-shot)
-
-The evaluation data was translated into these languages using ChatGPT (gpt-35-turbo).
-
+Our current leaderboard provides evaluation data for Basque.
 """
 
 HOW_TO = f"""
 ## How to list your model's performance on this leaderboard:
-
-Run the evaluation of your model using this repo: <a href="https://github.com/nlp-uoregon/mlmm-evaluation" target="_blank">https://github.com/nlp-uoregon/mlmm-evaluation</a>.
-
+Run the evaluation of your model using this repo: <a href="https://github.com/webdevserv/mlmm_basque_evaluation" target="_blank">mlmm_basque_evaluation</a>.
 Then push the evaluation log and make a pull request.
 """
 
 CREDIT = f"""
 ## Credit
-
 To build this website, we used the following resources:
-
-- Datasets (AI2_ARC, HellaSwag, MMLU, TruthfulQA)
-- Funding and GPU access (Adobe Research)
-- Evaluation code (EleutherAI's lm_evaluation_harness repo)
 - Leaderboard code (HuggingFaceH4's open_llm_leaderboard repo)
-
 """
 
 
 CITATION = f"""
 ## Citation
-
 ```
-
 @misc{{lai2023openllmbenchmark,
-  author = {{Viet Lai and Nghia Trung Ngo and Amir Pouran Ben Veyseh and Franck Dernoncourt and Thien Huu Nguyen}},
-  title = {{Open Multilingual LLM Evaluation Leaderboard}},
-  year = {{2023}}
+  author = {{Idoia Lertxundi, thanks to Viet Lai and Nghia Trung Ngo and Amir Pouran Ben Veyseh and Franck Dernoncourt and Thien Huu Nguyen}},
+  title = {{Open Basque LLM Evaluation Leaderboard}},
+  year = {{2024}}
 }}
 ```
-"""
+"""