---
library_name: transformers
license: llama3
datasets:
- 2A2I/argilla-dpo-mix-7k-arabic
language:
- ar
pipeline_tag: text-generation
---

# Arabic ORPO LLAMA 3

<center>
<img src="https://cdn-uploads.huggingface.co/production/uploads/6116d0584ef9fdfbf45dc4d9/3ns3O_bWYxKEXmozA073h.png">
</center>

## Story first

This model is a finetuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) using [ORPO](https://github.com/xfactlab/orpo) on [2A2I/argilla-dpo-mix-7k-arabic](https://huggingface.co/datasets/2A2I/argilla-dpo-mix-7k-arabic).
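
For readers unfamiliar with ORPO, its objective adds an odds-ratio penalty to the usual supervised loss on the chosen answer. The sketch below is a minimal plain-Python illustration of that idea, not the training code used for this model. The function names and the weight `lam=0.1` are assumptions made for illustration, and the inputs are assumed to be length-normalised sequence log-probabilities (strictly negative).

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) where p = exp(avg_logp) and avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(nll_chosen: float, logp_chosen: float, logp_rejected: float,
              lam: float = 0.1) -> float:
    """Supervised loss on the chosen answer plus a weighted odds-ratio term.

    lam is a hypothetical weighting, not a value taken from this model card.
    """
    # Log odds ratio between the chosen and the rejected answer.
    log_or = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    odds_ratio_term = math.log1p(math.exp(-log_or))
    return nll_chosen + lam * odds_ratio_term


# The penalty shrinks when the chosen answer is more likely than the rejected one.
print(orpo_loss(0.5, -0.5, -2.0) < orpo_loss(0.5, -2.0, -0.5))
```

When both answers are equally likely the odds-ratio term equals `log(2)`, and it decreases as the model prefers the chosen answer more strongly.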

I wanted to try ORPO and see whether it would better align an English-biased model like **llama3** to the Arabic language, or whether it would fail.

While the evaluations favour the base llama3 over my finetune, in practice I found that my finetune was much better at producing coherent (mostly correct) Arabic text, which I find interesting.

I encourage everyone to try out the model [here](https://huggingface.co/spaces/MohamedRashad/Arabic-Chatbot-Arena) and share their insights with me ^^

## Evaluation and Results

These results were obtained using [lighteval](https://github.com/huggingface/lighteval) with the __community|arabic_mmlu__ tasks.

| Community                         | Llama-3-8B-Instruct | Arabic-ORPO-Llama-3-8B-Instruct |
|-----------------------------------|---------------------|---------------------------------|
| **All**                           | **0.348**           | **0.317**                       |
| Abstract Algebra                  | 0.310               | 0.230                           |
| Anatomy                           | 0.385               | 0.348                           |
| Astronomy                         | 0.388               | 0.316                           |
| Business Ethics                   | 0.480               | 0.370                           |
| Clinical Knowledge                | 0.396               | 0.385                           |
| College Biology                   | 0.347               | 0.299                           |
| College Chemistry                 | 0.180               | 0.250                           |
| College Computer Science          | 0.250               | 0.190                           |
| College Mathematics               | 0.260               | 0.280                           |
| College Medicine                  | 0.231               | 0.249                           |
| College Physics                   | 0.225               | 0.216                           |
| Computer Security                 | 0.470               | 0.440                           |
| Conceptual Physics                | 0.315               | 0.404                           |
| Econometrics                      | 0.263               | 0.272                           |
| Electrical Engineering            | 0.414               | 0.359                           |
| Elementary Mathematics            | 0.320               | 0.272                           |
| Formal Logic                      | 0.270               | 0.214                           |
| Global Facts                      | 0.320               | 0.320                           |
| High School Biology               | 0.332               | 0.335                           |
| High School Chemistry             | 0.256               | 0.296                           |
| High School Computer Science      | 0.350               | 0.300                           |
| High School European History      | 0.224               | 0.242                           |
| High School Geography             | 0.323               | 0.364                           |
| High School Government & Politics | 0.352               | 0.285                           |
| High School Macroeconomics        | 0.290               | 0.285                           |
| High School Mathematics           | 0.237               | 0.278                           |
| High School Microeconomics        | 0.231               | 0.273                           |
| High School Physics               | 0.252               | 0.225                           |
| High School Psychology            | 0.316               | 0.330                           |
| High School Statistics            | 0.199               | 0.176                           |
| High School US History            | 0.284               | 0.250                           |
| High School World History         | 0.312               | 0.274                           |
| Human Aging                       | 0.369               | 0.430                           |
| Human Sexuality                   | 0.481               | 0.321                           |
| International Law                 | 0.603               | 0.405                           |
| Jurisprudence                     | 0.491               | 0.370                           |
| Logical Fallacies                 | 0.368               | 0.276                           |
| Machine Learning                  | 0.214               | 0.312                           |
| Management                        | 0.350               | 0.379                           |
| Marketing                         | 0.521               | 0.547                           |
| Medical Genetics                  | 0.320               | 0.330                           |
| Miscellaneous                     | 0.446               | 0.443                           |
| Moral Disputes                    | 0.422               | 0.306                           |
| Moral Scenarios                   | 0.248               | 0.241                           |
| Nutrition                         | 0.412               | 0.346                           |
| Philosophy                        | 0.408               | 0.328                           |
| Prehistory                        | 0.429               | 0.349                           |
| Professional Accounting           | 0.344               | 0.273                           |
| Professional Law                  | 0.306               | 0.244                           |
| Professional Medicine             | 0.228               | 0.206                           |
| Professional Psychology           | 0.337               | 0.315                           |
| Public Relations                  | 0.391               | 0.373                           |
| Security Studies                  | 0.469               | 0.335                           |
| Sociology                         | 0.498               | 0.408                           |
| US Foreign Policy                 | 0.590               | 0.490                           |
| Virology                          | 0.422               | 0.416                           |
| World Religions                   | 0.404               | 0.304                           |
| Average (All Communities)         | 0.348               | 0.317                           |
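
To see where the finetune gains or loses ground, the table can be tallied per subject. The following is a small sketch, assuming the table is available as Markdown text; `parse_rows`, `tally`, and `SAMPLE` are hypothetical names, and `SAMPLE` holds only three rows copied from the table above to keep the example short.

```python
def parse_rows(markdown_table: str):
    """Yield (subject, base_score, finetuned_score) from a 3-column Markdown table."""
    for line in markdown_table.strip().splitlines():
        # Drop the outer pipes, split into cells, and remove bold markers.
        cells = [c.strip().strip("*") for c in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue
        try:
            yield cells[0], float(cells[1]), float(cells[2])
        except ValueError:
            continue  # header or separator row, no numeric scores


def tally(rows):
    """Count subjects where the finetune is better, worse, or tied with the base."""
    counts = {"finetune_better": 0, "base_better": 0, "tie": 0}
    for _, base, tuned in rows:
        if tuned > base:
            counts["finetune_better"] += 1
        elif tuned < base:
            counts["base_better"] += 1
        else:
            counts["tie"] += 1
    return counts


SAMPLE = """
| Community | Llama-3-8B-Instruct | Arabic-ORPO-Llama-3-8B-Instruct |
|-----------|---------------------|---------------------------------|
| Abstract Algebra  | 0.310 | 0.230 |
| College Chemistry | 0.180 | 0.250 |
| Global Facts      | 0.320 | 0.320 |
"""

print(tally(parse_rows(SAMPLE)))
# {'finetune_better': 1, 'base_better': 1, 'tie': 1}
```

Passing the full table instead of `SAMPLE` gives the tally over all subjects; the aggregate rows (**All** and the final average) should be dropped first so they are not counted as subjects.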