	---
	library_name: transformers
	license: llama3
	datasets:
	- 2A2I/argilla-dpo-mix-7k-arabic
	language:
	- ar
	pipeline_tag: text-generation
	---

	# 👳 Arabic ORPO LLAMA 3
	<center>
	<img src="https://cdn-uploads.huggingface.co/production/uploads/6116d0584ef9fdfbf45dc4d9/3ns3O_bWYxKEXmozA073h.png">
	</center>


	## 👓 Story first

	This model is a finetuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), trained with [ORPO](https://github.com/xfactlab/orpo) on [2A2I/argilla-dpo-mix-7k-arabic](https://huggingface.co/datasets/2A2I/argilla-dpo-mix-7k-arabic).

	I wanted to try ORPO and see whether it would better align an English-biased model like Llama 3 to the Arabic language, or whether it would fail.

	While the evaluations favour the base Llama 3 over my finetune, in practice I found my finetune was much better at producing coherent (mostly correct) Arabic text, which I find interesting.

	I would encourage everyone to try out the model [here](https://huggingface.co/spaces/MohamedRashad/Arabic-Chatbot-Arena) and share their insights with me ^^

	## 🤔 Evaluation and Results

	These results were produced using [lighteval](https://github.com/huggingface/lighteval) with the __community|arabic_mmlu__ tasks.

	| Community | Llama-3-8B-Instruct | Arabic-ORPO-Llama-3-8B-Instruct |
	|----------------------------------|---------------------|----------------------------------|
	| All | 0.348 | 0.317 |
	| Abstract Algebra | 0.310 | 0.230 |
	| Anatomy | 0.385 | 0.348 |
	| Astronomy | 0.388 | 0.316 |
	| Business Ethics | 0.480 | 0.370 |
	| Clinical Knowledge | 0.396 | 0.385 |
	| College Biology | 0.347 | 0.299 |
	| College Chemistry | 0.180 | 0.250 |
	| College Computer Science | 0.250 | 0.190 |
	| College Mathematics | 0.260 | 0.280 |
	| College Medicine | 0.231 | 0.249 |
	| College Physics | 0.225 | 0.216 |
	| Computer Security | 0.470 | 0.440 |
	| Conceptual Physics | 0.315 | 0.404 |
	| Econometrics | 0.263 | 0.272 |
	| Electrical Engineering | 0.414 | 0.359 |
	| Elementary Mathematics | 0.320 | 0.272 |
	| Formal Logic | 0.270 | 0.214 |
	| Global Facts | 0.320 | 0.320 |
	| High School Biology | 0.332 | 0.335 |
	| High School Chemistry | 0.256 | 0.296 |
	| High School Computer Science | 0.350 | 0.300 |
	| High School European History | 0.224 | 0.242 |
	| High School Geography | 0.323 | 0.364 |
	| High School Government & Politics | 0.352 | 0.285 |
	| High School Macroeconomics | 0.290 | 0.285 |
	| High School Mathematics | 0.237 | 0.278 |
	| High School Microeconomics | 0.231 | 0.273 |
	| High School Physics | 0.252 | 0.225 |
	| High School Psychology | 0.316 | 0.330 |
	| High School Statistics | 0.199 | 0.176 |
	| High School US History | 0.284 | 0.250 |
	| High School World History | 0.312 | 0.274 |
	| Human Aging | 0.369 | 0.430 |
	| Human Sexuality | 0.481 | 0.321 |
	| International Law | 0.603 | 0.405 |
	| Jurisprudence | 0.491 | 0.370 |
	| Logical Fallacies | 0.368 | 0.276 |
	| Machine Learning | 0.214 | 0.312 |
	| Management | 0.350 | 0.379 |
	| Marketing | 0.521 | 0.547 |
	| Medical Genetics | 0.320 | 0.330 |
	| Miscellaneous | 0.446 | 0.443 |
	| Moral Disputes | 0.422 | 0.306 |
	| Moral Scenarios | 0.248 | 0.241 |
	| Nutrition | 0.412 | 0.346 |
	| Philosophy | 0.408 | 0.328 |
	| Prehistory | 0.429 | 0.349 |
	| Professional Accounting | 0.344 | 0.273 |
	| Professional Law | 0.306 | 0.244 |
	| Professional Medicine | 0.228 | 0.206 |
	| Professional Psychology | 0.337 | 0.315 |
	| Public Relations | 0.391 | 0.373 |
	| Security Studies | 0.469 | 0.335 |
	| Sociology | 0.498 | 0.408 |
	| US Foreign Policy | 0.590 | 0.490 |
	| Virology | 0.422 | 0.416 |
	| World Religions | 0.404 | 0.304 |
	| Average (All Communities) | 0.348 | 0.317 |
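
To see where the finetune gains or loses against the base model, the per-task scores can be compared directly. The snippet below is a minimal sketch of that comparison, not part of the evaluation pipeline: it uses only a handful of rows copied from the table, and the names `SCORES`, `score_deltas`, and `split_by_winner` are made up for illustration.

```python
# Illustrative subset of the table: task -> (base score, finetune score).
# The dictionary and function names are hypothetical helpers, not lighteval API.
SCORES = {
    "Abstract Algebra": (0.310, 0.230),
    "College Chemistry": (0.180, 0.250),
    "Conceptual Physics": (0.315, 0.404),
    "Global Facts": (0.320, 0.320),
    "International Law": (0.603, 0.405),
    "Machine Learning": (0.214, 0.312),
}


def score_deltas(scores):
    """Return {task: finetune - base}, rounded to three decimals."""
    return {task: round(ft - base, 3) for task, (base, ft) in scores.items()}


def split_by_winner(scores):
    """Group task names by whether the finetune improved, regressed, or tied."""
    deltas = score_deltas(scores)
    improved = sorted(t for t, d in deltas.items() if d > 0)
    regressed = sorted(t for t, d in deltas.items() if d < 0)
    tied = sorted(t for t, d in deltas.items() if d == 0)
    return improved, regressed, tied


if __name__ == "__main__":
    improved, regressed, tied = split_by_winner(SCORES)
    print("Finetune ahead:", improved)
    print("Base ahead:", regressed)
    print("Tied:", tied)
```

Extending `SCORES` with the remaining rows gives the full picture across all tasks.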
