haoranxu committed
Commit dc8d2fb
Parent: fed06c7

Update README.md

Files changed (1)
  1. README.md +12 -2
README.md CHANGED
@@ -1,8 +1,18 @@
 ---
 license: mit
 ---
-**ALMA** (**A**dvanced **L**anguage **M**odel-based tr**A**nslator) is an LLM-based translation model that adopts a new translation paradigm: it begins with fine-tuning on monolingual data and is further optimized using high-quality parallel data. This two-step fine-tuning process ensures superior translation accuracy and performance.
-
+**ALMA** (**A**dvanced **L**anguage **M**odel-based tr**A**nslator) is an LLM-based translation model that adopts a new translation paradigm: it begins with fine-tuning on monolingual data and is further optimized using high-quality parallel data. This two-step fine-tuning process ensures strong translation performance.
+Please find more details in our [paper](https://arxiv.org/abs/2309.11674).
+```
+@misc{xu2023paradigm,
+      title={A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models},
+      author={Haoran Xu and Young Jin Kim and Amr Sharaf and Hany Hassan Awadalla},
+      year={2023},
+      eprint={2309.11674},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```
 We release four translation models presented in the paper:
 - **ALMA-7B**: Full-weight fine-tune of LLaMA-2-7B on 20B monolingual tokens, then **full-weight** fine-tune on human-written parallel data
 - **ALMA-7B-LoRA**: Full-weight fine-tune of LLaMA-2-7B on 20B monolingual tokens, then **LoRA** fine-tune on human-written parallel data
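
For context, a minimal usage sketch with Hugging Face `transformers`, not part of this commit: the Hub repo id `haoranxu/ALMA-7B` and the plain translation-prompt format are assumptions based on the paper, and the exact template may differ.

```python
# Minimal sketch, NOT from this commit. Assumes the checkpoint is published
# as "haoranxu/ALMA-7B" and answers a plain translation-style prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "haoranxu/ALMA-7B"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Illustrative prompt; the actual template used in the paper may differ.
prompt = "Translate this from German to English:\nGerman: Ein Beispielsatz.\nEnglish:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```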