---
library_name: peft
base_model: beomi/open-llama-2-ko-7b
license: cc-by-sa-4.0
datasets:
- traintogpb/aihub-flores-koen-integrated-sparta-30k
language:
- en
- ko
metrics:
- sacrebleu
- comet
pipeline_tag: translation
tags:
- translation
- text-generation
- ko2en
- en2ko
---
### Pretrained LM
- [beomi/open-llama-2-ko-7b](https://huggingface.co/beomi/open-llama-2-ko-7b) (MIT License)
### Training Dataset
- [traintogpb/aihub-flores-koen-integrated-sparta-30k](https://huggingface.co/datasets/traintogpb/aihub-flores-koen-integrated-sparta-30k)
- Can translate between English and Korean (bi-directional)
### Prompt
- Template:
```python
prompt = f"Translate this from {src_lang} to {tgt_lang}\n### {src_lang}: {src_text}\n### {tgt_lang}:"
# src_lang can be 'English' or '한국어'
# tgt_lang can be '한국어' or 'English'
```
- Issue:
The model's tokenizer tokenizes the multi-line prompt below differently from the single-line prompt above.
Make sure to use the single-line prompt proposed above.
```python
prompt = f"""Translate this from {src_lang} to {tgt_lang}
### {src_lang}: {src_text}
### {tgt_lang}:"""
# DO NOT USE this prompt
```
Also mind that there must be no trailing space at the end of the prompt.
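As a lightweight guard against that trailing-space pitfall, a check like the following (an illustrative snippet, not part of the original recipe) can be run before tokenization:

```python
# Illustrative guard: the card warns that a trailing space changes the tokenization,
# so fail fast if the prompt does not end exactly with the '### {tgt_lang}:' marker.
assert prompt.endswith(':'), "The prompt must end with ':' and no trailing space."
```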
### Training
- Trained with QLoRA
- PLM: NormalFloat 4-bit
- Adapter: BrainFloat 16-bit
- LoRA adapters applied to all linear layers (around 2.2% of the parameters); a matching quantization config is sketched below
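The Usage snippet below refers to a `bnb_config` object without defining it in this card; a minimal sketch consistent with the NF4 / BF16 setup listed above (the double-quantization flag is an assumption) could look like this:

```python
import torch
from transformers import BitsAndBytesConfig

# Sketch of a QLoRA quantization config matching the setup above:
# NF4 4-bit base weights with BF16 compute dtype.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,  # assumption; not stated in the card
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```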
### Usage (IMPORTANT)
- The EOS token (`<|endoftext|>`, id=46332) appended to the end of the tokenized prompt must be removed before generation.
```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

# MODEL
plm_name = 'beomi/open-llama-2-ko-7b'
adapter_name = 'traintogpb/llama-2-enko-translator-7b-qlora-adapter'
model = LlamaForCausalLM.from_pretrained(
plm_name,
max_length=768,
quantization_config=bnb_config, # Use the QLoRA config above
attn_implementation='flash_attention_2',
torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(
model,
adapter_name,
torch_dtype=torch.bfloat16
)
# TOKENIZER
tokenizer = LlamaTokenizer.from_pretrained(plm_name)
tokenizer.pad_token = "</s>"
tokenizer.pad_token_id = 2
tokenizer.eos_token = "<|endoftext|>" # Must be differentiated from the PAD token
tokenizer.eos_token_id = 46332
tokenizer.add_eos_token = True
tokenizer.model_max_length = 768
# INFERENCE
src_lang = 'English'
tgt_lang = '한국어'
src_text = "NMIXX is the world-best female idol group, who came back with the new song 'DASH'."
max_length = 768

prompt = f"Translate this from {src_lang} to {tgt_lang}\n### {src_lang}: {src_text}\n### {tgt_lang}:"
inputs = tokenizer(prompt, return_tensors="pt", max_length=max_length, truncation=True)
# REMOVE EOS TOKEN IN THE PROMPT
inputs['input_ids'] = inputs['input_ids'][0][:-1].unsqueeze(dim=0)
inputs['attention_mask'] = inputs['attention_mask'][0][:-1].unsqueeze(dim=0)
outputs = model.generate(**inputs, max_length=max_length, eos_token_id=46332)
input_len = len(inputs['input_ids'].squeeze())
translated_text = tokenizer.decode(outputs[0][input_len:], skip_special_tokens=True)
print(translated_text)
```