---
library_name: peft
base_model: beomi/open-llama-2-ko-7b
license: cc-by-sa-4.0
datasets:
- traintogpb/aihub-flores-koen-integrated-sparta-30k
language:
- en
- ko
metrics:
- sacrebleu
- comet
pipeline_tag: translation
tags:
- translation
- text-generation
- ko2en
- en2ko
---
### Pretrained LM
- [beomi/open-llama-2-ko-7b](https://huggingface.co/beomi/open-llama-2-ko-7b) (MIT License)
### Training Dataset
- [traintogpb/aihub-flores-koen-integrated-sparta-30k](https://huggingface.co/datasets/traintogpb/aihub-flores-koen-integrated-sparta-30k)
- Can translate between English and Korean (bi-directional)
### Prompt
- Template:
```python
prompt = f"Translate this from {src_lang} to {tgt_lang}\n### {src_lang}: {src_text}\n### {tgt_lang}:"
# src_lang can be 'English' or '한국어'
# tgt_lang can be '한국어' or 'English'
```
- Issue:
The model's tokenizer tokenizes the multi-line prompt below differently from the single-line prompt above.
Make sure to use the single-line prompt proposed above.
```python
prompt = f"""Translate this from {src_lang} to {tgt_lang}
### {src_lang}: {src_text}
### {tgt_lang}:"""
# DO NOT USE this prompt
```
Also mind that there must be no trailing space at the end of the prompt.
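As a lightweight guard against that trailing-space pitfall, a check like the following (an illustrative snippet, not part of the original recipe) can be run before tokenization:

```python
# Illustrative guard: the card warns that a trailing space changes the tokenization,
# so fail fast if the prompt does not end exactly with the '### {tgt_lang}:' marker.
assert prompt.endswith(':'), "The prompt must end with ':' and no trailing space."
```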
### Training
- Trained with QLoRA
- PLM: NormalFloat 4-bit
- Adapter: BrainFloat 16-bit
- LoRA adapters applied to all linear layers (around 2.2% of the parameters); a matching quantization config is sketched below
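The Usage snippet below refers to a `bnb_config` object without defining it in this card; a minimal sketch consistent with the NF4 / BF16 setup listed above (the double-quantization flag is an assumption) could look like this:

```python
import torch
from transformers import BitsAndBytesConfig

# Sketch of a QLoRA quantization config matching the setup above:
# NF4 4-bit base weights with BF16 compute dtype.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,  # assumption; not stated in the card
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```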
### Usage (IMPORTANT)
- The EOS token (`<|endoftext|>`, id=46332) appended to the end of the tokenized prompt must be removed before generation.
```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

# MODEL
plm_name = 'beomi/open-llama-2-ko-7b'
adapter_name = 'traintogpb/llama-2-enko-translator-7b-qlora-adapter'
model = LlamaForCausalLM.from_pretrained(
plm_name,
max_length=768,
quantization_config=bnb_config, # Use the QLoRA config above
attn_implementation='flash_attention_2',
torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(
model,
adapter_name,
torch_dtype=torch.bfloat16
)
# TOKENIZER
tokenizer = LlamaTokenizer.from_pretrained(plm_name)
tokenizer.pad_token = "</s>"
tokenizer.pad_token_id = 2
tokenizer.eos_token = "<|endoftext|>" # Must be differentiated from the PAD token
tokenizer.eos_token_id = 46332
tokenizer.add_eos_token = True
tokenizer.model_max_length = 768
# INFERENCE
src_lang = 'English'
tgt_lang = '한국어'
src_text = "NMIXX is the world-best female idol group, who came back with the new song 'DASH'."
max_length = 768

prompt = f"Translate this from {src_lang} to {tgt_lang}\n### {src_lang}: {src_text}\n### {tgt_lang}:"
inputs = tokenizer(prompt, return_tensors="pt", max_length=max_length, truncation=True)
# REMOVE EOS TOKEN IN THE PROMPT
inputs['input_ids'] = inputs['input_ids'][0][:-1].unsqueeze(dim=0)
inputs['attention_mask'] = inputs['attention_mask'][0][:-1].unsqueeze(dim=0)
outputs = model.generate(**inputs, max_length=max_length, eos_token_id=46332)
input_len = len(inputs['input_ids'].squeeze())
translated_text = tokenizer.decode(outputs[0][input_len:], skip_special_tokens=True)
print(translated_text)
```