---
license: agpl-3.0
datasets:
  - stvlynn/Cantonese-Dialogue
language:
  - zh
pipeline_tag: text-generation
tags:
  - Cantonese
  - 廣東話
  - 粤语
---

Qwen-7B-Chat-Cantonese

Intro

Qwen-7B-Chat-Cantonese is a fine-tuned version of Qwen-7B-Chat, trained on a large amount of Cantonese data.

Qwen-7B-Chat-Cantonese係基於Qwen-7B-Chat嘅微調版本,基於大量粵語數據進行訓練。

ModelScope(魔搭社区)

Usage

Requirements

  • Python 3.8 or above
  • PyTorch 1.12 or above (2.0 or above recommended)
  • CUDA 11.4 or above recommended (for GPU users, flash-attention users, etc.)
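
As a quick sanity check of these requirements (an optional sketch, not part of the original card), the relevant versions can be printed from Python:

import sys
import torch

# Print the Python, PyTorch, and CUDA versions to compare against the requirements above.
print("Python:", sys.version.split()[0])          # 3.8 or above
print("PyTorch:", torch.__version__)              # 1.12 or above; 2.0+ recommended
print("CUDA available:", torch.cuda.is_available())
print("CUDA (build):", torch.version.cuda)        # 11.4 or above recommended for GPU users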

Dependency

To run Qwen-7B-Chat-Cantonese, please make sure you meet the requirements above, then run the following pip command to install the required libraries.

pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed

In addition, it is recommended to install the flash-attention library (flash attention 2 is now supported) for higher efficiency and lower memory usage.

git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
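
Optionally (this check is not from the original card), you can confirm the build succeeded by importing the package:

import flash_attn

# If the build above succeeded, this prints the installed flash-attention version.
print("flash-attn:", flash_attn.__version__)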

Quickstart

Please refer to QwenLM/Qwen - Quickstart.
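
For convenience, here is a minimal loading sketch. It assumes the model is published under the repo id stvlynn/Qwen-7B-Chat-Cantonese and, like upstream Qwen-7B-Chat, exposes a chat() helper through trust_remote_code; treat it as a sketch and follow the upstream Quickstart for the authoritative example.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; trust_remote_code is needed for Qwen's custom modeling code.
model_id = "stvlynn/Qwen-7B-Chat-Cantonese"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
).eval()

# Qwen-7B-Chat models provide a chat() method via their remote code.
response, history = model.chat(tokenizer, "深水埗有哪些美食", history=None)
print(response)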

Training Parameters

| Parameter | Description | Value |
| --- | --- | --- |
| Learning Rate | AdamW optimizer learning rate | 7e-5 |
| Weight Decay | Regularization strength | 0.8 |
| Gamma | Learning rate decay factor | 1.0 |
| Batch Size | Number of samples per batch | 1000 |
| Precision | Floating-point precision | fp16 |
| Learning Policy | Learning rate adjustment policy | cosine |
| Warmup Steps | Initial steps without learning rate adjustment | 0 |
| Total Steps | Total training steps | 1024 |
| Gradient Accumulation Steps | Number of steps to accumulate gradients before updating | 8 |
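
The card does not say which training framework was used, so purely as an illustration (an assumption, not the author's actual setup), here is how most of these values would map onto Hugging Face TrainingArguments; Gamma has no direct equivalent there.

from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters in the table above.
# "Gamma" (learning rate decay factor) has no direct TrainingArguments field.
training_args = TrainingArguments(
    output_dir="qwen-7b-chat-cantonese-ft",  # hypothetical output path
    learning_rate=7e-5,                      # AdamW learning rate
    weight_decay=0.8,                        # regularization strength
    lr_scheduler_type="cosine",              # learning policy
    warmup_steps=0,                          # no warmup
    max_steps=1024,                          # total training steps
    gradient_accumulation_steps=8,
    per_device_train_batch_size=1000,        # "Batch Size" as listed; interpretation depends on the framework
    fp16=True,                               # fp16 precision
)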

Training loss curve (figure)

Demo

深水埗有哪些美食 (What good food is there in Sham Shui Po?)

鲁迅为什么打周树人 (Why did Lu Xun beat up Zhou Shuren?)

树上几只鸟 (How many birds are in the tree?)

Special Note

This is my first LLM fine-tuning project. Please forgive any mistakes.

If you have any questions or suggestions, feel free to contact me.

  • Twitter: @stv_lynn
  • Telegram: @stvlynn
  • Email: [email protected]