---
license: agpl-3.0
datasets:
- stvlynn/Cantonese-Dialogue
language:
- zh
pipeline_tag: text-generation
tags:
- Cantonese
- 廣東話
- 粤语
---

# Qwen-7B-Chat-Cantonese (通义千问·粤语)

## Intro

Qwen-7B-Chat-Cantonese is a fine-tuned version of Qwen-7B-Chat, trained on a large amount of Cantonese data.

[ModelScope (魔搭社区)](https://www.modelscope.cn/models/stvlynn/Qwen-7B-Chat-Cantonese)

## Usage

### Requirements

* Python 3.8 or above
* PyTorch 1.12 or above (2.0 or above recommended)
* CUDA 11.4 or above recommended (for GPU users, flash-attention users, etc.)

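If you want to double-check your environment against these requirements, a small sketch like the one below can help. It is illustrative only and not part of the upstream instructions.

```python
# Quick environment check (illustrative, not part of the official instructions).
import torch
import transformers

print("PyTorch:", torch.__version__)               # should be >= 1.12, ideally >= 2.0
print("Transformers:", transformers.__version__)   # pinned to 4.32.0 below
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)     # >= 11.4 recommended for flash-attention
```
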
### Dependency

To run Qwen-7B-Chat-Cantonese, make sure you meet the above requirements, then run the following pip command to install the dependencies.

```bash
pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed
```

In addition, it is recommended to install the `flash-attention` library (**flash attention 2 is now supported**) for higher efficiency and lower memory usage.

```bash
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
```

### Quickstart

Please refer to the QwenLM/Qwen [Quickstart](https://github.com/QwenLM/Qwen?tab=readme-ov-file#quickstart) for detailed usage.

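As a minimal sketch, mirroring the upstream Qwen quickstart and assuming this model's Hugging Face repo id is `stvlynn/Qwen-7B-Chat-Cantonese`, the model can be loaded and queried like this:

```python
# Minimal loading sketch based on the upstream Qwen quickstart.
# The repo id below is an assumption; adjust it if you load from a local path or ModelScope.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stvlynn/Qwen-7B-Chat-Cantonese"  # assumed Hugging Face repo id

# Qwen's custom tokenizer and chat() helper require trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",        # place weights on available GPU(s) automatically
    trust_remote_code=True,
).eval()

# The Qwen remote code exposes a chat() helper that tracks dialogue history.
response, history = model.chat(tokenizer, "深水埗有咩好食?", history=None)
print(response)
```
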
## Training Parameters

| Parameter | Description | Value |
|-----------|-------------|-------|
| Learning rate | AdamW optimizer learning rate | 7e-5 |
| Weight decay | Regularization strength | 0.8 |
| Gamma | Learning rate decay factor | 1.0 |
| Batch size | Number of samples per batch | 1000 |
| Precision | Floating-point precision | fp16 |
| LR schedule | Learning rate adjustment policy | cosine |
| Warmup steps | Steps over which the learning rate is warmed up | 0 |
| Total steps | Total training steps | 1024 |
| Gradient accumulation steps | Number of steps to accumulate gradients before each update | 8 |

![loss](https://cdn.statically.io/gh/stvlynn/cloudimg@master/blog/2310/image.q9v1ak08ljk.webp)

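For readers who want to reproduce a similar setup, the reported hyperparameters map roughly onto Hugging Face `TrainingArguments` as sketched below. This is only an illustration of the values in the table, not the author's actual training script, and the output directory is a made-up placeholder.

```python
# Hypothetical sketch: the reported hyperparameters expressed as Hugging Face
# TrainingArguments. Not the actual training configuration used for this model.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen-7b-chat-cantonese-ft",  # placeholder output path
    learning_rate=7e-5,                      # AdamW learning rate
    weight_decay=0.8,                        # regularization strength (as reported)
    per_device_train_batch_size=1000,        # samples per batch (as reported)
    gradient_accumulation_steps=8,           # accumulate gradients over 8 steps
    max_steps=1024,                          # total training steps
    lr_scheduler_type="cosine",              # cosine learning-rate schedule
    warmup_steps=0,                          # no warmup
    fp16=True,                               # half-precision training
)
```
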
## Demo

![What are some good eats in Sham Shui Po?](https://cdn.statically.io/gh/stvlynn/cloudimg@master/blog/2310/截屏2024-05-04-11.59.27.2bea6k113e68.webp)

![Why did Lu Xun beat up Zhou Shuren?](https://cdn.statically.io/gh/stvlynn/cloudimg@master/blog/2310/截屏2024-05-04-11.56.46.72tt5czl2gw0.webp)

![How many birds are left in the tree?](https://cdn.statically.io/gh/stvlynn/cloudimg@master/blog/2310/截屏2024-05-04-12.00.38.267hvmc3z3c0.webp)

## Special Note

This is my first LLM fine-tuning project, so please forgive any mistakes.

If you have any questions or suggestions, feel free to contact me:

[Twitter @stv_lynn](https://x.com/stv_lynn)

[Telegram @stvlynn](https://t.me/stvlynn)

[Email [email protected]](mailto:[email protected])