---
license: agpl-3.0
datasets:
- stvlynn/Cantonese-Dialogue
language:
- zh
pipeline_tag: text-generation
tags:
- Cantonese
- 廣東話
- 粤语
---

# Qwen-7B-Chat-Cantonese (通义千问·粤语)

## Intro

Qwen-7B-Chat-Cantonese is a fine-tuned version of Qwen-7B-Chat, trained on a large amount of Cantonese data.

[ModelScope (魔搭社区)](https://www.modelscope.cn/models/stvlynn/Qwen-7B-Chat-Cantonese)

## Usage

### Requirements

* Python 3.8 or above
* PyTorch 1.12 or above (2.0 or above recommended)
* CUDA 11.4 or above recommended (for GPU users, flash-attention users, etc.)

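If you want to double-check your environment against these requirements, a small sketch like the one below can help. It is illustrative only and not part of the upstream instructions.

```python
# Quick environment check (illustrative, not part of the official instructions).
import torch
import transformers

print("PyTorch:", torch.__version__)               # should be >= 1.12, ideally >= 2.0
print("Transformers:", transformers.__version__)   # pinned to 4.32.0 below
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)     # >= 11.4 recommended for flash-attention
```
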
### Dependency

To run Qwen-7B-Chat-Cantonese, make sure you meet the above requirements, then run the following pip command to install the dependencies.

```bash
pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed
```

In addition, it is recommended to install the `flash-attention` library (**flash attention 2 is now supported**) for higher efficiency and lower memory usage.

```bash
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
```

### Quickstart

Please refer to the QwenLM/Qwen [Quickstart](https://github.com/QwenLM/Qwen?tab=readme-ov-file#quickstart) for detailed usage.

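As a minimal sketch, mirroring the upstream Qwen quickstart and assuming this model's Hugging Face repo id is `stvlynn/Qwen-7B-Chat-Cantonese`, the model can be loaded and queried like this:

```python
# Minimal loading sketch based on the upstream Qwen quickstart.
# The repo id below is an assumption; adjust it if you load from a local path or ModelScope.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stvlynn/Qwen-7B-Chat-Cantonese"  # assumed Hugging Face repo id

# Qwen's custom tokenizer and chat() helper require trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",        # place weights on available GPU(s) automatically
    trust_remote_code=True,
).eval()

# The Qwen remote code exposes a chat() helper that tracks dialogue history.
response, history = model.chat(tokenizer, "深水埗有咩好食?", history=None)
print(response)
```
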
## Training Parameters

| Parameter | Description | Value |
|-----------|-------------|-------|
| Learning rate | AdamW optimizer learning rate | 7e-5 |
| Weight decay | Regularization strength | 0.8 |
| Gamma | Learning rate decay factor | 1.0 |
| Batch size | Number of samples per batch | 1000 |
| Precision | Floating-point precision | fp16 |
| LR schedule | Learning rate adjustment policy | cosine |
| Warmup steps | Steps over which the learning rate is warmed up | 0 |
| Total steps | Total training steps | 1024 |
| Gradient accumulation steps | Number of steps to accumulate gradients before each update | 8 |

![loss](https://cdn.statically.io/gh/stvlynn/cloudimg@master/blog/2310/image.q9v1ak08ljk.webp)

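For readers who want to reproduce a similar setup, the reported hyperparameters map roughly onto Hugging Face `TrainingArguments` as sketched below. This is only an illustration of the values in the table, not the author's actual training script, and the output directory is a made-up placeholder.

```python
# Hypothetical sketch: the reported hyperparameters expressed as Hugging Face
# TrainingArguments. Not the actual training configuration used for this model.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen-7b-chat-cantonese-ft",  # placeholder output path
    learning_rate=7e-5,                      # AdamW learning rate
    weight_decay=0.8,                        # regularization strength (as reported)
    per_device_train_batch_size=1000,        # samples per batch (as reported)
    gradient_accumulation_steps=8,           # accumulate gradients over 8 steps
    max_steps=1024,                          # total training steps
    lr_scheduler_type="cosine",              # cosine learning-rate schedule
    warmup_steps=0,                          # no warmup
    fp16=True,                               # half-precision training
)
```
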
## Demo

![What are some good eats in Sham Shui Po?](https://cdn.statically.io/gh/stvlynn/cloudimg@master/blog/2310/截屏2024-05-04-11.59.27.2bea6k113e68.webp)

![Why did Lu Xun beat up Zhou Shuren?](https://cdn.statically.io/gh/stvlynn/cloudimg@master/blog/2310/截屏2024-05-04-11.56.46.72tt5czl2gw0.webp)

![How many birds are left in the tree?](https://cdn.statically.io/gh/stvlynn/cloudimg@master/blog/2310/截屏2024-05-04-12.00.38.267hvmc3z3c0.webp)

## Special Note

This is my first LLM fine-tuning project, so please forgive any mistakes.

If you have any questions or suggestions, feel free to contact me:

[Twitter @stv_lynn](https://x.com/stv_lynn)

[Telegram @stvlynn](https://t.me/stvlynn)

[Email [email protected]](mailto:[email protected])