---
language:
- en
- zh
library_name: transformers
tags:
- Long Context
- qwen2.5
- qwen2
---
# MS-LongWriter-Qwen2.5-7B-Instruct
<p align="center">
🤖 <a href="https://modelscope.cn/datasets/swift/longwriter-6k-filtered" target="_blank">[LongWriter Dataset] </a> • 💻 <a href="https://github.com/THUDM/LongWriter" target="_blank">[Github Repo]</a> • 📃 <a href="https://arxiv.org/abs/2408.07055" target="_blank">[LongWriter Paper]</a> • 📃 <a href="https://arxiv.org/pdf/2410.10210" target="_blank">[Tech Report]</a>
</p>
MS-LongWriter-Qwen2.5-7B-Instruct is trained from [Qwen2.5-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-7B-Instruct) and can generate 10,000+ words in a single response.
Training starts directly from Qwen2.5-7B-Instruct, using the [LongWriter-6k](https://modelscope.cn/datasets/ZhipuAI/LongWriter-6k) dataset heavily filtered down to 666 high-quality samples, released as [LongWriter-6k-filtered](https://modelscope.cn/datasets/swift/longwriter-6k-filtered).
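The model can be used with the standard `transformers` chat API. The snippet below is a minimal inference sketch; the model path, sampling settings, and `max_new_tokens` budget are illustrative assumptions, not fixed recommendations.
```python
# Minimal inference sketch (model path below is a placeholder; point it at your
# local download or hub copy of MS-LongWriter-Qwen2.5-7B-Instruct).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/MS-LongWriter-Qwen2.5-7B-Instruct"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a 10,000-word article on the history of aviation."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# 10,000+ words can take well over 10k tokens, so give generation a large budget.
output = model.generate(input_ids, max_new_tokens=20000, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```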
## Datasets
1. [LongWriter-6k-filtered](https://modelscope.cn/datasets/swift/longwriter-6k-filtered), a filtered subset of [LongWriter-6k](https://modelscope.cn/datasets/ZhipuAI/LongWriter-6k).
2. [Magpie-Qwen2-Pro-200K-Chinese](https://modelscope.cn/datasets/AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese), from which 6k examples are randomly sampled.
3. [Magpie-Qwen2-Pro-200K-English](https://modelscope.cn/datasets/AI-ModelScope/Magpie-Qwen2-Pro-200K-English), from which 6k examples are randomly sampled.
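In practice the `#6660` suffix in the `swift sft --dataset` arguments below handles the subsampling; the sketch that follows only illustrates the idea with the ModelScope `MsDataset` loader, and the split name and seed are assumptions.
```python
# Illustrative subsampling sketch (assumed MsDataset usage; the swift CLI's
# `dataset#N` syntax performs equivalent random sampling during training).
from modelscope.msdatasets import MsDataset

ds = MsDataset.load("Magpie-Qwen2-Pro-200K-Chinese", namespace="AI-ModelScope", split="train")
subset = ds.to_hf_dataset().shuffle(seed=42).select(range(6000))
print(len(subset))  # 6000
```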
## Model
We use [ms-swift](https://github.com/modelscope/swift) to fine-tune the Qwen2.5-7B-Instruct model.
1. Installation
```shell
pip install 'ms-swift[llm]'
```
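A quick import check can confirm the installation; the ms-swift package imports as the `swift` module.
```python
# Sanity check that ms-swift installed correctly (the package imports as `swift`).
import swift  # noqa: F401
print("ms-swift imported successfully")
```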
2. Fine-tuning
Environment:
```text
NVIDIA A100 (80 GB) x 4
```
Run:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type qwen2_5-7b-instruct \
--dataset longwriter-6k-filtered#666 qwen2-pro-zh#6660 qwen2-pro-en#6660 \
--max_length 28672 \
--num_train_epochs 2 \
--eval_steps 200 \
--batch_size 1 \
--gradient_accumulation_steps 64 \
--gradient_checkpointing true \
--warmup_ratio 0.1 \
--learning_rate 1e-5 \
--sft_type full \
--loss_name long-ce \
--check_dataset_strategy warning \
--save_only_model false \
--save_total_limit -1 \
--lazy_tokenize true \
--dataloader_num_workers 1 \
--resume_only_model true \
--neftune_noise_alpha 5 \
--use_flash_attn true
```
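With `--batch_size 1`, `--gradient_accumulation_steps 64`, and 4 GPUs, the effective global batch size is 1 × 64 × 4 = 256 samples per optimizer step.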
3. Fine-tuning with annealing
An annealing stage is applied after the main fine-tuning run to further improve the model.
In this stage we fine-tune on the LongWriter-6k-filtered dataset alone, with the learning rate lowered to 2e-6.
Run:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type qwen2_5-7b-instruct \
--dataset longwriter-6k-filtered#666 \
--max_length 28672 \
--num_train_epochs 2 \
--eval_steps 200 \
--batch_size 1 \
--gradient_accumulation_steps 64 \
--gradient_checkpointing true \
--warmup_ratio 0.1 \
--learning_rate 2e-6 \
--sft_type full \
--loss_name long-ce \
--check_dataset_strategy warning \
--save_only_model false \
--save_total_limit -1 \
--lazy_tokenize true \
--dataloader_num_workers 1 \
--resume_only_model true \
--neftune_noise_alpha 5 \
--use_flash_attn true \
--resume_from_checkpoint {previous-checkpoint-path}
```
Note:
1. The `--resume_from_checkpoint` parameter specifies the path of the checkpoint produced by the fine-tuning run in step 2.
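2. The checkpoint path depends on your ms-swift output settings; by default checkpoints are written under an `output/` directory (for example `output/qwen2_5-7b-instruct/vX-<timestamp>/checkpoint-<step>`, though the exact layout may vary across versions).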
## Evaluation
Refer to the [LongWriter Evaluation](https://github.com/modelscope/evalscope/tree/main/evalscope/third_party/longbench_write) in [EvalScope](https://github.com/modelscope/evalscope).
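As an informal complement to the benchmark, the length of a generated draft can be sanity-checked with a simple counter like the sketch below (illustrative only; this is not the metric used by EvalScope).
```python
# Illustrative length counter for generated text (not the official EvalScope metric).
# Counts CJK characters plus alphanumeric words, so it handles both zh and en output.
import re

def output_length(text: str) -> int:
    cjk_chars = re.findall(r"[\u4e00-\u9fff]", text)
    latin_words = re.findall(r"[A-Za-z0-9]+", text)
    return len(cjk_chars) + len(latin_words)

print(output_length("混合 mixed 输出 output 示例 example"))  # -> 9
```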
## Reference
If you find our work helpful, please consider citing our paper and starring our GitHub repositories.
```bib
@misc{chen2024minimumtuningunlocklong,
title={Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key},
author={Yingda Chen and Xingjun Wang and Jintao Huang and Yunlin Mao and Daoze Zhang and Yuze Zhao},
year={2024},
eprint={2410.10210},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2410.10210},
}
```
1. QbitAI (量子位) article (in Chinese): [666 data samples teach AI to write 10,000-word long-form text! Both the model and the dataset are open source](https://mp.weixin.qq.com/s/LvWUSgIRO5HI5YSDRz7SxA)
2. Tech report: [Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key](https://arxiv.org/pdf/2410.10210)