---
language:
- en
- zh
library_name: transformers
tags:
- Long Context
- qwen2.5
- qwen2
---

# MS-LongWriter-Qwen2.5-7B-Instruct

<p align="center">
🤖 <a href="https://modelscope.cn/datasets/swift/longwriter-6k-filtered" target="_blank">[LongWriter Dataset]</a> • 💻 <a href="https://github.com/THUDM/LongWriter" target="_blank">[GitHub Repo]</a> • 📃 <a href="https://arxiv.org/abs/2408.07055" target="_blank">[LongWriter Paper]</a> • 📃 <a href="https://arxiv.org/pdf/2410.10210" target="_blank">[Tech Report]</a>
</p>

MS-LongWriter-Qwen2.5-7B-Instruct is trained from [Qwen2.5-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-7B-Instruct) and can generate 10,000+ words in a single response.

Training starts directly from Qwen2.5-7B-Instruct, using a heavily filtered subset of [LongWriter-6k](https://modelscope.cn/datasets/ZhipuAI/LongWriter-6k) containing 666 high-quality samples, released as [LongWriter-6k-filtered](https://modelscope.cn/datasets/swift/longwriter-6k-filtered).
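
A minimal inference sketch with `transformers` is shown below. The model path, prompt, and generation settings are illustrative assumptions, not part of this card: replace `model_id` with the actual hub ID or a local checkpoint directory, and adjust `max_new_tokens` to your output-length budget.

```python
# Minimal inference sketch; model path and generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/MS-LongWriter-Qwen2.5-7B-Instruct"  # placeholder: local dir or hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a 10,000-word article on the history of the Silk Road."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=22000,  # generous budget for 10,000+ word outputs
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```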

## Datasets

1. [LongWriter-6k-filtered](https://modelscope.cn/datasets/swift/longwriter-6k-filtered), a filtered subset of [LongWriter-6k](https://modelscope.cn/datasets/ZhipuAI/LongWriter-6k).
2. [Magpie-Qwen2-Pro-200K-Chinese](https://modelscope.cn/datasets/AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese), from which 6k examples are randomly sampled.
3. [Magpie-Qwen2-Pro-200K-English](https://modelscope.cn/datasets/AI-ModelScope/Magpie-Qwen2-Pro-200K-English), from which 6k examples are randomly sampled (see the sampling sketch after this list).
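
For reference, the sketch below shows one way the random subsets could be drawn with the `datasets` library; the `--dataset name#N` syntax in the swift commands later in this card performs equivalent sampling automatically. The dataset ID, split, seed, and output filename here are assumptions, and the data may need to be loaded from ModelScope rather than the Hugging Face Hub depending on where it is mirrored.

```python
# Illustrative sampling sketch; dataset ID, split, seed, and filename are assumptions.
from datasets import load_dataset

ds = load_dataset("AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese", split="train")
subset = ds.shuffle(seed=42).select(range(6000))  # reproducible 6k-example subset
subset.to_json("magpie_qwen2_pro_zh_6k.jsonl", force_ascii=False)
```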

## Model

We use [ms-swift](https://github.com/modelscope/swift) to fine-tune the Qwen2.5-7B-Instruct model.

1. Installation

```shell
pip install 'ms-swift[llm]'
```

2. Fine-tuning

Environment:

```text
NVIDIA A100 (80GB) x 4
```

Run:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type qwen2_5-7b-instruct \
    --dataset longwriter-6k-filtered#666 qwen2-pro-zh#6660 qwen2-pro-en#6660 \
    --max_length 28672 \
    --num_train_epochs 2 \
    --eval_steps 200 \
    --batch_size 1 \
    --gradient_accumulation_steps 64 \
    --gradient_checkpointing true \
    --warmup_ratio 0.1 \
    --learning_rate 1e-5 \
    --sft_type full \
    --loss_name long-ce \
    --check_dataset_strategy warning \
    --save_only_model false \
    --save_total_limit -1 \
    --lazy_tokenize true \
    --dataloader_num_workers 1 \
    --resume_only_model true \
    --neftune_noise_alpha 5 \
    --use_flash_attn true
```

3. Fine-tuning with annealing

An annealing stage is used to further improve the model after the main fine-tuning run: we fine-tune the step-2 checkpoint on the LongWriter-6k-filtered dataset alone, with the learning rate lowered to 2e-6.

Run:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type qwen2_5-7b-instruct \
    --dataset longwriter-6k-filtered#666 \
    --max_length 28672 \
    --num_train_epochs 2 \
    --eval_steps 200 \
    --batch_size 1 \
    --gradient_accumulation_steps 64 \
    --gradient_checkpointing true \
    --warmup_ratio 0.1 \
    --learning_rate 2e-6 \
    --sft_type full \
    --loss_name long-ce \
    --check_dataset_strategy warning \
    --save_only_model false \
    --save_total_limit -1 \
    --lazy_tokenize true \
    --dataloader_num_workers 1 \
    --resume_only_model true \
    --neftune_noise_alpha 5 \
    --use_flash_attn true \
    --resume_from_checkpoint {previous-checkpoint-path}
```

Note:

1. The `--resume_from_checkpoint` parameter specifies the path of the checkpoint saved in step 2.

## Evaluation

Refer to the [LongWriter evaluation](https://github.com/modelscope/evalscope/tree/main/evalscope/third_party/longbench_write) in [EvalScope](https://github.com/modelscope/evalscope).

## Reference

If you find our work helpful, please consider citing our paper and starring our GitHub repositories.

```bibtex
@misc{chen2024minimumtuningunlocklong,
      title={Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key},
      author={Yingda Chen and Xingjun Wang and Jintao Huang and Yunlin Mao and Daoze Zhang and Yuze Zhao},
      year={2024},
      eprint={2410.10210},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.10210},
}
```

1. QbitAI article (in Chinese): [666 data samples teach AI to write 10,000-word essays! Both the model and the dataset are open-sourced](https://mp.weixin.qq.com/s/LvWUSgIRO5HI5YSDRz7SxA)
2. Tech report: [Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key](https://arxiv.org/pdf/2410.10210)