---
language:
- en
- zh
library_name: transformers
tags:
- Long Context
- qwen2.5
- qwen2
---

# MS-LongWriter-Qwen2.5-7B-Instruct

🤖 [LongWriter Dataset] • 💻 [Github Repo] • 📃 [LongWriter Paper] • 📃 [Tech Report]

MS-LongWriter-Qwen2.5-7B-Instruct is trained from [Qwen2.5-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-7B-Instruct) and is capable of generating 10,000+ words in a single response (a minimal inference sketch is provided at the end of this card). Training starts directly from Qwen2.5-7B-Instruct, while the [LongWriter-6k](https://modelscope.cn/datasets/ZhipuAI/LongWriter-6k) dataset is heavily filtered (distilled) down to 666 high-quality samples, released as [LongWriter-6k-filtered](https://modelscope.cn/datasets/swift/longwriter-6k-filtered).

## Datasets

1. [LongWriter-6k-filtered](https://modelscope.cn/datasets/swift/longwriter-6k-filtered), derived from [LongWriter-6k](https://modelscope.cn/datasets/ZhipuAI/LongWriter-6k).
2. [Magpie-Qwen2-Pro-200K-Chinese](https://modelscope.cn/datasets/AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese), 6k randomly sampled examples.
3. [Magpie-Qwen2-Pro-200K-English](https://modelscope.cn/datasets/AI-ModelScope/Magpie-Qwen2-Pro-200K-English), 6k randomly sampled examples.

## Model

We use [ms-swift](https://github.com/modelscope/swift) to fine-tune the Qwen2.5-7B-Instruct model.

1. Installation

```shell
pip install ms-swift[llm]
```

2. Fine-tuning

Envs:

```text
Nvidia A100 (80G) x 4
```

Run:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type qwen2_5-7b-instruct \
    --dataset longwriter-6k-filtered#666 qwen2-pro-zh#6660 qwen2-pro-en#6660 \
    --max_length 28672 \
    --num_train_epochs 2 \
    --eval_steps 200 \
    --batch_size 1 \
    --gradient_accumulation_steps 64 \
    --gradient_checkpointing true \
    --warmup_ratio 0.1 \
    --learning_rate 1e-5 \
    --sft_type full \
    --loss_name long-ce \
    --check_dataset_strategy warning \
    --save_only_model false \
    --save_total_limit -1 \
    --lazy_tokenize true \
    --dataloader_num_workers 1 \
    --resume_only_model true \
    --neftune_noise_alpha 5 \
    --use_flash_attn true
```

3. Fine-tuning with annealing

An annealing stage is used to further improve the model during post-training: we fine-tune the checkpoint from step 2 on the LongWriter-6k-filtered dataset alone, with the learning rate lowered to 2e-6.

Run:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type qwen2_5-7b-instruct \
    --dataset longwriter-6k-filtered#666 \
    --max_length 28672 \
    --num_train_epochs 2 \
    --eval_steps 200 \
    --batch_size 1 \
    --gradient_accumulation_steps 64 \
    --gradient_checkpointing true \
    --warmup_ratio 0.1 \
    --learning_rate 2e-6 \
    --sft_type full \
    --loss_name long-ce \
    --check_dataset_strategy warning \
    --save_only_model false \
    --save_total_limit -1 \
    --lazy_tokenize true \
    --dataloader_num_workers 1 \
    --resume_only_model true \
    --neftune_noise_alpha 5 \
    --use_flash_attn true \
    --resume_from_checkpoint {previous-checkpoint-path}
```

Note:

1. The `--resume_from_checkpoint` parameter specifies the path of the checkpoint produced in step 2.

## Evaluation

Refer to [LongWriter Evaluation](https://github.com/modelscope/evalscope/tree/main/evalscope/third_party/longbench_write) in [EvalScope](https://github.com/modelscope/evalscope).

## Reference

If you find our work helpful, please consider citing our paper and starring our GitHub repositories.

```bibtex
@misc{chen2024minimumtuningunlocklong,
      title={Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key},
      author={Yingda Chen and Xingjun Wang and Jintao Huang and Yunlin Mao and Daoze Zhang and Yuze Zhao},
      year={2024},
      eprint={2410.10210},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.10210},
}
```
1. QbitAI (量子位) article: [666 training samples teach AI to write 10,000-word long-form text! Both the model and the dataset are open-sourced](https://mp.weixin.qq.com/s/LvWUSgIRO5HI5YSDRz7SxA)
2. Tech report: [Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key](https://arxiv.org/pdf/2410.10210)
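
## Inference

Below is a minimal sketch for trying the model with Hugging Face `transformers`. The local path `MS-LongWriter-Qwen2.5-7B-Instruct`, the prompt, and the generation hyperparameters (`max_new_tokens`, `temperature`, `top_p`) are illustrative assumptions rather than values prescribed by this card; since the model targets 10,000+ word responses, the main point is to give `generate` a sufficiently large `max_new_tokens` budget.

```python
# Minimal inference sketch (assumes a local or hub path to this model and enough GPU memory).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "MS-LongWriter-Qwen2.5-7B-Instruct"  # hypothetical path; replace with your checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Qwen2.5-style chat prompt asking for a very long output.
messages = [
    {"role": "user", "content": "Write a 10000-word article on the history of the Roman Empire."}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Large max_new_tokens so the response is not cut off mid-essay (illustrative values).
outputs = model.generate(
    input_ids,
    max_new_tokens=16384,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
response = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```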