---
language:
- en
- zh
library_name: transformers
tags:
- Long Context
- qwen2.5
- qwen2
---

# MS-LongWriter-Qwen2.5-7B-Instruct

<p align="center">
🤖 <a href="https://modelscope.cn/datasets/swift/longwriter-6k-filtered" target="_blank">[LongWriter Dataset]</a> • 💻 <a href="https://github.com/THUDM/LongWriter" target="_blank">[GitHub Repo]</a> • 📃 <a href="https://arxiv.org/abs/2408.07055" target="_blank">[LongWriter Paper]</a> • 📃 <a href="https://arxiv.org/pdf/2410.10210" target="_blank">[Tech Report]</a>
</p>

MS-LongWriter-Qwen2.5-7B-Instruct is trained from [Qwen2.5-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-7B-Instruct) and can generate 10,000+ words in a single response.

Training starts directly from Qwen2.5-7B-Instruct, using a heavily filtered subset of [LongWriter-6k](https://modelscope.cn/datasets/ZhipuAI/LongWriter-6k) containing 666 high-quality samples, released as [LongWriter-6k-filtered](https://modelscope.cn/datasets/swift/longwriter-6k-filtered).
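
A minimal inference sketch with `transformers` is shown below. The model path, prompt, and generation settings are illustrative assumptions, not part of this card: replace `model_id` with the actual hub ID or a local checkpoint directory, and adjust `max_new_tokens` to your output-length budget.

```python
# Minimal inference sketch; model path and generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/MS-LongWriter-Qwen2.5-7B-Instruct"  # placeholder: local dir or hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a 10,000-word article on the history of the Silk Road."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=22000,  # generous budget for 10,000+ word outputs
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```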

## Datasets

1. [LongWriter-6k-filtered](https://modelscope.cn/datasets/swift/longwriter-6k-filtered), a filtered subset of [LongWriter-6k](https://modelscope.cn/datasets/ZhipuAI/LongWriter-6k).
2. [Magpie-Qwen2-Pro-200K-Chinese](https://modelscope.cn/datasets/AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese), from which 6k examples are randomly sampled.
3. [Magpie-Qwen2-Pro-200K-English](https://modelscope.cn/datasets/AI-ModelScope/Magpie-Qwen2-Pro-200K-English), from which 6k examples are randomly sampled (see the sampling sketch after this list).
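
For reference, the sketch below shows one way the random subsets could be drawn with the `datasets` library; the `--dataset name#N` syntax in the swift commands later in this card performs equivalent sampling automatically. The dataset ID, split, seed, and output filename here are assumptions, and the data may need to be loaded from ModelScope rather than the Hugging Face Hub depending on where it is mirrored.

```python
# Illustrative sampling sketch; dataset ID, split, seed, and filename are assumptions.
from datasets import load_dataset

ds = load_dataset("AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese", split="train")
subset = ds.shuffle(seed=42).select(range(6000))  # reproducible 6k-example subset
subset.to_json("magpie_qwen2_pro_zh_6k.jsonl", force_ascii=False)
```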

## Model

We use [ms-swift](https://github.com/modelscope/swift) to fine-tune the Qwen2.5-7B-Instruct model.

1. Installation

```shell
pip install 'ms-swift[llm]'
```

2. Fine-tuning

Environment:

```text
NVIDIA A100 (80GB) x 4
```

Run:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type qwen2_5-7b-instruct \
    --dataset longwriter-6k-filtered#666 qwen2-pro-zh#6660 qwen2-pro-en#6660 \
    --max_length 28672 \
    --num_train_epochs 2 \
    --eval_steps 200 \
    --batch_size 1 \
    --gradient_accumulation_steps 64 \
    --gradient_checkpointing true \
    --warmup_ratio 0.1 \
    --learning_rate 1e-5 \
    --sft_type full \
    --loss_name long-ce \
    --check_dataset_strategy warning \
    --save_only_model false \
    --save_total_limit -1 \
    --lazy_tokenize true \
    --dataloader_num_workers 1 \
    --resume_only_model true \
    --neftune_noise_alpha 5 \
    --use_flash_attn true
```

3. Fine-tuning with annealing

An annealing stage is used to further improve the model after the main fine-tuning run: we fine-tune the step-2 checkpoint on the LongWriter-6k-filtered dataset alone, with the learning rate lowered to 2e-6.

Run:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type qwen2_5-7b-instruct \
    --dataset longwriter-6k-filtered#666 \
    --max_length 28672 \
    --num_train_epochs 2 \
    --eval_steps 200 \
    --batch_size 1 \
    --gradient_accumulation_steps 64 \
    --gradient_checkpointing true \
    --warmup_ratio 0.1 \
    --learning_rate 2e-6 \
    --sft_type full \
    --loss_name long-ce \
    --check_dataset_strategy warning \
    --save_only_model false \
    --save_total_limit -1 \
    --lazy_tokenize true \
    --dataloader_num_workers 1 \
    --resume_only_model true \
    --neftune_noise_alpha 5 \
    --use_flash_attn true \
    --resume_from_checkpoint {previous-checkpoint-path}
```

Note:

1. The `--resume_from_checkpoint` parameter specifies the path of the checkpoint saved in step 2.

## Evaluation

Refer to the [LongWriter evaluation](https://github.com/modelscope/evalscope/tree/main/evalscope/third_party/longbench_write) in [EvalScope](https://github.com/modelscope/evalscope).

## Reference

If you find our work helpful, please consider citing our paper and starring our GitHub repositories.

```bibtex
@misc{chen2024minimumtuningunlocklong,
      title={Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key},
      author={Yingda Chen and Xingjun Wang and Jintao Huang and Yunlin Mao and Daoze Zhang and Yuze Zhao},
      year={2024},
      eprint={2410.10210},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.10210},
}
```

1. QbitAI article (in Chinese): [666 data samples teach AI to write 10,000-word essays! Both the model and the dataset are open-sourced](https://mp.weixin.qq.com/s/LvWUSgIRO5HI5YSDRz7SxA)
2. Tech report: [Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key](https://arxiv.org/pdf/2410.10210)