---
license: gpl-3.0
metrics:
- rouge
language:
- zh
pipeline_tag: question-answering
---
# Ziya-Reader-13B-v1.0
# Ziya Series Models
- [Ziya-LLaMA-13B-v1.1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1.1)
- [Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1)
- [Ziya-LLaMA-7B-Reward](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-7B-Reward)
- [Ziya-LLaMA-13B-Pretrain-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1)
- [Ziya-BLIP2-14B-Visual-v1](https://huggingface.co/IDEA-CCNL/Ziya-BLIP2-14B-Visual-v1)
- [Ziya-Writing-LLaMa-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-Writing-LLaMa-13B-v1)
- [Ziya-Coding-15B-v1](https://huggingface.co/IDEA-CCNL/Ziya-Coding-15B-v1)
- [Ziya-Coding-34B-v1.0](https://huggingface.co/IDEA-CCNL/Ziya-Coding-34B-v1.0)
## Brief Introduction
Ziya-Reader-13B-v1.0 is a knowledge question-answering model. Given a question and knowledge documents, it answers the question accurately, and it supports both multi-document and single-document question answering. The model has an 8k context window, yet in our evaluations it comes out ahead of models with longer windows on several long-text tasks. The evaluated tasks include multi-document question answering, a synthetic task (document retrieval), and long-text summarization.
The model mainly targets knowledge-base question answering, retrieval-based question answering, and e-commerce customer service. It performs well on private-domain knowledge question answering and can be applied broadly in vertical domains such as law, finance, and healthcare, because it addresses a common failure in multi-document question answering: answer accuracy drops sharply when the correct information is not in the first or last document.
The model also has strong general capabilities and can be used for general question answering. On our general-ability evaluation set, it outperforms Ziya-Llama-13B-v1.1.
It is based on the 13B Llama2 model and was fine-tuned on several hundred thousand general and retrieval question-answering samples.
## Evaluation
LongBench Chinese
|model|Multi-doc QA(%)| Synthetic task(%) | Summarization |
|:---:|:---:|:---:|:---:|
|GPT3.5-turbo-16k | 28.7 | 77.5 |16.0 |
|Longchat-v1.5-7B-32k |19.5|7.6|9.9|
|Xgen-7B-8k| 11.0| 3.5| 2.2 |
|InternLM-7B-8k | 16.3|0.9|12.4|
|ChatGLM2-6B-32k|37.6|64.5|16.2|
|Vicuna-v1.5-7B-16k|19.3|5.0|15.1|
|Ziya-Reader-13B-v1.0| **42.8**| **66.0**|**15.3**|
"Multi-doc QA" is a multi-document question-answering task: given a question and multiple documents, the model answers the question based on the documents that contain the correct information. This task measures the model's relevance judgment, memory, and question-answering ability. On this task, Ziya-Reader-13B-v1.0 leads all other models by a large margin, including models with longer windows.
"Synthetic task" is a synthetic document retrieval task: given a summary, the goal is to find the corresponding document among a large number of documents. This task evaluates the model's semantic matching ability. On this task, our model surpasses all open-source models, reaching 66%.
"Summarization" is a long-text summarization task: given a very long meeting record with multiple speakers, the model generates a meeting summary. Our model is highly competitive on this task: with only an 8k context window, it is less than 1% behind the models with 16k or longer windows, and it is the strongest among the 8k-window models.
|model|LongBench Chinese Multi-doc QA (%)|LongBench Chinese Multi-doc QA, shuffled (%)|
|:---|:---:|:---:|
|gpt3.5-turbo-16k | 28.7 | 23.1|
|chatGLM2-32k | 34.3 | 20.3 |
|Baichuan-13B-Chat2 | 32.4 | 27.2 |
|Ziya-Reader-13B-v1.0| **42.8** | **40.9**|
We found that the documents in Multi-doc QA are arranged in descending order of relevance, so the correct answer is often in the first document or one of the first few. As a result, the original test set does not truly reflect a model's relevance judgment. We therefore shuffled the document order in this test set and re-evaluated each model. Performance dropped significantly for most models, by 5% to 17%, whereas our model proved robust, with a drop of less than 2%.
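The shuffling step described above can be sketched as follows. This is only an illustration, not the evaluation script used for the table: the function name `shuffle_documents`, the list-of-strings representation of a sample's documents, and the fixed seed are all assumptions.

```python
import random


def shuffle_documents(documents, seed=0):
    """Return a shuffled copy of the documents; the input list is left untouched.

    Hypothetical helper: assumes one test sample stores its candidate
    documents as a list ordered from most to least relevant.
    """
    rng = random.Random(seed)      # fixed seed so every model sees the same order
    shuffled = list(documents)     # copy, so the original ordering is preserved
    rng.shuffle(shuffled)
    return shuffled


docs = ["doc A (most relevant)", "doc B", "doc C", "doc D"]
shuffled_docs = shuffle_documents(docs, seed=42)
```

Using one seed for all models keeps the comparison fair, because each model is then scored on the same shuffled order.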
## Model Taxonomy
| Demand | Task | Series | Model | Parameter | Extra |
| :----: | :----: | :----: | :----: | :----: | :----: |
| QA, machine reading comprehension (MRC) | AGI model | Ziya | Llama2 | 13B | Chinese |
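## Model Information
First, we used position interpolation (PI) and fine-tuned on a curated long-document corpus to extend the context window to 8k. Second, a model is shaped by its data: from nearly ten million samples we selected high-quality data through several layers of filtering, and roughly 100k of these samples were enough to turn an unremarkable model into a compact but powerful knowledge question-answering model. In addition, we designed a special task tailored to search scenarios and carefully constructed data for it, so that the model learns to find the relevant documents and answer the question from them.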
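The idea behind position interpolation can be illustrated with a small sketch. It is not the training code for this model: it assumes rotary position embeddings with base 10000, a head dimension of 128, and an original Llama2 window of 4096 tokens, and the function name `rope_angles` is hypothetical. PI rescales position indices so that the extended window maps back into the position range seen during pretraining.

```python
def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotary angles for one token position.

    With scale < 1 the position index is compressed, which is the core of
    position interpolation (illustrative helper, not the authors' code).
    """
    scaled_position = position * scale
    return [scaled_position / (base ** (2 * i / head_dim))
            for i in range(head_dim // 2)]


ORIGINAL_CONTEXT = 4096   # assumed pretraining window of the Llama2 base model
EXTENDED_CONTEXT = 8192   # the 8k window described above
pi_scale = ORIGINAL_CONTEXT / EXTENDED_CONTEXT

# The last position of the extended window is mapped inside the original range.
angles_last_extended = rope_angles(EXTENDED_CONTEXT - 1, head_dim=128, scale=pi_scale)
angles_last_original = rope_angles(ORIGINAL_CONTEXT - 1, head_dim=128)
```

Because every scaled position stays below `ORIGINAL_CONTEXT`, the model only sees rotary angles within the range it was pretrained on, and fine-tuning teaches it to handle the finer spacing between neighbouring positions.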
For more details, please read our WeChat official account article: [姜子牙大模型系列 | 为知识检索而生,Ziya-Reader开源,多个长文本中文任务第一](https://mp.weixin.qq.com/s/ucrvoTKBgQZZJxbr2NFP6g)
## Usage
### Environment
pip install transformers==4.31.0
### Example
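For general question answering, simply put `<human>:` before the question and `\n<bot>:` after it.
For reading-comprehension question answering:
Put the question first, then the context (knowledge documents), and the instruction last. When there are multiple retrieval results, separate them with `<eod>\n` and start each one with its index in square brackets, e.g. `[1] xxxxxxx<eod>\n`.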
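The prompt layout described above can be assembled with a small helper. This is a sketch, not an official API: `build_general_prompt` and `build_reader_prompt` are hypothetical names, and the Chinese field labels (`给定问题:`, `检索结果:`) and the instruction text are copied from the example prompt in this README.

```python
def build_general_prompt(question):
    """General QA: wrap the question in the <human>/<bot> markers."""
    return f"<human>:{question}\n<bot>:"


def build_reader_prompt(question, documents, instruction):
    """Reading-comprehension QA: question, numbered documents, then instruction."""
    # Each retrieval result starts with "[index] " and ends with "<eod>\n".
    retrieved = "".join(
        f"[{index}] {doc}<eod>\n" for index, doc in enumerate(documents, start=1)
    )
    return f"<human>: 给定问题:{question}\n 检索结果:{retrieved}{instruction}\n<bot>:"


# Instruction text taken from the example prompt in this README.
INSTRUCTION = (
    "请阅读理解上面多个检索结果,正确地回答问题。只能根据相关的检索结果或者知识回答,"
    "禁止编造;如果没有相关结果,请回答“都不相关,我不知道”。"
)

prompt = build_reader_prompt(
    "交强险过期不上路会不会被罚?",
    ["first retrieved passage", "second retrieved passage"],
    INSTRUCTION,
)
```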
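The generated text occasionally includes a phrase such as "根据上面编号为xx的信息" ("according to the information numbered xx above"). The actual answer starts after "我的答案是" ("my answer is"), so truncate everything before it when decoding.
dtype: bfloat16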
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

device = torch.device("cuda")

# Prompt layout: question, then the retrieval results (each ending in "<eod>\n"), then the instruction.
prompt = (
    '<human>: 给定问题:交强险过期不上路会不会被罚?\n 检索结果:[1] 交强险过期不上路会不会被罚|'
    '法律分析:由于交强险是由保险公司对被保险机动车发生道路交通事故造成受害人(不包括本车人员和被保险人)的人身伤亡、财产损失,在责任限额内予以赔偿的强制性责任保险。'
    '因此一旦交强险到期没续费,发生事故车主还会面临巨额赔偿。车险到期未交有处罚。'
    '法律依据:《机动车交通事故责任强制保险条例》 第三十八条 机动车所有人、管理人未按照规定投保机动车交通事故责任强制保险的,由公安机关交通管理部门扣留机动车,通知机动车所有人、管理人依照规定投保,处依照规定投保最低责任限额应缴纳的保险费的2倍罚款。 '
    '机动车所有人、管理人依照规定补办机动车交通事故责任强制保险的,应当及时退还机动车。<eod>\n'
    '请阅读理解上面多个检索结果,正确地回答问题。只能根据相关的检索结果或者知识回答,禁止编造;如果没有相关结果,请回答“都不相关,我不知道”。\n<bot>:'
)

model_path = "IDEA-CCNL/Ziya-Reader-13B-v1.0"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generate_ids = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.8,
    temperature=0.85,
    repetition_penalty=1.0,
    # Look up the id of "</s>" directly; tokenizer.encode("</s>") would also prepend the BOS id.
    eos_token_id=tokenizer.convert_tokens_to_ids("</s>"),
)
# The decoded text contains the prompt followed by the generated answer.
output = tokenizer.batch_decode(generate_ids)[0]
print(output)

# Example prediction (sampling is enabled, so the exact text varies between runs):
# 对于问题“交强险过期不上路会不会被罚?”,根据上面的编号为1的信息,我的答案是是的,交强险过期不上路会被罚。根据《机动车交通事故责任强制保险条例》,机动车所有人、管理人未按照规定投保机动车交通事故责任强制保险的,由公安机关交通管理部门扣留机动车,通知机动车所有人、管理人依照规定投保,处依照规定投保最低责任限额应缴纳的保险费的2倍罚款。因此,交强险过期不上路会被罚,车主需要及时补办机动车交通事故责任强制保险,以避免被罚款。
```
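Since the actual answer starts after the marker "我的答案是", the decoded text can be post-processed with a small helper. This is a minimal sketch under stated assumptions: `extract_answer` is a hypothetical name, the marker is assumed to appear at most once before the answer, and stripping a trailing `</s>` assumes the end-of-sequence token is decoded as that literal string.

```python
ANSWER_MARKER = "我的答案是"  # marker named in the usage notes of this README


def extract_answer(generated_text, marker=ANSWER_MARKER):
    """Return the text after the first marker, or the full text if it is absent."""
    _, found, tail = generated_text.partition(marker)
    answer = tail if found else generated_text
    # Assumption: the end-of-sequence token shows up as the literal "</s>".
    return answer.replace("</s>", "").strip()


sample = "对于问题“交强险过期不上路会不会被罚?”,根据上面的编号为1的信息,我的答案是是的,交强险过期不上路会被罚。</s>"
answer = extract_answer(sample)
```

Falling back to the full text when the marker is missing keeps general question-answering outputs intact, since those do not necessarily contain the marker.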
## Citation
If you use our model in your work, please cite our [paper](https://arxiv.org/abs/2210.08590):
```text
@article{fengshenbang,
author = {Jiaxing Zhang and Ruyi Gan and Junjie Wang and Yuxiang Zhang and Lin Zhang and Ping Yang and Xinyu Gao and Ziwei Wu and Xiaoqun Dong and Junqing He and Jianheng Zhuo and Qi Yang and Yongfeng Huang and Xiayu Li and Yanghan Wu and Junyu Lu and Xinyu Zhu and Weifeng Chen and Ting Han and Kunhao Pan and Rui Wang and Hao Wang and Xiaojun Wu and Zhongshen Zeng and Chongpei Chen},
title = {Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence},
journal = {CoRR},
volume = {abs/2209.02970},
year = {2022}
}
```
You can also cite our [website](https://github.com/IDEA-CCNL/Fengshenbang-LM/):
```text
@misc{Fengshenbang-LM,
title={Fengshenbang-LM},
author={IDEA-CCNL},
year={2021},
howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}
```