wenge-research
/

yayi-uie

Text Generation

Transformers

PyTorch

YAYIUIE

custom_code

wenge-research committed on Dec 14, 2023

Commit fc95cb0 • 1 Parent(s): 5a397cb

Update README.md

Files changed (1)

README.md +14 -21

@@ -17,7 +17,8 @@ license: apache-2.0
 ## 介绍/Introduction
-雅意信息抽取统一大模型 (YAYI-UIE)在百万级人工构造的高质量信息抽取数据上进行指令微调得到，统一训练信息抽取任务包括命名实体识别（NER），关系抽取（RE）和事件抽取（EE），实现通用、安全、金融、生物、医疗、商业、个人、车辆、电影、工业、餐厅、科学等场景下结构化抽取。
 通过雅意IE大模型的开源为促进中文预训练大模型开源社区的发展，贡献自己的一份力量，通过开源，与每一位合作伙伴共建雅意大模型生态。
@@ -51,15 +52,19 @@ print(tokenizer.decode(response[0],skip_special_tokens=True))
 #### 指令样例/Sample Prompts
-1. 实体抽取任务
 ```
 文本：xx
 【实体抽取】抽取文本中可能存在的实体，并以json{人物/机构/地点：[实体]}格式输出。
 ```
-2. 关系抽取任务
 ```
 文本：xx
 【关系抽取】已知关系列表是[注资,拥有,纠纷,自己,增持,重组,买资,签约,持股,交易]。根据关系列表抽取关系三元组，按照json[{'relation':'', 'head':'', 'tail':''}, ]的格式输出。
 ```
 ```
 文本：xx
@@ -69,20 +74,6 @@ print(tokenizer.decode(response[0],skip_special_tokens=True))
 ```
 文本：xx
 已知论元角色列表是[质押方,披露时间,质权方,质押物,质押股票/股份数量,事件时间,质押物所属公司,质押物占总股比,质押物占持股比]，请根据论元角色列表从给定的输入中抽取可能的论元，以json{角色:论元,}格式输出。
-```
-1. NER
-```
-Text:
-From the given text, extract all the entities and types. Please format the answer in json {person/organization/location：[entities]}.
-```
-2. RE
-```
-Text:
-From the given text, extract the possible head entities (subjects) and tail entities (objects) and give the corresponding relation triples.The relations are [country of administrative divisions,place of birth,location contains]. Output the result in json[{'relation':'', 'head':'', 'tail':''}, ].
-```
-3. EE
-```
 Text:
 Given the text and the role list [seller, place, beneficiary, buyer], identify event arguments and roles, provide your answer in the format of json{role:name}.
 ```
@@ -110,7 +101,7 @@ FewRe，Wiki-ZSL为英文数据集， SKE 2020，COAE2016，IPRE为中文数据
 FewRel and Wiki-ZSL are English datasets; SKE 2020, COAE2016 and IPRE are Chinese datasets
-| Model | FewRe | Wiki-ZSL | EN Average | SKE 2020 | COAE2016 | IPRE | ZH Average |
 | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
 | ChatGPT 3.5 | 9.96 | 13.14 | 11.55 | 24.47 | 19.31 | 6.73 | 16.84 |
 | ZETT(T5-small) | 30.53 | 31.74 | 31.14 | - | - | - | - |
@@ -145,11 +136,9 @@ EEA（事件论元抽取 Event Arguments Extraction）
 The chart illustrates the performance of our model on Chinese IE tasks in zero-shot setting.
-<div align="center">
-<br>
 ![零样本推理性能分布](./assets/zh-0shot.png)
-</div>
 ## 相关协议/Terms and Conditions
 #### 局限性/Limitations
 基于当前数据和基础模型训练得到的SFT模型，在效果上仍存在以下问题：
@@ -165,6 +154,10 @@ The SFT model, trained using the data and the base model, still faces the follow
 2. It struggles to effectively discern harmful instructions, potentially resulting in hazardous statements.
 3. The model's extraction capability needs improvement in scenarios involving paragraph-level texts.
 #### 免责声明/Disclaimer
 基于以上模型局限性，我们要求开发者仅将我们开源的代码、数据、模型及后续用此项目生成的衍生物用于研究目的，不得用于商业用途，以及其他会对社会带来危害的用途。请谨慎鉴别和使用雅意大模型生成的内容，请勿将生成的有害内容传播至互联网。若产生不良后果，由传播者自负。
 本项目仅可应用于研究目的，项目开发者不承担任何因使用本项目（包含但不限于数据、模型、代码等）导致的危害或损失。详细请参考免责声明。

 ## 介绍/Introduction
+雅意信息抽取统一大模型 (YAYI-UIE)在百万级人工构造的高质量信息抽取数据上进行指令微调，统一训练信息抽取任务包括命名实体识别（NER），关系抽取（RE）和事件抽取（EE），实现通用、安全、金融、生物、医疗、商业、
+个人、车辆、电影、工业、餐厅、科学等场景下结构化抽取。
 通过雅意IE大模型的开源为促进中文预训练大模型开源社区的发展，贡献自己的一份力量，通过开源，与每一位合作伙伴共建雅意大模型生态。
 #### 指令样例/Sample Prompts
+1. 实体抽取任务/NER tasks
 ```
 文本：xx
 【实体抽取】抽取文本中可能存在的实体，并以json{人物/机构/地点：[实体]}格式输出。
+Text:
+From the given text, extract all the entities and types. Please format the answer in json {person/organization/location：[entities]}.
 ```
+2. 关系抽取任务/RE tasks
 ```
 文本：xx
 【关系抽取】已知关系列表是[注资,拥有,纠纷,自己,增持,重组,买资,签约,持股,交易]。根据关系列表抽取关系三元组，按照json[{'relation':'', 'head':'', 'tail':''}, ]的格式输出。
+Text:
+From the given text, extract the possible head entities (subjects) and tail entities (objects) and give the corresponding relation triples.The relations are [country of administrative divisions,place of birth,location contains]. Output the result in json[{'relation':'', 'head':'', 'tail':''}, ].
 ```
 ```
 文本：xx
 ```
 文本：xx
 已知论元角色列表是[质押方,披露时间,质权方,质押物,质押股票/股份数量,事件时间,质押物所属公司,质押物占总股比,质押物占持股比]，请根据论元角色列表从给定的输入中抽取可能的论元，以json{角色:论元,}格式输出。
 Text:
 Given the text and the role list [seller, place, beneficiary, buyer], identify event arguments and roles, provide your answer in the format of json{role:name}.
 ```
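The English sample prompts above follow a fixed pattern: the input text, then a task instruction that names the label set and the expected JSON shape. A minimal sketch of assembling such prompts in Python is shown here. The function names are hypothetical and not part of the YAYI-UIE code, and placing the input after `Text: ` on the same line is an assumption based on the Chinese samples (`文本：xx`).

```python
# Hypothetical helpers for illustration only; they are not part of YAYI-UIE.
# The instruction wording mirrors the English sample prompts in the README.

def build_ner_prompt(text: str, entity_types: list[str]) -> str:
    """Build an entity-extraction prompt for the given entity types."""
    types = "/".join(entity_types)
    return (
        f"Text: {text}\n"
        "From the given text, extract all the entities and types. "
        # The README sample uses a fullwidth colon inside the JSON template.
        f"Please format the answer in json {{{types}：[entities]}}."
    )


def build_re_prompt(text: str, relations: list[str]) -> str:
    """Build a relation-extraction prompt for the given relation list."""
    rel_list = ",".join(relations)
    return (
        f"Text: {text}\n"
        "From the given text, extract the possible head entities (subjects) "
        "and tail entities (objects) and give the corresponding relation triples. "
        f"The relations are [{rel_list}]. "
        "Output the result in json[{'relation':'', 'head':'', 'tail':''}, ]."
    )


def build_ee_prompt(text: str, roles: list[str]) -> str:
    """Build an event-argument-extraction prompt for the given role list."""
    role_list = ", ".join(roles)
    return (
        f"Text: {text}\n"
        f"Given the text and the role list [{role_list}], "
        "identify event arguments and roles, "
        "provide your answer in the format of json{role:name}."
    )


if __name__ == "__main__":
    print(build_ner_prompt("xx", ["person", "organization", "location"]))
    print(build_re_prompt("xx", ["place of birth", "location contains"]))
    print(build_ee_prompt("xx", ["seller", "place", "beneficiary", "buyer"]))
```

Keeping the label list as a parameter makes it easy to swap in a domain-specific schema while leaving the instruction wording unchanged.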
 FewRel and Wiki-ZSL are English datasets; SKE 2020, COAE2016 and IPRE are Chinese datasets
+| Model | FewRel | Wiki-ZSL | EN Average | SKE 2020 | COAE2016 | IPRE | ZH Average |
 | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
 | ChatGPT 3.5 | 9.96 | 13.14 | 11.55 | 24.47 | 19.31 | 6.73 | 16.84 |
 | ZETT(T5-small) | 30.53 | 31.74 | 31.14 | - | - | - | - |
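The EN Average and ZH Average columns appear to be unweighted means of the per-dataset scores. That reading is an assumption, since the README does not state how the averages are computed. A quick check against the ChatGPT 3.5 row, with the values copied from the table:

```python
from statistics import mean

# Scores copied from the ChatGPT 3.5 row of the table.
en_scores = {"FewRel": 9.96, "Wiki-ZSL": 13.14}
zh_scores = {"SKE 2020": 24.47, "COAE2016": 19.31, "IPRE": 6.73}

# Assumption: each average is a plain arithmetic mean, rounded to two decimals.
en_average = round(mean(en_scores.values()), 2)
zh_average = round(mean(zh_scores.values()), 2)

print(en_average)  # 11.55, matches the EN Average column
print(zh_average)  # 16.84, matches the ZH Average column
```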
 The chart illustrates the performance of our model on Chinese IE tasks in zero-shot setting.
 ![零样本推理性能分布](./assets/zh-0shot.png)
 ## 相关协议/Terms and Conditions
 #### 局限性/Limitations
 基于当前数据和基础模型训练得到的SFT模型，在效果上仍存在以下问题：
 2. It struggles to effectively discern harmful instructions, potentially resulting in hazardous statements.
 3. The model's extraction capability needs improvement in scenarios involving paragraph-level texts.
+#### 开源协议/Open Source License
+本项目中的代码和数据依照 [Apache-2.0](LICENSE) 协议开源，社区使用YAYI UIE模型或其衍生品请遵循[Baichuan2](https://github.com/baichuan-inc/Baichuan2)的社区协议和商用协议。
+The code and data in this project are open-sourced under the [Apache-2.0](LICENSE) license. The use of YAYI-UIE model or its derivatives must adhere to [Baichuan2](https://github.com/baichuan-inc/Baichuan2)'s community and commercial Model License.
 #### 免责声明/Disclaimer
 基于以上模型局限性，我们要求开发者仅将我们开源的代码、数据、模型及后续用此项目生成的衍生物用于研究目的，不得用于商业用途，以及其他会对社会带来危害的用途。请谨慎鉴别和使用雅意大模型生成的内容，请勿将生成的有害内容传播至互联网。若产生不良后果，由传播者自负。
 本项目仅可应用于研究目的，项目开发者不承担任何因使用本项目（包含但不限于数据、模型、代码等）导致的危害或损失。详细请参考免责声明。