---
license: apache-2.0
---
# 1.Differences from knowlm-13b-zhixi
Compared to zjunlp/knowlm-13b-zhixi, zjunlp/knowlm-13b-ie is somewhat better suited to information extraction, at the cost of weaker general-purpose ability.
For zjunlp/knowlm-13b-ie, about 10% of the data is sampled from Chinese and English information extraction datasets, and negative sampling is then applied to it. For instance, if dataset A contains the labels [a, b, c, d, e, f], we first sample 10% of the data from A. A given sample 's' might contain only labels a and b. We then randomly add relations that it does not originally have, such as c and d, from the specified list of relation candidates. For these added relations, the model is expected to output text similar to 'NAN'. This method gives the model some ability to produce 'NAN' outputs, which improves its information extraction capability while weakening its generalization ability.
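The negative sampling step described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual training pipeline: the function names `subsample` and `add_negative_labels`, the number of negatives per sample, and the boolean return structure are all assumptions made for this example.

```python
import random

def subsample(dataset, fraction, rng):
    """Keep a random fraction of the dataset (hypothetical helper)."""
    k = max(1, round(len(dataset) * fraction))
    return rng.sample(dataset, k)

def add_negative_labels(sample_labels, all_labels, num_negatives, rng):
    """Map each queried label to True (present) or False (added negative).

    Labels mapped to False are the ones for which the model is expected
    to answer with something like 'NAN'. This structure is an assumption.
    """
    present = set(sample_labels)
    # Candidate negatives: labels in the dataset schema that this sample lacks.
    absent = [label for label in all_labels if label not in present]
    negatives = rng.sample(absent, min(num_negatives, len(absent)))
    return {label: label in present for label in list(sample_labels) + negatives}

rng = random.Random(0)
all_labels = ["a", "b", "c", "d", "e", "f"]
# Sample 's' only contains a and b; two absent labels are added as negatives.
queried = add_negative_labels(["a", "b"], all_labels, num_negatives=2, rng=rng)
kept = subsample(list(range(100)), fraction=0.1, rng=rng)
```

Here `queried` holds four labels: `a` and `b` as positives plus two randomly chosen labels from `c` to `f` as negatives.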
# 2.IE template
NER supports the following templates:
```python
entity_template_zh = {
0:'已知候选的实体类型列表:{s_schema},请你根据实体类型列表,从以下输入中抽取出可能存在的实体。请按照{s_format}的格式回答。',
1:'我将给你个输入,请根据实体类型列表:{s_schema},从输入中抽取出可能包含的实体,并以{s_format}的形式回答。',
2:'我希望你根据实体类型列表从给定的输入中抽取可能的实体,并以{s_format}的格式回答,实体类型列表={s_schema}。',
3:'给定的实体类型列表是{s_schema}\n根据实体类型列表抽取,在这个句子中可能包含哪些实体?你可以先别出实体, 再判断实体类型。请以{s_format}的格式回答。',
}
entity_int_out_format_zh = {
0:['"(实体,实体类型)"', entity_convert_target0],
1:['"实体是\n实体类型是\n\n"', entity_convert_target1],
2:['"实体:实体类型\n"', entity_convert_target2],
3:["JSON字符串[{'entity':'', 'entity_type':''}, ]", entity_convert_target3],
}
entity_template_en = {
0:'Identify the entities and types in the following text and where entity type list {s_schema}. Please provide your answerin the form of {s_format}.',
1:'From the given text, extract the possible entities and types . The types are {s_schema}. Please format your answerin the form of {s_format}.',
}
entity_int_out_format_en = {
0:['(Entity, Type)', entity_convert_target0_en],
1:["{'Entity':'', 'Type':''}", entity_convert_target1_en],
}
```
The schema and format are embedded in the templates as the placeholders `{s_schema}` and `{s_format}` and must be supplied by the user.
Please refer to [ner_template.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/ner_template.py), [re_template.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/re_template.py), and [ee_template.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/ee_template.py) for more details about the templates.
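As a minimal sketch of how the placeholders might be filled in, the snippet below substitutes a schema and an output format into English template 1 (copied verbatim from the block above). The helper `build_instruction` is hypothetical, and rendering the schema with `str()` is an assumption; convert.py may serialize the schema differently.

```python
# English NER template 1 and output format 0, copied verbatim from above.
template = (
    "From the given text, extract the possible entities and types . "
    "The types are {s_schema}. Please format your answerin the form of {s_format}."
)
out_format = "(Entity, Type)"

def build_instruction(template, schema, out_format):
    """Fill {s_schema} and {s_format} in a template (hypothetical helper)."""
    s_schema = str(schema)  # assumption: the schema is rendered as a Python list
    # str.replace is used instead of str.format so that braces inside
    # JSON-style output formats are left untouched.
    return template.replace("{s_schema}", s_schema).replace("{s_format}", out_format)

instruction = build_instruction(template, ["person", "location"], out_format)
print(instruction)
```

Using plain string replacement keeps the helper usable with the JSON-style formats, which themselves contain curly braces.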
# 3.Convert script
We have provided a script at [convert.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert.py) to uniformly convert data into KnowLM instructions.
The [data](https://github.com/zjunlp/DeepKE/tree/main/example/llm/InstructKGC/data) directory shows the data format that each task expects as input to `convert.py`.
```bash
python kg2instruction/convert.py \
--src_path data/NER/sample.json \
--tgt_path data/NER/processed.json \
--schema_path data/NER/schema.json \
--language zh \
--task NER \
--sample 0 \
--all
```
# 4.Evaluate
We provide a script at [evaluate.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/evaluate.py) that converts the model's string output into a list and calculates the F1 score.
```bash
python kg2instruction/evaluate.py \
--standard_path data/NER/processed.json \
--submit_path data/NER/processed.json \
--task ner \
--language zh
```
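To illustrate what converting string output into a list and computing F1 can look like, here is a small sketch for the `(Entity, Type)` output format. It is not the logic of evaluate.py: the regular expression, the function names `parse_pairs` and `micro_f1`, and the choice of micro-averaged F1 over exact-match pairs are assumptions for this example.

```python
import re

# Matches "(entity, type)" pairs; assumes entities contain no commas or parentheses.
PAIR = re.compile(r"\(([^,()]+),\s*([^()]+)\)")

def parse_pairs(text):
    """Turn a model output string into a set of (entity, type) tuples."""
    return {(entity.strip(), etype.strip()) for entity, etype in PAIR.findall(text)}

def micro_f1(gold_texts, pred_texts):
    """Micro-averaged F1 over exact-match (entity, type) pairs."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_texts, pred_texts):
        gold_set, pred_set = parse_pairs(gold), parse_pairs(pred)
        tp += len(gold_set & pred_set)
        n_gold += len(gold_set)
        n_pred += len(pred_set)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = ["(Paris, location),(Marie Curie, person)"]
pred = ["(Paris, location),(Curie, person)"]
score = micro_f1(gold, pred)  # one of two pairs matches exactly
```

In this example one predicted pair out of two matches a gold pair, so precision, recall, and F1 are all 0.5. An output such as `NAN` parses to an empty set and contributes no predictions.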