ghh001 committed on
Commit
78ee99d
1 Parent(s): 9ce7968

Update README_EN.md

Files changed (1)
  1. README_EN.md +17 -18
README_EN.md CHANGED
@@ -128,32 +128,31 @@ Here [schema](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC
128
 
129
  | Name | Download | Quantity | Description |
130
  | ---------------------- | ------------------------------------------------------------ | -------- | ------------------------------------------------------------ |
131
- | InstructIE-train | [Google drive](https://drive.google.com/file/d/1VX5buWC9qVeVuudh_mhc_nC7IPPpGchQ/view?usp=drive_link) <br/> [HuggingFace](https://huggingface.co/datasets/zjunlp/KnowLM-IE) <br/> [Baidu Netdisk](https://pan.baidu.com/s/1xXVrjkinw4cyKKFBR8BwQw?pwd=x4s7) | 30w+ | InstructIE train set, which is constructed by weak supervision and may contain some noisy data |
132
- | InstructIE-valid | [Google drive](https://drive.google.com/file/d/1EMvqYnnniKCGEYMLoENE1VD6DrcQ1Hhj/view?usp=drive_link) <br/> [HuggingFace](https://huggingface.co/datasets/zjunlp/KnowLM-IE) <br/> [Baidu Netdisk](https://pan.baidu.com/s/11u_f_JT30W6B5xmUPC3enw?pwd=71ie) | 2000+ | InstructIE validation set |
133
- | InstructIE-test | [Google drive](https://drive.google.com/file/d/1WdG6_ouS-dBjWUXLuROx03hP-1_QY5n4/view?usp=drive_link) <br/> [HuggingFace](https://huggingface.co/datasets/zjunlp/KnowLM-IE) <br/> [Baidu Netdisk](https://pan.baidu.com/s/1JiRiOoyBVOold58zY482TA?pwd=cyr9) | 2000+ | InstructIE test set |
134
- | train.json, valid.json | [Google drive](https://drive.google.com/file/d/1vfD4xgToVbCrFP2q-SD7iuRT2KWubIv9/view?usp=sharing) | 5,000 | Preliminary training set and test set for the task "Instruction-Driven Adaptive Knowledge Graph Construction" in [CCKS2023 Open Knowledge Graph Challenge](https://tianchi.aliyun.com/competition/entrance/532080/introduction), randomly selected from instruct_train.json |
135
 
136
 
137
- - `InstrumentIE-train` contains two files: `InstrumentIE-zh.json` and `InstrumentIE-en.json`, each of which contains the following fields: `'id'` (unique identifier), `'cate'` (text category), `'entity'` and `'relation'` (triples) fields. The extracted instructions and output can be freely constructed through `'entity'` and `'relation'`.
138
- - `InstrumentIE-valid` and `InstrumentIE-test` are validation sets and test sets, respectively, including bilingual `zh` and `en`.
139
- - `train.json`: Same fields as `KnowLM-IE.json`, `'instruction'` and `'output'` have only one format, and extraction instructions and outputs can also be freely constructed through `'relation'`.
140
- - `valid.json`: Same fields as `train.json`, but with more accurate annotations achieved through crowdsour
 
 
 
 
141
 
142
 
143
  <details>
144
  <summary><b>Explanation of each field</b></summary>
145
 
146
 
147
- | Field | Description |
148
- | :---------: | :----------------------------------------------------------: |
149
- | id | Unique identifier |
150
- | cate | text topic of input (12 topics in total) |
151
- | input | Model input text (need to extract all triples involved within) |
152
- | instruction | Instruction for the model to perform the extraction task |
153
- | output | Expected model output |
154
- | entity | entities(entity, entity_type) |
155
- | relation | Relation triples(head, relation, tail) involved in the input |
156
-
157
 
158
  </details>
159
 
 
128
 
129
  | Name | Download | Quantity | Description |
130
  | ---------------------- | ------------------------------------------------------------ | -------- | ------------------------------------------------------------ |
131
+ | InstructIE | [Google drive](https://drive.google.com/file/d/1raf0h98x3GgIhaDyNn1dLle9_HvwD6wT/view?usp=sharing) <br/> [Baidu Netdisk](https://pan.baidu.com/s/1-u8bD85H1Otbzk-gjLxaFw?pwd=c1i6) | 200,000+ | InstructIE dataset (bilingual in Chinese and English) |
 
 
 
132
 
133
 
134
+
135
+ The `InstructIE` dataset contains two core files: `InstructIE-zh.json` and `InstructIE-en.json`. Both files share the following fields:
136
+
137
+ - `'id'`: A unique identifier for each data entry, ensuring the independence and traceability of the data items.
138
+ - `'cate'`: The text's subject category, which provides a high-level categorical label for the content (there are 12 categories in total).
139
+ - `'text'`: The text to be extracted.
140
+ - `'relation'`: Represents the **relationship triples** (head, relation, tail) contained in the text. This field allows users to freely construct instructions and expected outputs for information extraction.
141
+
142
 
143
 
144
  <details>
145
  <summary><b>Explanation of each field</b></summary>
146
 
147
 
148
+ | Field | Description |
149
+ | ----------- | ---------------------------------------------------------------- |
150
+ | id | The unique identifier for each data point. |
151
+ | cate | The category of the text's subject, with a total of 12 different thematic categories. |
152
+ | input | The input text for the model, with the goal of extracting all the involved relationship triples. |
153
+ | instruction | Instructions guiding the model to perform information extraction tasks. |
154
+ | output | The expected output result of the model. |
155
+ | relation | Describes the relationship triples contained in the text, i.e., the connections between entities (head, relation, tail). |
 
 
156
 
157
  </details>
158
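
The updated README says that instructions and expected outputs can be freely constructed from the `'relation'` field. A minimal Python sketch of that idea follows. It rests on assumptions not confirmed by the README: that each triple is stored as a dict with the keys `head`, `relation`, and `tail`, and that a JSON list of triples is an acceptable output format. The helper `build_example`, the instruction wording, and the sample record are all hypothetical and are not taken from the dataset.

```python
import json


def build_example(record):
    """Turn one InstructIE-style record into an instruction/input/output dict.

    Assumes record["relation"] is a list of dicts with the keys
    "head", "relation", and "tail" (an assumption, not confirmed by the README).
    """
    triples = record["relation"]
    # Collect the distinct relation names, keeping first-seen order.
    relation_names = list(dict.fromkeys(t["relation"] for t in triples))
    # Hypothetical instruction wording; any template could be used here.
    instruction = (
        "Extract all (head, relation, tail) triples from the text. "
        "Candidate relations: " + ", ".join(relation_names) + "."
    )
    # Serialize the triples as a JSON list of [head, relation, tail] lists.
    output = json.dumps(
        [[t["head"], t["relation"], t["tail"]] for t in triples],
        ensure_ascii=False,
    )
    return {"instruction": instruction, "input": record["text"], "output": output}


# Made-up record for illustration only; the real categories and values may differ.
record = {
    "id": "example-0",
    "cate": "Person",
    "text": "Marie Curie was born in Warsaw.",
    "relation": [
        {"head": "Marie Curie", "relation": "place of birth", "tail": "Warsaw"}
    ],
}

example = build_example(record)
print(example["instruction"])
print(example["output"])
```

Keeping the triples in `'relation'` separate from any fixed instruction text is what makes this kind of template swapping possible: the same record can yield several instruction styles without touching the annotations.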