ghh001 committed
Commit: 6f6cb5c
Parent: aa01e48

Update README.md

Files changed (1): README.md (+5, -5)
README.md CHANGED
@@ -28,7 +28,7 @@ OneKE is a new bilingual knowledge extraction large model developed jointly by Z
 
 
 ## How is OneKE trained?
-OneKE mainly focuses on schema-generalizable information extraction. Due to issues such as non-standard formats, noisy data, and lack of diversity in existing extraction instruction data, OneKE adopted techniques such as normalization and cleaning of extraction instructions, difficult negative sample collection, and schema-based polling instruction construction, as shown in the illustration. For more detailed information, refer to the paper "[IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus](https://arxiv.org/abs/2402.14710) [[Github](https://github.com/zjunlp/IEPile)]".
+OneKE mainly focuses on schema-generalizable information extraction. Due to issues such as non-standard formats, noisy data, and lack of diversity in existing extraction instruction data, OneKE adopted techniques such as normalization and cleaning of extraction instructions, difficult negative sample collection, and schema-based batched instruction construction, as shown in the illustration. For more detailed information, refer to the paper "[IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus](https://arxiv.org/abs/2402.14710) [[Github](https://github.com/zjunlp/IEPile)]".
 
 The zero-shot generalization comparison results of OneKE with other large models are as follows:
 * `NER-en`: CrossNER_AI, CrossNER_literature, CrossNER_music, CrossNER_politics, CrossNER_science
@@ -268,7 +268,7 @@ split_num_mapper = {
 ```
 
 
-Since predicting all schemas in the label set at once is too challenging and not easily scalable, OneKE uses a polling approach during training. It divides the number of schemas asked in the instructions, querying a fixed number of schemas at a time. Hence, if the label set of a piece of data is too long, it will be split into multiple instructions that the model will address in turns.
+Since predicting all schemas in the label set at once is too challenging and not easily scalable, OneKE uses a batched approach during training: each instruction queries only a fixed number of schemas. Hence, if the label set of a piece of data is too long, it is split into multiple instructions that the model addresses in turn.
 
 **Schema format**:
 
@@ -281,7 +281,7 @@ EEA: [{"event_type": "Finance/Trading - Interest Rate Hike", "arguments": ["Time
 ```
 
 
-Below is a simple Polling Instruction Generation script:
+Below is a simple Batched Instruction Generation script:
 
 ```python
 def get_instruction(language, task, schema, input):
@@ -359,7 +359,7 @@ for split_schema in split_schemas:
 
 
 <details>
-<summary><b>Event Extraction (EE) Explanation Instructions</b></summary>
+<summary><b>Event Extraction (EE) Description Instructions</b></summary>
 
 ```json
 {
@@ -407,7 +407,7 @@ for split_schema in split_schemas:
 
 
 <details>
-<summary><b>Knowledge Graph Construction (KGC) Explanation Instructions</b></summary>
+<summary><b>Knowledge Graph Construction (KGC) Description Instructions</b></summary>
 
 ```json
 {
 
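For readers skimming this diff without the full README: the renamed "batched" technique chunks a long label set so that each instruction queries only a fixed number of schemas, and the model answers the chunks in turn. Below is a minimal, hedged sketch of that idea; the batch sizes, instruction wording, and function bodies are illustrative assumptions, not the actual definitions of `split_num_mapper`, `split_schemas`, or `get_instruction` in the README.

```python
# Minimal sketch of schema-based batched instruction construction.
# ASSUMPTIONS: the batch sizes and instruction wording below are
# illustrative; the README defines its own split_num_mapper and
# get_instruction with different values and prompt text.
import json

# Hypothetical per-task batch sizes.
split_num_mapper = {"NER": 6, "RE": 4, "EE": 4}

def split_schemas(schema_list, split_num):
    """Split the full label set into batches of at most split_num schemas."""
    return [schema_list[i:i + split_num]
            for i in range(0, len(schema_list), split_num)]

def get_instruction(language, task, schema, input):
    """Build one extraction instruction for a single schema batch.

    The signature mirrors the helper shown in the diff; the JSON layout
    here is an illustrative placeholder.
    """
    return json.dumps(
        {"instruction": f"Extract the {task} labels given in the schema from the input.",
         "schema": schema,
         "input": input},
        ensure_ascii=False,
    )

# A long label set becomes several instructions the model answers in turn.
labels = ["person", "location", "organization", "event",
          "product", "award", "law", "metric"]
for batch in split_schemas(labels, split_num_mapper["NER"]):
    print(get_instruction("en", "NER", batch, "Alan Turing was born in London in 1912."))
```

With a batch size of 6, the eight labels above produce two instructions (six schemas, then two), which is exactly the splitting behavior the changed paragraph describes.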