ghh001 committed on
Commit
abcdcbf
1 Parent(s): 06323a6

Update README.md

Files changed (1)
  1. README.md +150 -1
README.md CHANGED
@@ -43,7 +43,12 @@ license: cc-by-nc-sa-4.0
43
 
44
  ## What is OneKE?
45
 
46
- OneKE is a new bilingual knowledge extraction large model developed jointly by Zhejiang University and Ant Group, leveraging their years of accumulation in knowledge graph and natural language processing technology. Launched in 2024, the model employs schema-based polling instruction construction technology and is optimized to enhance the model's generalization capabilities for structured information extraction.
 
 
 
 
 
47
 
48
 
49
  <p align="center" width="100%">
@@ -259,6 +264,8 @@ Below are examples of instructions for various tasks:
259
  </details>
260
 
261
 
 
 
262
 
263
  ### Conversion of OneKE Instruction Format
264
 
@@ -462,6 +469,148 @@ for split_schema in split_schemas:
462
  </details>
463
 
464
 
465
  ## Evaluation
466
 
467
  To extract structured content from the output text and to assess it, please refer to [InstructKGC/README_CN.md/7. Evaluation](./InstructKGC/README_CN.md/#🧾-7评估).
 
43
 
44
  ## What is OneKE?
45
 
46
+
47
+ OneKE is a large-model framework for knowledge extraction jointly developed by Ant Group and Zhejiang University. It supports generalized knowledge extraction in both Chinese and English across multiple domains and tasks, and comes with comprehensive toolchain support. OneKE has been contributed to the OpenKG open knowledge graph community as open source.
48
+
49
+ Knowledge construction from unstructured documents has long been one of the key challenges for deploying knowledge graphs at scale. Real-world information is highly fragmented and unstructured, and the content to be extracted often differs substantially from its natural language expression, so large language models often perform suboptimally on information extraction tasks. Natural language text also frequently contains ambiguity, polysemy, and metaphor arising from implicit and long-distance context associations, which poses significant challenges for knowledge extraction. In response, Ant Group and Zhejiang University drew on their years of expertise in knowledge graphs and natural language processing to jointly build and upgrade the knowledge extraction capabilities of Ant's large model "BaiLing". They released the bilingual knowledge extraction framework OneKE, which includes a version based on full-parameter fine-tuning of Chinese-Alpaca-2-13B. Evaluation metrics show that OneKE achieves relatively good performance on several fully supervised and zero-shot entity, relation, and event extraction tasks.
50
+
51
+ A unified knowledge extraction framework has a wide range of applications and can significantly reduce the cost of building domain-specific knowledge graphs. Extracting structured knowledge from massive datasets to build high-quality knowledge graphs and establish logical associations between knowledge elements enables interpretable inference and decision-making. It can also enhance large models by mitigating hallucination and improving stability, accelerating their application in vertical domains. For example, in the medical field, knowledge extraction can convert doctors' experience into structured, rule-based management, supporting controllable auxiliary diagnosis and medical Q&A systems. In the financial sector, it can extract financial indicators, risk events, causal logic, and industry chains for automated financial report generation, risk prediction, and industry chain analysis. In the public sector, it can support knowledge-based management of government regulations, improving the efficiency and accuracy of public services.
52
 
53
 
54
  <p align="center" width="100%">
 
264
  </details>
265
 
266
 
267
+ > Note: Because information extraction in specific domains is complex and relies heavily on prompts, we support adding schema descriptions and examples to the instructions to improve extraction quality. For details, refer to **`Customized Schema Description Instructions`** and **`Customized Example Instructions`**. Please note that, because of the model's limited scale, its output is prompt-dependent and different prompts may yield inconsistent results.
268
+
269
 
270
  ### Conversion of OneKE Instruction Format
271
 
 
469
  </details>
470
 
471
 
472
+ ### Customized Example Instructions
473
+
474
+ Because examples are often lengthy and the maximum sequence length used in model training is limited, too many examples may adversely affect model performance. We therefore suggest providing two examples, one positive and one negative, while keeping the number of schemas to one.
475
+
476
+ ```json
477
+ {
478
+ "instruction": "You are an expert in entity extraction. Please extract entities from the input that fit the defined schema; return an empty list for non-existent entity types. Please respond in the format of a JSON string. You may refer to the example to guide your extraction.",
479
+ "schema": [
480
+ "Biomarker"
481
+ ],
482
+ "example": [
483
+ {
484
+ "input": "Diagnostic criteria for CKD include: 1. Any of the following indicators persisting for more than 3 months; and meeting at least one criterion.(1) Signs of renal damage: Albuminuria [Albumin excretion rate (AER)≥30mg/24h; Albumin to creatinine ratio (ACR)≥3mg/mmol]; abnormal urinary sediment; tubular pathology; histological anomalies; structural abnormities found in imaging; history of kidney transplantation.(2) Decline in glomerular filtration rate: eGFR≤60ml·min-1·1.73m-2",
485
+ "output": {
486
+ "Biomarker": [
487
+ "Albumin excretion rate (AER)",
488
+ "Albumin to creatinine ratio (ACR)",
489
+ "Glomerular filtration rate",
490
+ "eGFR"
491
+ ]
492
+ }
493
+ },
494
+ {
495
+ "input": "Application of DPP-4 inhibitors in specific populations",
496
+ "output": {
497
+ "Biomarker": []
498
+ }
499
+ }
500
+ ],
501
+ "input": "Currently, all sulfonylurea drugs' leaflets list severe liver dysfunction as a contraindication. Alanine transaminase (ALT)> 3 times the upper limit of the reference value can serve as a sensitive and specific indicator of liver damage. If ALT>8-10 times the upper limit of the reference value or ALT>3 times with total serum bilirubin (TBIL)>2 times the reference value, it is considered a specific predictor of severe liver damage, indicating substantial injury to hepatic parenchymal cells; sulfonylureas should be contraindicated at this stage. Clinically, patients with decompensated liver cirrhosis accompanied by hepatic encephalopathy, ascites, or coagulation disorders should avoid this class of drugs to prevent hypoglycemia."
502
+ }
503
+ ```
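The structure above can be assembled programmatically. The following is a minimal sketch, not part of the OneKE toolchain: `build_example_instruction` is a hypothetical helper, and it assumes the instruction is serialized as a JSON string with the `instruction`, `schema`, `example`, and `input` keys shown above. It enforces the suggestion of a single schema with at least one positive and one negative example; the example input is abbreviated from the one above.

```python
import json


def build_example_instruction(instruction, schema, examples, text):
    """Hypothetical helper: serialize an instruction that carries few-shot examples."""
    if len(schema) != 1:
        raise ValueError("use a single schema when examples are attached")
    # NER/RE schemas are plain strings; EE schemas are dicts with an "event_type".
    first = schema[0]
    key = first["event_type"] if isinstance(first, dict) else first
    # An example is "negative" when its output for the schema is an empty list.
    positives = [ex for ex in examples if ex["output"].get(key)]
    negatives = [ex for ex in examples if not ex["output"].get(key)]
    if not positives or not negatives:
        raise ValueError("provide at least one positive and one negative example")
    payload = {
        "instruction": instruction,
        "schema": schema,
        "example": examples,
        "input": text,
    }
    # ensure_ascii=False keeps Chinese text and symbols such as "≤" readable.
    return json.dumps(payload, ensure_ascii=False)


examples = [
    {"input": "Decline in glomerular filtration rate: eGFR≤60ml·min-1·1.73m-2",
     "output": {"Biomarker": ["eGFR"]}},
    {"input": "Application of DPP-4 inhibitors in specific populations",
     "output": {"Biomarker": []}},
]
prompt = build_example_instruction(
    "You are an expert in entity extraction. Please extract entities from the "
    "input that fit the defined schema; return an empty list for non-existent "
    "entity types. Please respond in the format of a JSON string. You may refer "
    "to the example to guide your extraction.",
    ["Biomarker"],
    examples,
    "Alanine transaminase (ALT)> 3 times the upper limit of the reference value "
    "can serve as a sensitive and specific indicator of liver damage.",
)
print(prompt)
```

Raising an error on a missing negative example is a design choice of this sketch; the recommendation above is a suggestion, so a warning may suit some pipelines better.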
504
+
505
+
506
+ <details>
507
+
508
+ <summary><b>Relationship Extraction (RE) Example Instruction</b></summary>
509
+ ```json
510
+ {
511
+ "instruction": "You are an expert specialized in relationship extraction. Please extract from the input the defined relation triples according to the schema; return an empty list for non-existent relations. Please respond in the format of a JSON string. You may refer to the example for guidance on extraction.",
512
+ "schema": [
513
+ "Disease Staging and Typing"
514
+ ],
515
+ "example": [
516
+ {
517
+ "input": "The foundational treatment of diabetes includes both education and management, as well as diet and exercise. A lack of knowledge in diabetes prevention and control is the primary reason for poor blood sugar management. Paying attention to the education and management of elderly patients is an important measure to improve the treatment level of diabetes.",
518
+ "output": {
519
+ "Disease Staging and Typing": []
520
+ }
521
+ },
522
+ {
523
+ "input": "Metabolites of glipizide have no hypoglycemic effect and are mostly excreted through feces, with only 5.0% excreted by the kidneys, thus are less affected by renal function. However, large clinical trials in patients with chronic kidney disease are limited. There have been studies observing the use of glipizide in patients with GFR10~50 ml min-1.(1.73m2)-1, but the trial designs are not perfect. Glipizide can be used in patients with stages 1 to 3 chronic kidney disease without dose adjustment; caution is advised in stage 4; and it is contraindicated in stage 5.",
524
+ "output": {
525
+ "Disease Staging and Typing": [
526
+ {
527
+ "subject": "Chronic kidney disease",
528
+ "object": "Chronic"
529
+ },
530
+ {
531
+ "subject": "Chronic kidney disease",
532
+ "object": "Chronic"
533
+ },
534
+ {
535
+ "subject": "Chronic kidney disease",
536
+ "object": "stages 1 to 3"
537
+ },
538
+ {
539
+ "subject": "Chronic kidney disease",
540
+ "object": "stage 4"
541
+ },
542
+ {
543
+ "subject": "Chronic kidney disease",
544
+ "object": "stage 5"
545
+ }
546
+ ]
547
+ }
548
+ }
549
+ ],
550
+ "input": "(2)NSAIDs: This includes both non-selective cyclooxygenase (COX) inhibitors and COX-2 inhibitors. If there are no contraindications, early and ample use of fast-acting NSAID formulations is recommended. Non-selective COX inhibitors primarily have gastrointestinal adverse reactions such as ulcers, perforations, and upper gastrointestinal bleeding, hence COX-2 inhibitors, which can reduce GI reactions by 50%, may be used for those intolerant to non-selective COX inhibitors. Active gastrointestinal ulcers/bleeding or a history of recurrent gastrointestinal ulcers/bleeding is a contraindication for all NSAIDs use. COX-2 inhibitors may increase the risk of cardiovascular events and should be avoided in patients with myocardial infarction or heart failure. Kidney function monitoring is required during the use of NSAIDs, and their use is not recommended in patients with severe chronic kidney disease (stages G4 to G5) who are not undergoing dialysis."
551
+ }
552
+ ```
553
+
554
+ </details>
555
+
556
+
557
+
558
+ <details>
559
+
560
+ <summary><b>Event Extraction (EE) Example Instruction</b></summary>
561
+ ```json
562
+ {
563
+ "instruction": "You are an expert specialized in event extraction. Please extract events from the input according to the defined schema; return an empty list for non-existent events, and 'NAN' for non-existent arguments. If an argument has multiple values, please return a list. Respond in the format of a JSON string. You may refer to the example for extraction guidance.",
564
+ "schema": [
565
+ {
566
+ "event_type": "Corporate Financing",
567
+ "trigger": true,
568
+ "arguments": [
569
+ "Disclosure Time",
570
+ "Investee",
571
+ "Financing Round",
572
+ "Lead Investor",
573
+ "Event Time",
574
+ "Investor",
575
+ "Financing Amount"
576
+ ]
577
+ }
578
+ ],
579
+ "example": [
580
+ {
581
+ "input": "Raise 2.5 billion yuan for expansion due to the 'three highs' condition of Joyson Electronics: high pledges, high goodwill, high debt\nReporter Zhang Jiazhen, from Beijing\nNingbo Joyson Electronic Corporation (hereinafter referred to as 'Joyson Electronics', 600699.SH), which holds billion-level big orders, is actively raising funds to expand production capacity to ease the increasingly pressing bottleneck of production capacity saturation.\nRecently, Joyson Electronics announced that it has received the 'Feedback Notice' from the China Securities Regulatory Commission, and its private stock offering is a step closer to approval.",
582
+ "output": {
583
+ "Corporate Financing": [
584
+ {
585
+ "trigger": "Raise",
586
+ "arguments": {
587
+ "Disclosure Time": "NAN",
588
+ "Investee": "Ningbo Joyson Electronic Corporation",
589
+ "Financing Round": "NAN",
590
+ "Lead Investor": "NAN",
591
+ "Event Time": "NAN",
592
+ "Investor": "NAN",
593
+ "Financing Amount": "2.5 billion yuan"
594
+ }
595
+ }
596
+ ]
597
+ }
598
+ },
599
+ {
600
+ "input": "NIO stock falls to 13% before market; NIO reports over 3.2 billion loss in Q2\nOriginal Title: NIO stock falls to 13% before market; NIO reports over 3.2 billion loss in Q2\nNIO's stock price turned from a rise to a fall before market, falling to 13%. NIO released its Q2 earnings today, followed by the announcement of the cancellation of the earnings conference call originally scheduled for today.\nThe earnings report showed that NIO achieved a revenue of 1.508 billion yuan in the second quarter, exceeding market expectations of 1.309 billion yuan, compared to 46 million yuan in the same period last year; The net loss attributable to shareholders in the second quarter was 3.285 billion yuan, higher than the market expected loss of 2.944 billion yuan, compared to a loss of 6.11 billion yuan in the same period last year.",
601
+ "output": {
602
+ "Corporate Financing": []
603
+ }
604
+ }
605
+ ],
606
+ "input": "【Exclusive】The 11th in five years, Codemao announces completion of C+ round financing of 250 million yuan\nJiemodui, April 17th - Today, Codemao announced the completion of a C+ round of financing worth 250 million yuan.\nThis comes five months after completing a C round financing of 400 million yuan last year, which is the new round of 'ammunition' added by Codemao.\nThe round was led by China Merchants International, with Bohai Capital, an equity investment fund under Bank of China Group, and existing shareholders Yueke Xintai and Shengyu Investment following suit."
607
+ }
608
+ ```
609
+
610
+ </details>
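The event extraction instruction asks the model to return `'NAN'` for missing arguments and a list when an argument has several values. The sketch below shows one way such output could be normalized after generation. It is an illustration under assumptions, not the project's evaluation code: `normalize_events` is a hypothetical helper, and it assumes the model returns a well-formed JSON string shaped like the example outputs above.

```python
import json


def normalize_events(output_text, event_type):
    """Hypothetical post-processing: parse model output and drop 'NAN' arguments."""
    try:
        parsed = json.loads(output_text)
    except json.JSONDecodeError:
        return []  # malformed output is treated as "no events" in this sketch
    if not isinstance(parsed, dict):
        return []
    events = []
    for event in parsed.get(event_type, []):
        arguments = {}
        for name, value in event.get("arguments", {}).items():
            if value == "NAN":
                continue  # the instruction uses 'NAN' for non-existent arguments
            # Wrap single values so every argument is uniformly a list.
            arguments[name] = value if isinstance(value, list) else [value]
        events.append({"trigger": event.get("trigger"), "arguments": arguments})
    return events


raw = json.dumps({
    "Corporate Financing": [{
        "trigger": "Raise",
        "arguments": {
            "Disclosure Time": "NAN",
            "Investee": "Ningbo Joyson Electronic Corporation",
            "Financing Round": "NAN",
            "Lead Investor": "NAN",
            "Event Time": "NAN",
            "Investor": "NAN",
            "Financing Amount": "2.5 billion yuan",
        },
    }]
})
events = normalize_events(raw, "Corporate Financing")
print(events)
```

Storing every argument as a list means downstream code does not need to branch on single versus multiple values.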
611
+
612
+
613
+
614
  ## Evaluation
615
 
616
  To extract structured content from the output text and to assess it, please refer to [InstructKGC/README_CN.md/7. Evaluation](./InstructKGC/README_CN.md/#🧾-7评估).