DeDeckerThomas committed
Commit • 94ec415
1 Parent(s): 0a79f9e

Update README.md

README.md CHANGED
@@ -33,9 +33,12 @@ model-index:
 type: midas/inspec
 name: inspec
 metrics:
-- type:
+- type: F1 (Seqeval)
 value: 0.588
-name: F1
+name: F1 (Seqeval)
+- type: F1@M
+value: 0.564
+name: F1@M
 ---
 # Keyphrase Extraction Model: KBIR-inspec
 Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a document. Thanks to these keyphrases humans can understand the content of a text very quickly and easily without reading it completely. Keyphrase extraction was first done primarily by human annotators, who read the text in detail and then wrote down the most important keyphrases. The disadvantage is that if you work with a lot of documents, this process can take a lot of time.
@@ -104,22 +107,22 @@ extractor = KeyphraseExtractionPipeline(model=model_name)
 ```python
 # Inference
 text = """
-Keyphrase extraction is a technique in text analysis where you extract the
-important keyphrases from a document. Thanks to these keyphrases humans can
-understand the content of a text very quickly and easily without reading it
-completely. Keyphrase extraction was first done primarily by human annotators,
-who read the text in detail and then wrote down the most important keyphrases.
-The disadvantage is that if you work with a lot of documents, this process
+Keyphrase extraction is a technique in text analysis where you extract the
+important keyphrases from a document. Thanks to these keyphrases humans can
+understand the content of a text very quickly and easily without reading it
+completely. Keyphrase extraction was first done primarily by human annotators,
+who read the text in detail and then wrote down the most important keyphrases.
+The disadvantage is that if you work with a lot of documents, this process
 can take a lot of time.
 
-Here is where Artificial Intelligence comes in. Currently, classical machine
-learning methods, that use statistical and linguistic features, are widely used
-for the extraction process. Now with deep learning, it is possible to capture
-the semantic meaning of a text even better than these classical methods.
-Classical methods look at the frequency, occurrence and order of words
-in the text, whereas these neural approaches can capture long-term
+Here is where Artificial Intelligence comes in. Currently, classical machine
+learning methods, that use statistical and linguistic features, are widely used
+for the extraction process. Now with deep learning, it is possible to capture
+the semantic meaning of a text even better than these classical methods.
+Classical methods look at the frequency, occurrence and order of words
+in the text, whereas these neural approaches can capture long-term
 semantic dependencies and context of words in a text.
-"""
+""".replace("\n", " ")
 
 keyphrases = extractor(text)
 
@@ -130,7 +133,8 @@ print(keyphrases)
 ```
 # Output
 ['Artificial Intelligence' 'Keyphrase extraction' 'deep learning'
-'features' '
+'linguistic features' 'machine learning' 'semantic meaning'
+'text analysis']
 ```
 
 ## Training Dataset
@@ -213,8 +217,8 @@ tokenized_dataset = dataset.map(preprocess_fuction, batched=True)
 
 ```
 
-### Postprocessing
-
+### Postprocessing (Without Pipeline Function)
+If you do not use the pipeline function, you must filter out the B and I labeled tokens. Each B and I will then be merged into a keyphrase. Finally, you need to strip the keyphrases to make sure all unnecessary spaces have been removed.
 ```python
 # Define post_process functions
 def concat_tokens_by_tag(keyphrases):
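
Note on the paragraph added under "Postprocessing (Without Pipeline Function)": the three steps it describes (filter the B and I labeled tokens, merge each B with its following I tokens, strip the result) can be sketched in plain Python. This is only an illustration under stated assumptions, not the README's own `concat_tokens_by_tag`, whose body is not shown in this diff. The function name `merge_bi_tokens`, the word-level `(word, label)` input, and the bare `"B"`/`"I"`/`"O"` label strings are all hypothetical.

```python
from typing import Iterable, List, Tuple


def merge_bi_tokens(tagged_words: Iterable[Tuple[str, str]]) -> List[str]:
    """Merge consecutive B/I-labeled words into whitespace-stripped keyphrases.

    Assumes each item is a (word, label) pair with label "B", "I" or "O".
    """
    groups: List[List[str]] = []
    inside_phrase = False
    for word, label in tagged_words:
        if label == "B":
            # A "B" label always starts a new keyphrase.
            groups.append([word.strip()])
            inside_phrase = True
        elif label == "I" and inside_phrase:
            # An "I" label continues the keyphrase opened by the last "B".
            groups[-1].append(word.strip())
        else:
            # "O" (or an "I" with no open keyphrase) is filtered out.
            inside_phrase = False
    # Join each group and strip so that no stray spaces remain.
    phrases = [" ".join(w for w in group if w).strip() for group in groups]
    return [p for p in phrases if p]


tagged = [
    ("Keyphrase", "B"), ("extraction", "I"), ("is", "O"), ("a", "O"),
    ("technique", "O"), ("in", "O"), ("text", "B"), ("analysis ", "I"),
]
print(merge_bi_tokens(tagged))  # ['Keyphrase extraction', 'text analysis']
```

One design choice in this sketch: an "I" that does not directly follow a "B" or another "I" is dropped rather than attached to the previous keyphrase. Whether the README's function behaves the same way cannot be determined from this diff.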