How to get Entity with their corresponding value
I am using this function to post-process the pipeline output:

def process(text, prompt, threshold=0.5):
    input_ = f"{prompt}\n{text}"
    results = nlp(input_)
    processed_results = []
    prompt_length = len(prompt)
    for result in results:
        # Drop low-confidence predictions
        if result['score'] < threshold:
            continue
        # Shift character offsets so they index into `text` rather than
        # the combined prompt + text input
        start = result['start'] - prompt_length
        # Skip predictions that fall inside the prompt itself
        if start < 0:
            continue
        end = result['end'] - prompt_length
        span = text[start:end]
        entity_type = result['entity_group']
        processed_result = {
            'entity': entity_type,
            'span': span,
            'start': start,
            'end': end,
            'score': result['score']
        }
        processed_results.append(processed_result)
    return processed_results
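To see the offset arithmetic and threshold filtering in isolation, here is a minimal, self-contained sketch that replaces the Hugging Face pipeline with a hard-coded fake so it runs without the model; all names, scores, and offsets in `fake_nlp` are made up for illustration:

```python
prompt = "Identify company names:\n"
text = "Apple was founded in 1976 in Cupertino."

def fake_nlp(input_):
    # Hypothetical stand-in for the real pipeline. Offsets are character
    # positions in `input_`; they are constructed relative to len(prompt)
    # so the subtraction in `process` maps them back into `text`.
    base = len(prompt)
    return [
        {'entity_group': 'company', 'score': 0.97,
         'start': base + text.index('Apple'),
         'end':   base + text.index('Apple') + len('Apple')},
        # A low-confidence prediction that the threshold should drop:
        {'entity_group': 'company', 'score': 0.21,
         'start': base + text.index('Cupertino'),
         'end':   base + text.index('Cupertino') + len('Cupertino')},
    ]

def process(text, prompt, threshold=0.5):
    results = fake_nlp(f"{prompt}\n{text}")
    prompt_length = len(prompt)
    processed_results = []
    for result in results:
        if result['score'] < threshold:
            continue
        start = result['start'] - prompt_length
        if start < 0:
            continue
        end = result['end'] - prompt_length
        processed_results.append({
            'entity': result['entity_group'],
            'span': text[start:end],
            'start': start,
            'end': end,
            'score': result['score'],
        })
    return processed_results

print(process(text, prompt))
```

Only the high-confidence "Apple" prediction survives the default 0.5 threshold; the 0.21 "Cupertino" prediction is filtered out.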
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("knowledgator/UTC-DeBERTa-large")
model = AutoModelForTokenClassification.from_pretrained("knowledgator/UTC-DeBERTa-large")

# Setting device to CPU
device = -1  # -1 corresponds to CPU in Hugging Face's pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy='first', device=device)
To get entities of particular classes, you can construct your own prompt like this:
prompt = """Identify the following entity classes in the text:
computer
Text:
"""
text = """Apple was founded as Apple Computer Company on April 1, 1976, by Steve Wozniak, Steve Jobs (1955–2011) and Ronald Wayne to develop and sell Wozniak's Apple I personal computer.
It was incorporated by Jobs and Wozniak as Apple Computer, Inc. in 1977. The company's second computer, the Apple II, became a best seller and one of the first mass-produced microcomputers.
Apple went public in 1980 to instant financial success."""
results = process(text, prompt)
print(results)
I recommend writing a simple helper function like this; it extracts all entities belonging to the classes you pass in:
def ner(class_, text):
    prompt = f"""Identify entities in the text having the following classes:
{class_}
Text:
"""
    results = process(text, prompt)
    return results
ner(class_='company', text=text)
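If you need several classes at once, the prompt templates above list one class per line, so you can presumably join multiple classes with newlines. A minimal sketch of that idea (`build_prompt` is a hypothetical helper, not part of the library; it just reproduces the f-string used in `ner`):

```python
def build_prompt(classes):
    # Join the class names one per line, matching the single-class
    # prompt format shown earlier.
    joined = "\n".join(classes)
    return f"""Identify entities in the text having the following classes:
{joined}
Text:
"""

print(build_prompt(["company", "person"]))
```

The resulting string can then be passed to `process(text, prompt)` in place of the single-class prompt.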
To see a more advanced implementation, you can visit our Gradio Space.
We are also working on a framework that simplifies usage and covers more use cases.