---
license: llama2
language:
- en
- ar
metrics:
- accuracy
- f1
library_name: transformers
---

# llama-7b-v2-Receipt-Key-Extraction

llama-7b-v2-Receipt-Key-Extraction is a 7-billion-parameter model based on LLaMA v2.

[AMuRD: Annotated Multilingual Receipts Dataset for Cross-lingual Key Information Extraction and Classification](https://arxiv.org/abs/2309.09800)

## Uses

The model is intended for research use only. It supports English and Arabic and extracts key information from item descriptions in receipts.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
# pip install -q transformers torch sentencepiece
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

checkpoint = "abdoelsayed/llama-7b-v2-Receipt-Key-Extraction"

# Pick the best available device: CUDA first, then Apple MPS, then CPU.
# getattr guards against older PyTorch builds without torch.backends.mps.
mps_backend = getattr(torch.backends, "mps", None)
if torch.cuda.is_available():
    device = "cuda"
elif mps_backend is not None and mps_backend.is_available():
    device = "mps"
else:
    device = "cpu"

tokenizer = AutoTokenizer.from_pretrained(
    checkpoint,
    model_max_length=512,
    padding_side="right",
    use_fast=False,  # the slow Llama tokenizer requires sentencepiece
)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)


def generate_response(instruction, input_text, max_new_tokens=100,
                      temperature=0.1, num_beams=4, top_p=0.75, top_k=40):
    # Instruction/input prompt template expected by the model.
    prompt = (
        "Below is an instruction that describes a task, paired with an input "
        "that provides further context.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{input_text}\n\n"
        "### Response:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"].to(device)
    # temperature, top_p and top_k only affect decoding when sampling is
    # enabled (do_sample=True); as configured here, generation uses beam search.
    generation_config = GenerationConfig(
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        num_beams=num_beams,
    )
    with torch.no_grad():
        outputs = model.generate(
            input_ids,
            generation_config=generation_config,
            max_new_tokens=max_new_tokens,
            return_dict_in_generate=True,
            output_scores=True,
        )
    decoded = tokenizer.decode(outputs.sequences[0])
    # Keep only the text after the response marker and drop the EOS token.
    return decoded.split("### Response:")[-1].replace("</s>", "").strip()


instruction = "Extract the class, Brand, Weight, Number of units, Size of units, Price, T.Price, Pack, Unit from the following sentence"
input_text = "Americana Okra zero 400 gm"

response = generate_response(instruction, input_text)
print(response)
```
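
The prompt template and the post-processing step in the snippet above can also be factored into two small pure functions, which makes them easy to check without downloading the model. This is a minimal sketch: `build_prompt` and `extract_response` are hypothetical helper names introduced here for illustration and are not part of the released code.

```python
# Hypothetical helpers (not part of the released code) that mirror the
# prompt template and response clean-up used in the snippet above.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input_text}\n\n"
    "### Response:"
)
RESPONSE_MARKER = "### Response:"
EOS_TOKEN = "</s>"


def build_prompt(instruction: str, input_text: str) -> str:
    """Fill the instruction/input template with the given strings."""
    return PROMPT_TEMPLATE.format(instruction=instruction, input_text=input_text)


def extract_response(decoded: str) -> str:
    """Return the text after the last response marker, without the EOS token."""
    return decoded.split(RESPONSE_MARKER)[-1].replace(EOS_TOKEN, "").strip()


# Build the prompt for the example item from the snippet above.
prompt = build_prompt(
    "Extract the class, Brand, Weight, Number of units, Size of units, "
    "Price, T.Price, Pack, Unit from the following sentence",
    "Americana Okra zero 400 gm",
)
print(prompt)
```

With these helpers, the body of `generate_response` reduces to tokenizing `build_prompt(...)`, calling `model.generate`, and passing the decoded string to `extract_response`.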

## How to Cite

Please cite this model using this format.

```bibtex
@misc{abdallah2023amurd,
  title={AMuRD: Annotated Multilingual Receipts Dataset for Cross-lingual Key Information Extraction and Classification},
  author={Abdelrahman Abdallah and Mahmoud Abdalla and Mohamed Elkasaby and Yasser Elbendary and Adam Jatowt},
  year={2023},
  eprint={2309.09800},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```