---
license: llama2
language:
- en
- ar
metrics:
- accuracy
- f1
library_name: transformers
---

# llama-7b-v2-Receipt-Key-Extraction

llama-7b-v2-Receipt-Key-Extraction is a 7-billion-parameter model based on LLaMA 2, fine-tuned for key information extraction from receipt items. It was introduced in the paper:

[AMuRD: Annotated Multilingual Receipts Dataset for Cross-lingual Key Information Extraction and Classification](https://arxiv.org/abs/2309.09800)

## Uses

The model is intended for research-only use, in English and Arabic, for key information extraction from receipt items. For example, given the item line "Americana Okra zero 400 gm", it can be prompted to extract fields such as class, brand, weight, and price (see the example below).

## How to Get Started with the Model

Use the code below to get started with the model.

```python
# pip install -q transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

checkpoint = "abdoelsayed/llama-7b-v2-Receipt-Key-Extraction"

# Pick the best available device: CUDA, then Apple Silicon (MPS), then CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

tokenizer = AutoTokenizer.from_pretrained(
    checkpoint,
    model_max_length=512,
    padding_side="right",
    use_fast=False,
)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

def generate_response(instruction, input_text, max_new_tokens=100,
                      temperature=0.1, num_beams=4, top_p=0.75, top_k=40):
    # Alpaca-style prompt: the instruction plus the raw receipt line as input.
    prompt = (
        "Below is an instruction that describes a task, paired with an input "
        "that provides further context.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{input_text}\n\n"
        "### Response:"
    )
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)
    generation_config = GenerationConfig(
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        num_beams=num_beams,
    )
    with torch.no_grad():
        outputs = model.generate(
            input_ids,
            generation_config=generation_config,
            max_new_tokens=max_new_tokens,
            return_dict_in_generate=True,
            output_scores=True,
        )
    decoded = tokenizer.decode(outputs.sequences[0])
    # Keep only the text after the response marker and strip the EOS token.
    return decoded.split("### Response:")[-1].strip().replace("</s>", "")

instruction = ("Extract the class, Brand, Weight, Number of units, Size of units, "
               "Price, T.Price, Pack, Unit from the following sentence")
input_text = "Americana Okra zero 400 gm"

response = generate_response(instruction, input_text)
print(response)
```
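The model returns its answer as free text after the `### Response:` marker. If you need structured output, a minimal post-processing sketch follows; it assumes the model emits one `Field: value` pair per line, which is not documented behavior of this model, so adjust the parsing to the output you actually observe.

```python
# Hypothetical post-processing (continues from the snippet above).
# ASSUMPTION: the response contains lines like "Brand: Americana"; this
# format is not guaranteed by the model card.
def parse_fields(response: str) -> dict:
    fields = {}
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # keep only lines that actually contain a colon
            fields[key.strip()] = value.strip()
    return fields

print(parse_fields(response))
```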



## How to Cite

If you use this model, please cite the following paper.

```bibtex
@misc{abdallah2023amurd,
    title={AMuRD: Annotated Multilingual Receipts Dataset for Cross-lingual Key Information Extraction and Classification},
    author={Abdelrahman Abdallah and Mahmoud Abdalla and Mohamed Elkasaby and Yasser Elbendary and Adam Jatowt},
    year={2023},
    eprint={2309.09800},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```