llama-2-7b-ov
- Model creator: Ojas Patil
- Fine-tuned model: fine-tuned-Llama-2-7b
- Original model: Llama-2-7b
Description
This is the Llama-2-7b model converted to the OpenVINO™ IR (Intermediate Representation) format.
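The conversion can be reproduced from the original weights with Optimum Intel's export support. The snippet below is a minimal sketch, assuming access to the gated meta-llama/Llama-2-7b-hf checkpoint and an installed optimum[openvino]; passing export=True converts the PyTorch weights to OpenVINO IR on the fly.
from optimum.intel.openvino import OVModelForCausalLM

# Assumption: you have been granted access to meta-llama/Llama-2-7b-hf
ov_model = OVModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", export=True)
ov_model.save_pretrained("llama-2-7b-ov")  # writes the IR files (.xml/.bin) and config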
Running Model Inference with Optimum Intel
- Install the packages required to use the Optimum Intel integration with the OpenVINO backend:
pip install optimum[openvino]
- Run model inference:
from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

model_name = "OjasPatil/intel-llama2-7b-ov"

# Load the tokenizer and the OpenVINO model
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = OVModelForCausalLM.from_pretrained(model_name)

# Build a Llama-2 chat-style prompt
message = "What is Intel OpenVINO?"
prompt = f"[INST] {message} [/INST]"

# Tokenize, generate, and strip the prompt from the decoded output
inputs = tokenizer(prompt, return_tensors="pt")
outputs = base_model.generate(**inputs, max_new_tokens=50)
response = tokenizer.decode(outputs[0], skip_special_tokens=True).replace(prompt + " ", "")
print(response)
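By default, Optimum Intel compiles the model for the CPU. If an Intel GPU is available, the target device can be switched before generation; a minimal sketch, assuming the snippet above has already been run:
# Assumption: an Intel GPU with the OpenVINO GPU plugin is available
base_model.to("GPU")   # select the OpenVINO GPU device
base_model.compile()   # recompile the network for the new device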