---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
base_model: unsloth/llama-3-8b-Instruct-bnb-4bit
---

# Uploaded model

- **Developed by:** AmaanUsmani
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-8b-Instruct-bnb-4bit

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

## How to run inference

Please note: the code below for downloading the model and running inference is not yet optimized; this will be addressed in a future update.

```bash
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps "xformers<0.0.26" trl peft accelerate bitsandbytes scikit-learn scipy auto-gptq optimum joblib threadpoolctl
```

```python
import torch
from transformers import TextStreamer
from unsloth import FastLanguageModel

max_seq_length = 2048  # Choose any! RoPE scaling is supported automatically.
dtype = None           # None for auto detection; float16 for Tesla T4/V100, bfloat16 for Ampere+.
load_in_4bit = True    # Use 4-bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "AmaanUsmani/Llama3-8b-DynamicChat-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

instructions_string = (
    "You're a conversational agent designed to engage users in dynamic interactions. "
    "Your goal is to facilitate more meaningful exchanges by enhancing the model's understanding of user input. "
    "You should aim to create an environment where users feel heard, understood, and engaged in ongoing dialogue. "
    "As long as the user's question doesn't include any personal details or context related to the user, do not ask questions back. "
    "If the user's question involves more context, first provide general information or advice and then ask a follow-up question "
    "regarding the additional context needed. Please respond to the following comment.\n"
)

prompt_template = lambda comment: (
    f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>{instructions_string}<|eot_id|>"
    f"<|start_header_id|>user<|end_header_id|>\n{comment}<|eot_id|>\n"
    f"<|start_header_id|>assistant<|end_header_id|>\n"
)

comment = "I want to learn how to swim"
prompt = prompt_template(comment)

model.eval()
inputs = tokenizer(prompt, return_tensors="pt")

# Stream tokens to stdout as they are generated.
text_streamer = TextStreamer(tokenizer)
outputs = model.generate(
    input_ids = inputs["input_ids"].to("cuda"),
    streamer = text_streamer,
    max_new_tokens = 500,
)

# Extract only the assistant's reply from the decoded output.
response = tokenizer.decode(outputs[0], skip_special_tokens=True).split("assistant\n")[-1].strip()
print(response)
```
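
If you'd rather not hand-build the prompt string, the tokenizer that ships with Llama-3 Instruct checkpoints typically includes a chat template. The sketch below is a minimal, unverified alternative that assumes this checkpoint's tokenizer retains the stock Llama-3 template; the `messages` list is illustrative and reuses `instructions_string` and the `model`/`tokenizer` objects from above.

```python
# Minimal sketch (assumes the tokenizer kept the stock Llama-3 chat template).
messages = [
    {"role": "system", "content": instructions_string},
    {"role": "user", "content": "I want to learn how to swim"},
]

# apply_chat_template builds the same special-token layout as the manual template.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # end with the assistant header so the model replies
    return_tensors="pt",
).to("cuda")

outputs = model.generate(input_ids=input_ids, max_new_tokens=500)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```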
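
The original imports also pulled in `AutoModelForCausalLM` and `AutoTokenizer`; if you want to load the model without Unsloth, something along the following lines should work. This is a hedged sketch, assuming the uploaded checkpoint is a merged model that `transformers` can load directly with `bitsandbytes` 4-bit quantization (not a bare LoRA adapter).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumption: the checkpoint loads directly under bitsandbytes 4-bit quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # use torch.bfloat16 on Ampere or newer GPUs
)

tokenizer = AutoTokenizer.from_pretrained("AmaanUsmani/Llama3-8b-DynamicChat-4bit")
model = AutoModelForCausalLM.from_pretrained(
    "AmaanUsmani/Llama3-8b-DynamicChat-4bit",
    quantization_config=bnb_config,
    device_map="auto",
)
```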