Ellight
/

gemma-2b-bnb-4bit

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Ellight commited on May 7

Commit

9361b08

•

1 Parent(s): 6969a91

Update README.md

Files changed (1) hide show

README.md +29 -1

README.md CHANGED Viewed

@@ -23,4 +23,32 @@ This gemma model was trained 2x faster with [Unsloth](https://github.com/unsloth
 # Hindi-Gemma-2B-instruct (Instruction-tuned)
-Hindi-Gemma-2B-instruct is an instruction-tuned Hindi large language model (LLM) with 2 billion parameters, and it is based on Gemma 2B.

 # Hindi-Gemma-2B-instruct (Instruction-tuned)
+Hindi-Gemma-2B-instruct is an instruction-tuned Hindi large language model (LLM) with 2 billion parameters, and it is based on Gemma 2B.
+# TO do inference using the LORA adapters
+from unsloth import FastLanguageModel
+model, tokenizer = FastLanguageModel.from_pretrained(
+    model_name = "Ellight/gemma-2b-bnb-4bit", # YOUR MODEL YOU USED FOR TRAINING
+    max_seq_length = max_seq_length,
+    dtype = dtype,
+    load_in_4bit = load_in_4bit,
+)
+FastLanguageModel.for_inference(model) # Enable native 2x faster inference
+alpaca_prompt = """
+### Instruction:
+{}
+### Response:
+{}"""
+inputs = tokenizer(
+[
+    alpaca_prompt.format(
+        "शतरंज बोर्ड पर कितने वर्ग होते हैं?", # instruction
+        "", # output - leave this blank for generation!
+    )
+], return_tensors = "pt").to("cuda")
+outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
+tokenizer.batch_decode(outputs)