mm committed on
Commit
0e064ec
1 Parent(s): df21559

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +52 -1
README.md CHANGED
@@ -2,4 +2,55 @@
2
  license: apache-2.0
3
  ---
4
 
5
- GGUF conversion of https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2
  license: apache-2.0
3
  ---
4
 
5
+ GGUF conversion of
6
+
7
+ # gte-Qwen2-1.5B-instruct GGUF
8
+ GGUF conversion of [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct)
9
+
10
+ Available formats:
11
+ - Q2_K.gguf
12
+ - Q3_K.gguf
13
+ - Q4_K.gguf
14
+ - Q5_K.gguf
15
+ - Q6_K.gguf
16
+ - Q8_0.gguf
17
+ - F16.gguf
18
+ - BF16.gguf
19
+
20
+ ## Usage
21
+
22
+ Requires: [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
23
+
24
+ ```python
25
+ from functools import partial
26
+
27
+ import numpy as np
28
+ from llama_cpp import Llama
29
+
30
+ max_length = 512
31
+
32
+ model = Llama.from_pretrained(
33
+ repo_id="mm/gte-Qwen2-1.5B-instruct-gguf",
34
+ filename="*Q4_K.gguf", # Choose from the available formats
35
+ embedding=True,
36
+ n_ctx=max_length,
37
+ n_batch=max_length,
38
+ verbose=False,
39
+ )
40
+ model.tokenize = partial(model.tokenize, special=True)
41
+
42
+
43
+ def calc_emb(s: str):
44
+ if len(model.tokenize(s.encode())) > max_length - 1:
45
+ print(
46
+ "The output will be truncated because the input exceeds the maximum length."
47
+ )
48
+ v = model.embed(s, normalize=True, truncate=True)
49
+ return np.asarray(v[-1])
50
+
51
+
52
+ s = "今日の天気は?"
53
+ t = "本日の天候は?"
54
+
55
+ print(f"cossim({s}, {t}) = {(calc_emb(s) * calc_emb(t)).sum()}")
56
+ ```