---
license: apache-2.0
---

# gte-Qwen2-1.5B-instruct GGUF

GGUF conversion of [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct)

Available formats:
- Q2_K.gguf
- Q3_K.gguf
- Q4_K.gguf
- Q5_K.gguf
- Q6_K.gguf
- Q8_0.gguf
- F16.gguf
- BF16.gguf

## Usage

Requires: [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)

```python
from functools import partial

import numpy as np
from llama_cpp import Llama

max_length = 512

model = Llama.from_pretrained(
    repo_id="mm/gte-Qwen2-1.5B-instruct-gguf",
    filename="*Q4_K.gguf",  # choose from the available formats
    embedding=True,
    n_ctx=max_length,
    n_batch=max_length,
    verbose=False,
)
# Make sure special tokens are encoded during tokenization.
model.tokenize = partial(model.tokenize, special=True)


def calc_emb(s: str) -> np.ndarray:
    if len(model.tokenize(s.encode())) > max_length - 1:
        print("The input exceeds the maximum length; the embedding is computed with truncation.")
    v = model.embed(s, normalize=True, truncate=True)
    return np.asarray(v[-1])  # last-token pooling


s = "今日の天気は?"  # "How is the weather today?"
t = "本日の天候は?"  # "What is today's weather?"
print(f"cossim({s}, {t}) = {(calc_emb(s) * calc_emb(t)).sum()}")
```
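Because `calc_emb` returns L2-normalized vectors (`normalize=True`), cosine similarity reduces to a plain dot product, which makes ranking documents against a query cheap. Below is a minimal retrieval sketch; it uses mock normalized vectors so it runs without the model, on the assumption that real embeddings from `calc_emb` above would be substituted for them:

```python
import numpy as np


def rank_by_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray) -> np.ndarray:
    # With L2-normalized vectors, cosine similarity is just a dot product.
    scores = doc_vecs @ query_vec
    return np.argsort(-scores)  # document indices, most similar first


# Mock normalized embeddings standing in for calc_emb() output.
rng = np.random.default_rng(0)
vecs = rng.normal(size=(3, 8))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

query = vecs[0]  # a query identical to document 0
order = rank_by_cosine(query, vecs)
print(order)  # document 0 ranks first, since it matches the query exactly
```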