---
license: apache-2.0
---

# gte-Qwen2-1.5B-instruct GGUF

GGUF conversion of [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct)
Available formats:
- Q2_K.gguf
- Q3_K.gguf
- Q4_K.gguf
- Q5_K.gguf
- Q6_K.gguf
- Q8_0.gguf
- F16.gguf
- BF16.gguf

## Usage

Requires: [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
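It can usually be installed with `pip install llama-cpp-python`; see that repository for platform-specific build and GPU-backend options.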

```python
from functools import partial

import numpy as np
from llama_cpp import Llama

max_length = 512

model = Llama.from_pretrained(
    repo_id="mm/gte-Qwen2-1.5B-instruct-gguf",
    filename="*Q4_K.gguf",  # Choose from the available formats.
    embedding=True,
    n_ctx=max_length,
    n_batch=max_length,
    flash_attn=True,
    verbose=False,
)
# Always tokenize with special=True so special tokens (e.g. the end-of-text
# marker) are parsed rather than treated as plain text.
model.tokenize = partial(model.tokenize, special=True)


def calc_emb(s: str):
    # Inputs longer than the context window (minus one token of headroom)
    # are embedded after truncation.
    if len(model.tokenize(s.encode())) > max_length - 1:
        print("The input exceeds the maximum length; the embedding will be computed with truncation.")
    v = model.embed(s, normalize=True, truncate=True)
    # Last-token pooling: keep only the final token's embedding vector.
    return np.asarray(v[-1])


s = "今日の天気は?"  # "What's the weather today?"
t = "本日の天候は?"  # "How is today's weather?" (a paraphrase)

print(f"cossim({s}, {t}) = {(calc_emb(s) * calc_emb(t)).sum()}")
```
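
Because `calc_emb` returns normalized vectors, the dot product above is the cosine similarity. For retrieval-style use, the upstream [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct) model card embeds queries with an instruction prefix while documents are embedded as-is; below is a minimal sketch of that convention built on `calc_emb`, where the task description is only an illustrative example:

```python
# Sketch only: query-side instruction formatting as described in the
# upstream gte-Qwen2-1.5B-instruct model card; the task text is an example.
def get_detailed_instruct(task_description: str, query: str) -> str:
    return f"Instruct: {task_description}\nQuery: {query}"


task = "Given a web search query, retrieve relevant passages that answer the query"
query_emb = calc_emb(get_detailed_instruct(task, "今日の天気は?"))
doc_emb = calc_emb("本日の天候は?")  # Documents get no instruction prefix.

print(f"score = {(query_emb * doc_emb).sum()}")
```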