---
license: apache-2.0
---
# gte-Qwen2-1.5B-instruct GGUF
GGUF conversion of [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct)
Available formats:
- Q2_K.gguf
- Q3_K.gguf
- Q4_K.gguf
- Q5_K.gguf
- Q6_K.gguf
- Q8_0.gguf
- F16.gguf
- BF16.gguf
## Usage
Requires: [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
```python
from functools import partial

import numpy as np
from llama_cpp import Llama

max_length = 512

model = Llama.from_pretrained(
    repo_id="mm/gte-Qwen2-1.5B-instruct-gguf",
    filename="*Q4_K.gguf",  # Choose from the available formats
    embedding=True,
    n_ctx=max_length,
    n_batch=max_length,
    flash_attn=True,
    verbose=False,
)
# Parse special tokens whenever the model tokenizes text
model.tokenize = partial(model.tokenize, special=True)


def calc_emb(s: str):
    # Warn when the input does not fit in the context window
    if len(model.tokenize(s.encode())) > max_length - 1:
        print(
            "The input exceeds the maximum length; the embedding will be computed from truncated text."
        )
    v = model.embed(s, normalize=True, truncate=True)
    return np.asarray(v[-1])


s = "今日の天気は?"  # "How is the weather today?"
t = "本日の天候は?"  # "What is the weather like today?"
print(f"cossim({s}, {t}) = {(calc_emb(s) * calc_emb(t)).sum()}")
```
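The same dot-product comparison extends to ranking several texts against one query. The following is a minimal sketch, not part of the original usage: `rank_by_similarity` is a hypothetical helper, and the toy 2-D vectors stand in for the arrays that `calc_emb` would return, so the snippet runs without downloading the model.

```python
import numpy as np


def rank_by_similarity(query_vec, doc_vecs):
    """Return (index, score) pairs sorted by descending cosine similarity.

    Hypothetical helper for illustration; it is not part of llama-cpp-python.
    """
    q = np.asarray(query_vec, dtype=np.float64)
    d = np.asarray(doc_vecs, dtype=np.float64)
    # Re-normalize defensively so the dot product is a true cosine similarity
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)
    return [(int(i), float(scores[i])) for i in order]


# Toy vectors standing in for calc_emb(...) outputs (assumption for the demo)
query = [1.0, 0.0]
docs = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
ranking = rank_by_similarity(query, docs)
print(ranking)
```

With the real model, one would presumably pass `calc_emb(query_text)` and `[calc_emb(d) for d in documents]` instead of the toy vectors.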