---
license: apache-2.0
---

# gte-Qwen2-1.5B-instruct GGUF
GGUF conversion of [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct)

Available formats:
- Q2_K.gguf
- Q3_K.gguf
- Q4_K.gguf
- Q5_K.gguf
- Q6_K.gguf
- Q8_0.gguf
- F16.gguf
- BF16.gguf

## Usage

Requires: [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)

```python
from functools import partial

import numpy as np
from llama_cpp import Llama

max_length = 512

model = Llama.from_pretrained(
    repo_id="mm/gte-Qwen2-1.5B-instruct-gguf",
    filename="*Q4_K.gguf",  # Choose from the available formats
    embedding=True,
    n_ctx=max_length,
    n_batch=max_length,
    verbose=False,
)
model.tokenize = partial(model.tokenize, special=True)


def calc_emb(s: str):
    # Warn when the input exceeds the context window; the embedding is then
    # computed on the truncated input.
    if len(model.tokenize(s.encode())) > max_length - 1:
        print("Input exceeds the context length; the embedding is computed on the truncated input.")
    v = model.embed(s, normalize=True, truncate=True)
    return np.asarray(v[-1])


s = "今日の天気は?"  # "What's the weather today?"
t = "本日の天候は?"  # "How is today's weather?"

# Embeddings are unit-normalized, so their dot product is the cosine similarity.
print(f"cossim({s}, {t}) = {(calc_emb(s) * calc_emb(t)).sum()}")
```
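
Because `model.embed(..., normalize=True)` returns unit-length vectors, the summed element-wise product used above is exactly the cosine similarity. A minimal self-contained sketch of that equivalence, using stand-in vectors rather than real model output:

```python
import numpy as np

# Stand-in vectors in place of actual model embeddings.
a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

# Normalize to unit length, as model.embed(..., normalize=True) does.
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)

# For unit vectors, the summed element-wise product equals the cosine similarity.
cossim = (a_n * b_n).sum()
print(cossim)  # 0.96
```

The same value would come from `np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))` without pre-normalizing.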