---
license: apache-2.0
---

# gte-Qwen2-1.5B-instruct GGUF

GGUF conversion of [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct)

Available formats:
- Q2_K.gguf
- Q3_K.gguf
- Q4_K.gguf
- Q5_K.gguf
- Q6_K.gguf
- Q8_0.gguf
- F16.gguf
- BF16.gguf
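
The `filename` argument in the usage example is a glob pattern (`*Q4_K.gguf`), so a loose pattern can match more than one file. The sketch below uses Python's `fnmatch` to illustrate this. It assumes the repository files are named exactly as in the list above, which may not match the actual file names, and `matching_files` is a hypothetical helper, not part of llama-cpp-python.

```python
from fnmatch import fnmatch

# Assumption: file names exactly as they appear in the list of available formats.
FILES = (
    "Q2_K.gguf", "Q3_K.gguf", "Q4_K.gguf", "Q5_K.gguf",
    "Q6_K.gguf", "Q8_0.gguf", "F16.gguf", "BF16.gguf",
)


def matching_files(pattern: str, files=FILES) -> list[str]:
    """Return the file names that match a glob pattern, in listing order."""
    return [name for name in files if fnmatch(name, pattern)]


print(matching_files("*Q4_K.gguf"))  # one match
print(matching_files("*F16.gguf"))   # ambiguous: also matches BF16.gguf
print(matching_files("F16.gguf"))    # exact name, one match
```

Under this assumption, `*F16.gguf` also matches `BF16.gguf`, so a stricter pattern is the safer choice when selecting the F16 file.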

## Usage

Requires: [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)

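```python
from functools import partial

import numpy as np
from llama_cpp import Llama

max_length = 512

model = Llama.from_pretrained(
    repo_id="mm/gte-Qwen2-1.5B-instruct-gguf",
    filename="*Q4_K.gguf",  # Choose from the available formats
    embedding=True,
    n_ctx=max_length,
    n_batch=max_length,
    flash_attn=True,
    verbose=False,
)
# Make tokenize() parse special tokens by default.
model.tokenize = partial(model.tokenize, special=True)


def calc_emb(s: str) -> np.ndarray:
    """Return the normalized embedding of `s`, truncating over-long input."""
    if len(model.tokenize(s.encode())) > max_length - 1:
        print(
            "Input exceeds the maximum length; "
            "the embedding will be computed from the truncated input."
        )
    v = model.embed(s, normalize=True, truncate=True)
    # embed() is expected to return one vector per token here;
    # keep the last token's vector as the sentence embedding.
    return np.asarray(v[-1])


s = "今日の天気は？"  # "How is the weather today?"
t = "本日の天候は？"  # Same question in more formal wording

# The embeddings are normalized, so their dot product is the cosine similarity.
print(f"cossim({s}, {t}) = {(calc_emb(s) * calc_emb(t)).sum()}")
```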
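
The same dot-product comparison extends to ranking several texts against one query. The following is a minimal sketch, not part of the original usage: `rank_by_cosine` is a hypothetical helper, and the 3-dimensional vectors are toy stand-ins for real embeddings so that the block runs without downloading the model.

```python
import numpy as np


def rank_by_cosine(query_vec, doc_vecs):
    """Return (order, scores): document indices and cosine scores, best first."""
    q = np.asarray(query_vec, dtype=np.float64)
    d = np.asarray(doc_vecs, dtype=np.float64)
    # Normalize defensively so the dot product is a cosine similarity
    # even if the inputs were not normalized beforehand.
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)  # descending by score
    return order, scores[order]


# Toy stand-ins for embeddings (assumption: real vectors come from the model).
query = [1.0, 0.0, 0.0]
docs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 0.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
]

order, scores = rank_by_cosine(query, docs)
print(order.tolist(), scores.round(3).tolist())
```

With the model loaded, the toy vectors can be replaced by `calc_emb(query_text)` and `[calc_emb(d) for d in doc_texts]`.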