jinaai/jina-embeddings-v3

@@ -66,7 +66,7 @@ language:
   - my
   - ne
   - nl
-  - 'no'
   - om
   - or
   - pa
@@ -201,37 +201,56 @@ embeddings = F.normalize(embeddings, p=2, dim=1)
 </p>
 </details>
-1. The easiest way to starting using jina-clip-v1-en is to use Jina AI's [Embeddings API](https://jina.ai/embeddings/).
-2. Alternatively, you can use Jina CLIP directly via transformers package.
 ```python
-!pip install transformers einops flash_attn
 from transformers import AutoModel
 # Initialize the model
 model = AutoModel.from_pretrained('jinaai/jina-embeddings-v3', trust_remote_code=True)
-# New meaningful sentences
-sentences = [
-    "Organic skincare for sensitive skin with aloe vera and chamomile.",
-    "New makeup trends focus on bold colors and innovative techniques",
-    "Bio-Hautpflege für empfindliche Haut mit Aloe Vera und Kamille",
-    "Neue Make-up-Trends setzen auf kräftige Farben und innovative Techniken",
-    "Cuidado de la piel orgánico para piel sensible con aloe vera y manzanilla",
-    "Las nuevas tendencias de maquillaje se centran en colores vivos y técnicas innovadoras",
-    "针对敏感肌专门设计的天然有机护肤产品",
-    "新的化妆趋势注重鲜艳的颜色和创新的技巧",
-    "敏感肌のために特別に設計された天然有機スキンケア製品",
-    "新しいメイクのトレンドは鮮やかな色と革新的な技術に焦点を当てています",
 ]
-# Encode sentences
-embeddings = model.encode(sentences, truncate_dim=1024, task_type='index') # TODO UPDATE
 # Compute similarities
 print(embeddings[0] @ embeddings[1].T)
 ```
 ## Performance

   - my
   - ne
   - nl
+  - no
   - om
   - or
   - pa
 </p>
 </details>
+1. The easiest way to start using `jina-embeddings-v3` is Jina AI's [Embeddings API](https://jina.ai/embeddings/).
+2. Alternatively, you can use `jina-embeddings-v3` directly via the `transformers` package.
 ```python
+# Install the dependency first (notebook cell syntax): !pip install transformers
 from transformers import AutoModel
 # Initialize the model
 model = AutoModel.from_pretrained('jinaai/jina-embeddings-v3', trust_remote_code=True)
+texts = [
+    'Follow the white rabbit.',              # English
+    'Sigue al conejo blanco.',               # Spanish
+    'Suis le lapin blanc.',                  # French
+    '跟着白兔走。',                            # Chinese
+    'اتبع الأرنب الأبيض.',                     # Arabic
+    'Folge dem weißen Kaninchen.'            # German
 ]
+# When calling the `encode` function, you can choose a task_type based on the use case:
+# 'retrieval.query', 'retrieval.passage', 'separation', 'classification', 'text-matching'
+# Alternatively, you can choose not to pass a task_type, and no specific LoRA adapter will be used.
+embeddings = model.encode(texts, task_type='text-matching')
 # Compute similarities
 print(embeddings[0] @ embeddings[1].T)
 ```
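The snippet above prints one dot product between the first two embeddings. As a minimal sketch of comparing all texts at once, the following computes a full pairwise cosine-similarity matrix. It assumes that `encode` returns a 2-D NumPy array with one row per text, which is not confirmed here. Random vectors stand in for real embeddings so the snippet runs without downloading the model, `cosine_similarity_matrix` is a hypothetical helper, and 1024 is a placeholder dimension.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return pairwise cosine similarities between the rows of `embeddings`."""
    # Normalize each row to unit length; the clip guards against zero vectors.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    # The dot product of unit vectors is their cosine similarity.
    return unit @ unit.T


# Stand-in for `model.encode(texts, ...)`: 6 random vectors (assumption).
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(6, 1024))

similarities = cosine_similarity_matrix(fake_embeddings)
print(similarities.shape)  # (6, 6)
```

Entry `[i, j]` of the result compares text `i` with text `j`, so the diagonal is 1 and the matrix is symmetric.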
+By default, the model supports a maximum sequence length of 8192 tokens.
+To truncate input texts to a shorter length, pass the `max_length` parameter to the `encode` function:
+```python
+embeddings = model.encode(
+    ['Very long ... document'],
+    max_length=2048
+)
+```
+To use Matryoshka embeddings with a smaller embedding dimension,
+pass the `truncate_dim` parameter to the `encode` function:
+```python
+embeddings = model.encode(
+    ['Sample text'],
+    truncate_dim=256
+)
+```
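Conceptually, Matryoshka-style truncation keeps only the leading components of each embedding. The sketch below illustrates that idea on stand-in data. It is not the model's own implementation: `truncate_embeddings` is a hypothetical helper, the 1024-dimensional random input is a placeholder, and the explicit re-normalization is an assumption, since the text does not say whether `encode` re-normalizes after truncating.

```python
import numpy as np


def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and re-normalize to unit length."""
    if dim > embeddings.shape[1]:
        raise ValueError("dim cannot exceed the original embedding dimension")
    truncated = embeddings[:, :dim]
    # Re-normalize so dot products remain cosine similarities (assumption).
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)


# Stand-in for full-size embeddings returned by the model (assumption).
rng = np.random.default_rng(0)
full = rng.normal(size=(2, 1024))

small = truncate_embeddings(full, 256)
print(small.shape)  # (2, 256)
```

Smaller dimensions reduce storage and speed up similarity search, usually at some cost in accuracy.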
 ## Performance