
SentenceTransformer based on colbert-ir/colbertv2.0

This is a sentence-transformers model fine-tuned from colbert-ir/colbertv2.0 on the query-to-dataset-viewer-descriptions dataset. It maps sentences and paragraphs to a 128-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

ColBERT(
  (0): Transformer({'max_seq_length': 179, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Dense({'in_features': 768, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
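
The printout above describes a two-stage pipeline: a BertModel produces 768-dimensional token vectors, and a bias-free Dense layer with identity activation projects each one to 128 dimensions. The following is a minimal sketch of that projection step only, in pure Python. The weight matrix and token vectors are random stand-ins (assumptions for illustration), not the trained parameters or real model outputs.

```python
import random

IN_FEATURES = 768   # BERT hidden size, from the Dense config above
OUT_FEATURES = 128  # output embedding size, from the Dense config above


def make_weight(rows, cols, seed=0):
    """Random stand-in for the learned projection matrix (not real weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(cols)] for _ in range(rows)]


def dense_no_bias(token_vectors, weight):
    """Apply y = W x to every token vector: no bias, identity activation."""
    return [
        [sum(w * x for w, x in zip(row, vec)) for row in weight]
        for vec in token_vectors
    ]


weight = make_weight(OUT_FEATURES, IN_FEATURES)
rng = random.Random(1)
# Four hypothetical token vectors standing in for the BertModel output.
tokens = [[rng.gauss(0.0, 1.0) for _ in range(IN_FEATURES)] for _ in range(4)]
projected = dense_no_bias(tokens, weight)
print(len(projected), len(projected[0]))  # prints: 4 128
```

Because the layer has no bias term, an all-zero input vector maps to an all-zero output vector, which is a quick sanity check when inspecting the real module.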

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("contrastive-bert-base-uncased")
# Run inference
sentences = [
    'USER_QUERY: arabic text classification dataset',
    'HUB_DATASET_PREVIEW: DATASET_NAME: "FreedomIntelligence/Arabic-Vicuna-80"\nFEATURES: {\'id\': {\'dtype\': \'int64\', \'_type\': \'Value\'}, \'label\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'query\': {\'dtype\': \'string\', \'_type\': \'Value\'}}\nDATA SAMPLE:\n[\n  {\n    "row_idx": 0,\n    "row": {\n      "id": 1,\n      "label": "generic",\n      "query": "\\u0643\\u064a\\u0641 \\u064a\\u0645\\u0643\\u0646\\u0646\\u064a \\u062a\\u062d\\u0633\\u064a\\u0646 \\u0645\\u0647\\u0627\\u0631\\u0627\\u062a \\u0625\\u062f\\u0627\\u0631\\u0629 \\u0627\\u0644\\u0648\\u0642\\u062a \\u0627\\u0644\\u062e\\u0627\\u0635\\u0629 \\u0628\\u064a\\u061f"\n    },\n    "truncated_cells": []\n  },\n  {\n    "row_idx": 1,\n    "row": {\n      "id": 2,\n      "label": "generic",\n      "query": "\\u0645\\u0627 \\u0647\\u064a \\u0627\\u0644\\u0637\\u0631\\u064a\\u0642\\u0629 \\u0627\\u0644\\u0623\\u0643\\u062b\\u0631 \\u0641\\u0639\\u0627\\u0644\\u064a\\u0629 \\u0644\\u0644\\u062a\\u0639\\u0627\\u0645\\u0644 \\u0645\\u0639 \\u0627\\u0644\\u0636\\u063a\\u0637\\u061f"\n    },\n    "truncated_cells": []\n  }\n]',
    'NEGATIVE: DATASET_NAME: "arbml/CIDAR-MCQ-100"\nFEATURES: {\'Question\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'A\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'B\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'C\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'D\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'answer\': {\'dtype\': \'string\', \'_type\': \'Value\'}}\nDATA SAMPLE:\n[\n  {\n    "row_idx": 0,\n    "row": {\n      "Question": "\\u062d\\u062f\\u062f \\u062d\\u064a\\u0648\\u0627\\u0646 \\u0645\\u0634\\u0647\\u0648\\u0631 \\u0641\\u064a \\u0627\\u0644\\u0645\\u0646\\u0637\\u0642\\u0629",\n      "A": "\\u0627\\u0644\\u062c\\u0645\\u0644",\n      "B": "\\u0627\\u0644\\u0644\\u0627\\u0645\\u0627",\n      "C": "\\u0627\\u0644\\u0643\\u0627\\u0646\\u063a\\u0631\\u0648",\n      "D": "\\u0627\\u0644\\u062f\\u0628 \\u0627\\u0644\\u0642\\u0637\\u0628\\u064a",\n      "answer": "A"\n    },\n    "truncated_cells": []\n  },\n  {\n    "row_idx": 1,\n    "row": {\n      "Question": "\\u0636\\u0639 \\u0643\\u0644\\u0645\\u0629 \\u0645\\u0646\\u0627\\u0633\\u0628\\u0629 \\u0644\\u0625\\u0643\\u0645\\u0627\\u0644 \\u0627\\u0644\\u062c\\u0645\\u0644\\u0629: \\u0648\\u0642\\u0641\\u062a \\u0623\\u0646\\u0638\\u0631 \\u0625\\u0644\\u064a \\u0627\\u0644\\u0633\\u0645\\u0627\\u0621 \\u0641\\u0631\\u0623\\u064a\\u062a _",\n      "A": "\\u0637\\u0627\\u0626\\u0631 \\u0627\\u0644\\u0637\\u0646\\u0627\\u0646",\n      "B": "\\u0627\\u0644\\u0641\\u0644\\u0627\\u0645\\u0646\\u062c\\u0648",\n      "C": "\\u0627\\u0644\\u0628\\u0644\\u0634\\u0648\\u0646",\n      "D": "\\u0627\\u0644\\u062d\\u0645\\u0627\\u0645",\n      "answer": "D"\n    },\n    "truncated_cells": []\n  }\n]',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 128]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
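
The similarity matrix above can be used to rank candidate dataset previews against a query. The sketch below shows the idea on toy vectors, assuming cosine similarity (the usual default for `model.similarity`; the similarity function configured for this model is not shown in the card). The helper names `cosine` and `rank_candidates` are hypothetical and are not part of the library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(candidate_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for rows of `embeddings`.
query = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # close to the query
]
ranking = rank_candidates(query, candidates)
print([index for index, _ in ranking])  # prints: [1, 0]
```

With the real model, `embeddings[0]` would play the role of `query` and `embeddings[1:]` the role of `candidates`, so the USER_QUERY row is compared against the HUB_DATASET_PREVIEW and NEGATIVE rows.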

Training Details

Training Dataset

query-to-dataset-viewer-descriptions

  • Dataset: query-to-dataset-viewer-descriptions at eb9d1be
  • Size: 1,433 training samples
  • Columns: query, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    column    type    min tokens  mean tokens  max tokens
    query     string  10          12.72        20
    positive  string  32          32.0         32
    negative  string  32          32.0         32
  • Samples:
    query positive negative
    USER_QUERY: Persian product entity recognition dataset HUB_DATASET_PREVIEW: DATASET_NAME: "BaSalam/entity-attribute-dataset-GPT-3.5-generated-v1"
    FEATURES: {'instruction': {'dtype': 'string', '_type': 'Value'}, 'output': {'dtype': 'string', '_type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "instruction": "here is a product title from a Iranian marketplace. \n give me the Product Entity and Attributes of this product in Persian language.\n give the output in this json format: {'attributes': {'attribute_name' : , ...}, 'product_entity': ''}.\n Don't make assumptions about what values to plug into json. Just give Json not a single word more.\n \nproduct title: \u062e\u0631\u0633 \u0622\u0628\u0631\u0646\u06af\u06cc 150 \u0633\u0627\u0646\u062a \u0627\u0648\u0631\u062c\u06cc\u0646\u0627\u0644 \u0636\u062f\u062d\u0633\u0627\u0633\u06cc\u062a \u062f\u0631\u062c\u0647 \u06cc\u06a9 ",
    "output": "{"attributes": {"\u0627\u0646\u062f\u0627\u0632\u0647": "150 \u0633\u0627\u0646\u062a\u06cc\u0645\u062a\u0631", "\u0646\u0648\u0639": "\u062e\u0631\u0633 \u0622\u0628\u0631\u0646\u06af\u06cc", "\u0648\u06cc\u0698\u06af\u06cc": "\u0636\u062f\u062d\u0633\u0627\u0633\u06cc\u062a \u062f\u0631\u062c\u0647 \u06cc\u06a9"}, "product_entity": "\u0627\u0633\u0628\u0627\u0628 \u0628\u0627\u0632\u06cc"}"
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "instruction": "here is a product title from a Iranian marketplace. \n give me the Product Entity and Attributes of this product in Persian language.\n give the output in this json format: {'attributes': {'attribute_name' : , ...}, 'product_entity': ''}.\n Don't make assumptions about what values to plug into json. Just give Json not a single word more.\n \nproduct title: \u0634\u0645\u0639 \u062e\u0631\u06af\u0648\u0634 \u06af\u0648\u0634 \u062e\u0648\u0627\u0628\u06cc\u062f\u0647 \u0628\u0633\u06cc\u0627\u0631\u0632\u06cc\u0628\u0627 \u0645\u0646\u0627\u0633\u0628 \u0633\u0641\u0631\u0647 \u0647\u0641\u062a \u0633\u06cc\u0646\n\u0634\u0645\u0639 \u062e\u0631\u06af\u0648\u0634\u060c \u06af\u0648\u0634 \u062e\u0648\u0627\u0628\u06cc\u062f\u0647\u060c \u062a\u06a9\u06cc\u060c \u0627\u06a9\u0644\u06cc\u0644 \u062e\u0648\u0631\u062f\u0647 \u0648 \u062f\u0627\u062e\u0644 \u0637\u0644\u0642 \u062a\u0642\u062f\u06cc\u0645 \u0645\u06cc \u0634\u0648\u062f \u0645\u0646\u0627\u0633\u0628 \u0633\u0641\u0631\u0647 \u0647\u0641\u062a \u0633\u06cc\u0646",
    "output": "{"attributes": {"\u0648\u06cc\u0698\u06af\u06cc": "\u06af\u0648\u0634 \u062e\u0648\u0627\u0628\u06cc\u062f\u0647", "\u062a\u0639\u062f\u0627\u062f": "\u062a\u06a9\u06cc", "\u0645\u0646\u0627\u0633\u0628\u06cc\u062a": "\u0633\u0641\u0631\u0647 \u0647\u0641\u062a \u0633\u06cc\u0646"}, "product_entity": "\u0634\u0645\u0639 \u062e\u0631\u06af\u0648\u0634 \u06af\u0648\u0634 \u062e\u0648\u0627\u0628\u06cc\u062f\u0647"}"
    },
    "truncated_cells": []
    }
    ]
    NEGATIVE: DATASET_NAME: "AliAsh/digikala_translated_small_5m"
    FEATURES: {'id': {'dtype': 'large_string', '_type': 'Value'}, 'title_fa': {'dtype': 'large_string', '_type': 'Value'}, 'title_en': {'dtype': 'large_string', '_type': 'Value'}, 'description_fa': {'dtype': 'large_string', '_type': 'Value'}, 'review_fa': {'dtype': 'large_string', '_type': 'Value'}, 'brand_fa': {'dtype': 'large_string', '_type': 'Value'}, 'brand_en': {'dtype': 'large_string', '_type': 'Value'}, 'category_fa': {'dtype': 'large_string', '_type': 'Value'}, 'category_en': {'dtype': 'large_string', '_type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "id": "9632866",
    "title_fa": "\u06af\u0631\u062f\u0646\u0628\u0646\u062f \u062e\u0646\u062f\u0627\u0644\u0648 \u0645\u062f\u0644 \u0641\u0644\u06cc\u06a9\u0633 \u0648 \u0644\u06cc \u0646\u0648 \u06af\u0631\u0648\u0647 \u0627\u0633\u062a\u0631\u06cc \u06a9\u06cc\u062f\u0632 Stary Kids \u06a9\u062f 1686716871",
    "title_en": "Khandalo necklace, Felix and Lee new model, Stary Kids group, code 1686716871",
    "description_fa": "\u06af\u0631\u062f\u0646\u0628\u0646\u062f \u062e\u0646\u062f\u0627\u0644\u0648\u060c \u0645\u062d\u0635\u0648\u0644\u06cc \u062c\u062f\u06cc\u062f \u0648 \u062c\u0630\u0627\u0628 \u0627\u0633\u062a \u06a9\u0647 \u0628\u0635\u0648\u0631\u062a \u062f\u0648 \u0637\u0631\u0641\u0647 \u0628\u0648\u062f\u0647\u060c \u0648 \u062f\u0631 \u0647\u0631 \u06cc\u06a9 \u0627\u0632 \u0637\u0631\u0641\u06cc\u0646 \u06af\u0631\u062f\u0646\u0628\u0646\u062f\u060c \u0637\u0631\u062d \u0645\u062a\u0641\u0627\u0648\u062a\u06cc \u0627\u0632 \u06cc\u06a9 \u0645\u0648\u0636\u0648\u0639 \u0686\u0627\u067e \u0634\u062f\u0647 \u0627\u0633\u062a.\r\n\u062f\u0648\u0631 \u067e\u0644\u0627\u06a9 \u06af\u0631\u062f\u0646\u0628\u0646\u062f \u0631\u0627\u060c \u0631\u06cc\u0646\u06af\u06cc \u0627\u0632 \u062c\u0646\u0633 \u06a9\u0631\u0648\u0645 \u0646\u0642\u0631\u0647 \u0627\u06cc \u067e\u0648\u0634\u0627\u0646\u062f\u0647 \u06a9\u0647 \u0631\u0646\u06af \u0622\u0646 \u062b\u0627\u0628\u062a \u0628\u0648\u062f\u0647 \u0648 \u0628\u0627\u0639\u062b \u0647\u0631\u0686\u0647 \u0632\u06cc\u0628\u0627\u062a\u0631 \u0634\u062f\u0646 \u06af\u0631\u062f\u0646\u0628\u0646\u062f \u0645\u06cc \u0634\u0648\u062f.\r\n\u0631\u0648\u06a9\u0634 \u0628\u0631\u0627\u0642 \u0631\u0648\u06cc \u06af\u0631\u062f\u0646\u0628\u0646\u062f\u060c \u0645\u0642\u0627\u0648\u0645\u062a \u0628\u0627\u0644\u0627\u06cc\u06cc \u062f\u0627\u0631\u062f \u0648 \u062f\u0631 \u0645\u0642\u0627\u0628\u0644 \u0639\u0631\u0642 \u0628\u062f\u0646 \u0628\u0627 \u062f\u0648\u0627\u0645 \u0628\u0648\u062f\u0647 \u0648 \u0628\u0627 \u0631\u0637\u0648\u0628\u062a \u0647\u0627\u06cc \u06a9\u0645\u060c \u0628\u0647 \u0637\u0631\u062d \u0622\u0633\u06cc\u0628\u06cc \u0648\u0627\u0631\u062f \u0646\u0645\u06cc \u0634\u0648\u062f.\r\n\u0686\u0627\u067e \u06af\u0631\u062f\u0646\u0628\u0646\u062f \u0647\u0627 \u0628\u0648\u0633\u06cc\u0644\u0647 \u0628\u0647\u062a\u0631\u06cc\u0646 \u062f\u0633\u062a\u06af\u0627\u0647 
\u0647\u0627\u06cc \u0686\u0627\u067e \u062f\u06cc\u062c\u06cc\u062a\u0627\u0644 \u0635\u0648\u0631\u062a \u0645\u06cc \u06af\u06cc\u0631\u062f \u0648 \u0637\u0631\u062d \u0628\u0627 \u0634\u0641\u0627\u0641\u06cc\u062a\u06cc \u0645\u062b\u0627\u0644 \u0632\u062f\u0646\u06cc \u0628\u0631 \u0631\u0648\u06cc \u06af\u0631\u062f\u0646\u0628\u0646\u062f \u0646\u0642\u0634 \u0645\u06cc \u0628\u0646\u062f\u062f.\r\n\u062c\u0646\u0633 \u0628\u0646\u062f \u06af\u0631\u062f\u0646\u0628\u0646\u062f \u0647\u0627 \u0646\u06cc\u0632 \u062a\u0648\u0644\u06cc\u062f \u0634\u062f\u0647 \u0627\u0632 \u0686\u0631\u0645 \u0635\u0646\u0639\u062a\u06cc \u0633\u062a \u0648 \u0628\u0627 \u0636\u062e\u0627\u0645\u062a \u0648 \u0645\u0642\u0627\u0648\u0645\u062a \u0628\u0627\u0644\u0627\u06cc\u06cc \u06a9\u0647 \u062f\u0627\u0631\u062f \u0622\u0633\u06cc\u0628\u06cc \u0646\u0645\u06cc \u0628\u06cc\u0646\u062f. \u0632\u0646\u062c\u06cc\u0631 \u0628\u0648\u0633\u06cc\u0644\u0647 \u0642\u0641\u0644 \u0622\u0628\u06a9\u0627\u0631\u06cc \u0634\u062f\u0647 \u0628\u0647 \u0631\u0646\u06af \u0646\u0642\u0631\u0647 \u0627\u06cc\u060c \u0642\u0627\u0628\u0644 \u0628\u0627\u0632 \u0648 \u0628\u0633\u062a\u0647 \u0634\u062f\u0646 \u0627\u0633\u062a \u06a9\u0647 \u0627\u0633\u062a\u0641\u0627\u062f\u0647 \u0627\u0632 \u0622\u0646 \u0631\u0627 \u0631\u0627\u062d\u062a \u062a\u0631 \u0645\u06cc \u06a9\u0646\u062f.\r\n\u0647\u0645\u0686\u0646\u06cc\u0646 \u0628\u0633\u062a\u0647 \u0628\u0646\u062f\u06cc \u0645\u06cc\u0646\u06cc\u0645\u0627\u0644 \u0648 \u0645\u0646\u062d\u0635\u0631 \u0628\u0647 \u0641\u0631\u062f \u062e\u0646\u062f\u0627\u0644\u0648\u060c \u06af\u0631\u062f\u0646\u0628\u0646\u062f \u0647\u0627 \u0631\u0627 \u062a\u0628\u062f\u06cc\u0644 \u0628\u0647 \u0647\u062f\u06cc\u0647 \u0627\u06cc \u062c\u0630\u0627\u0628 \u0645\u06cc \u06a9\u0646\u062f \u06a9\u0647 \u0628\u0627 \u0642\u06cc\u0645\u062a \u0645\u0646\u0627\u0633\u0628 \u0648 \u062a\u0646\u0648\u0639 \u0637\u0631\u062d\u06cc 
\u0628\u0633\u06cc\u0627\u0631 \u0628\u0627\u0644\u0627\u06cc\u06cc \u06a9\u0647 \u0627\u0632 \u0647\u0645\u0647 \u0645\u0648\u0636\u0648\u0639\u0627\u062a \u062f\u0627\u0631\u062f\u060c \u0645\u06cc\u062a\u0648\u0627\u0646\u062f \u0628\u0631\u0627\u06cc \u0647\u0631 \u0633\u0644\u06cc\u0642\u0647 \u0627\u06cc \u062e\u0648\u0634\u062d\u0627\u0644 \u06a9\u0646\u0646\u062f\u0647 \u0628\u0627\u0634\u062f.",
    "review_fa": "\u06af\u0631\u062f\u0646\u0628\u0646\u062f \u062e\u0646\u062f\u0627\u0644\u0648 \u0645\u062f\u0644 \u0641\u0644\u06cc\u06a9\u0633 \u0648 \u0644\u06cc \u0646\u0648 \u06af\u0631\u0648\u0647 \u0627\u0633\u062a\u0631\u06cc \u06a9\u06cc\u062f\u0632 Stary Kids \u06a9\u062f 1686716871",
    "brand_fa": "\u062e\u0646\u062f\u0627\u0644\u0648",
    "brand_en": "Khandaloo",
    "category_fa": "\u06af\u0631\u062f\u0646\u0628\u0646\u062f \u0632\u0646\u0627\u0646\u0647 \u0648 \u0645\u0631\u062f\u0627\u0646\u0647",
    "category_en": "Women And Men Necklace"
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "id": "9632865",
    "title_fa": "\u06af\u0631\u062f\u0646\u0628\u0646\u062f \u062e\u0646\u062f\u0627\u0644\u0648 \u0645\u062f\u0644 \u0622\u06cc \u0627\u0646 \u06af\u0631\u0648\u0647 \u0627\u0633\u062a\u0631\u06cc \u06a9\u06cc\u062f\u0632 Stary Kids \u06a9\u062f 1686616876",
    "title_en": "Khandalo necklace, IN model, Stary Kids group, code 1686616876",
    "description_fa": "\u06af\u0631\u062f\u0646\u0628\u0646\u062f \u062e\u0646\u062f\u0627\u0644\u0648\u060c \u0645\u062d\u0635\u0648\u0644\u06cc \u062c\u062f\u06cc\u062f \u0648 \u062c\u0630\u0627\u0628 \u0627\u0633\u062a \u06a9\u0647 \u0628\u0635\u0648\u0631\u062a \u062f\u0648 \u0637\u0631\u0641\u0647 \u0628\u0648\u062f\u0647\u060c \u0648 \u062f\u0631 \u0647\u0631 \u06cc\u06a9 \u0627\u0632 \u0637\u0631\u0641\u06cc\u0646 \u06af\u0631\u062f\u0646\u0628\u0646\u062f\u060c \u0637\u0631\u062d \u0645\u062a\u0641\u0627\u0648\u062a\u06cc \u0627\u0632 \u06cc\u06a9 \u0645\u0648\u0636\u0648\u0639 \u0686\u0627\u067e \u0634\u062f\u0647 \u0627\u0633\u062a.\r\n\u062f\u0648\u0631 \u067e\u0644\u0627\u06a9 \u06af\u0631\u062f\u0646\u0628\u0646\u062f \u0631\u0627\u060c \u0631\u06cc\u0646\u06af\u06cc \u0627\u0632 \u062c\u0646\u0633 \u06a9\u0631\u0648\u0645 \u0646\u0642\u0631\u0647 \u0627\u06cc \u067e\u0648\u0634\u0627\u0646\u062f\u0647 \u06a9\u0647 \u0631\u0646\u06af \u0622\u0646 \u062b\u0627\u0628\u062a \u0628\u0648\u062f\u0647 \u0648 \u0628\u0627\u0639\u062b \u0647\u0631\u0686\u0647 \u0632\u06cc\u0628\u0627\u062a\u0631 \u0634\u062f\u0646 \u06af\u0631\u062f\u0646\u0628\u0646\u062f \u0645\u06cc \u0634\u0648\u062f.\r\n\u0631\u0648\u06a9\u0634 \u0628\u0631\u0627\u0642 \u0631\u0648\u06cc \u06af\u0631\u062f\u0646\u0628\u0646\u062f\u060c \u0645\u0642\u0627\u0648\u0645\u062a \u0628\u0627\u0644\u0627\u06cc\u06cc \u062f\u0627\u0631\u062f \u0648 \u062f\u0631 \u0645\u0642\u0627\u0628\u0644 \u0639\u0631\u0642 \u0628\u062f\u0646 \u0628\u0627 \u062f\u0648\u0627\u0645 \u0628\u0648\u062f\u0647 \u0648 \u0628\u0627 \u0631\u0637\u0648\u0628\u062a \u0647\u0627\u06cc \u06a9\u0645\u060c \u0628\u0647 \u0637\u0631\u062d \u0622\u0633\u06cc\u0628\u06cc \u0648\u0627\u0631\u062f \u0646\u0645\u06cc \u0634\u0648\u062f.\r\n\u0686\u0627\u067e \u06af\u0631\u062f\u0646\u0628\u0646\u062f \u0647\u0627 \u0628\u0648\u0633\u06cc\u0644\u0647 \u0628\u0647\u062a\u0631\u06cc\u0646 \u062f\u0633\u062a\u06af\u0627\u0647 
\u0647\u0627\u06cc \u0686\u0627\u067e \u062f\u06cc\u062c\u06cc\u062a\u0627\u0644 \u0635\u0648\u0631\u062a \u0645\u06cc \u06af\u06cc\u0631\u062f \u0648 \u0637\u0631\u062d \u0628\u0627 \u0634\u0641\u0627\u0641\u06cc\u062a\u06cc \u0645\u062b\u0627\u0644 \u0632\u062f\u0646\u06cc \u0628\u0631 \u0631\u0648\u06cc \u06af\u0631\u062f\u0646\u0628\u0646\u062f \u0646\u0642\u0634 \u0645\u06cc \u0628\u0646\u062f\u062f.\r\n\u062c\u0646\u0633 \u0628\u0646\u062f \u06af\u0631\u062f\u0646\u0628\u0646\u062f \u0647\u0627 \u0646\u06cc\u0632 \u062a\u0648\u0644\u06cc\u062f \u0634\u062f\u0647 \u0627\u0632 \u0686\u0631\u0645 \u0635\u0646\u0639\u062a\u06cc \u0633\u062a \u0648 \u0628\u0627 \u0636\u062e\u0627\u0645\u062a \u0648 \u0645\u0642\u0627\u0648\u0645\u062a \u0628\u0627\u0644\u0627\u06cc\u06cc \u06a9\u0647 \u062f\u0627\u0631\u062f \u0622\u0633\u06cc\u0628\u06cc \u0646\u0645\u06cc \u0628\u06cc\u0646\u062f. \u0632\u0646\u062c\u06cc\u0631 \u0628\u0648\u0633\u06cc\u0644\u0647 \u0642\u0641\u0644 \u0622\u0628\u06a9\u0627\u0631\u06cc \u0634\u062f\u0647 \u0628\u0647 \u0631\u0646\u06af \u0646\u0642\u0631\u0647 \u0627\u06cc\u060c \u0642\u0627\u0628\u0644 \u0628\u0627\u0632 \u0648 \u0628\u0633\u062a\u0647 \u0634\u062f\u0646 \u0627\u0633\u062a \u06a9\u0647 \u0627\u0633\u062a\u0641\u0627\u062f\u0647 \u0627\u0632 \u0622\u0646 \u0631\u0627 \u0631\u0627\u062d\u062a \u062a\u0631 \u0645\u06cc \u06a9\u0646\u062f.\r\n\u0647\u0645\u0686\u0646\u06cc\u0646 \u0628\u0633\u062a\u0647 \u0628\u0646\u062f\u06cc \u0645\u06cc\u0646\u06cc\u0645\u0627\u0644 \u0648 \u0645\u0646\u062d\u0635\u0631 \u0628\u0647 \u0641\u0631\u062f \u062e\u0646\u062f\u0627\u0644\u0648\u060c \u06af\u0631\u062f\u0646\u0628\u0646\u062f \u0647\u0627 \u0631\u0627 \u062a\u0628\u062f\u06cc\u0644 \u0628\u0647 \u0647\u062f\u06cc\u0647 \u0627\u06cc \u062c\u0630\u0627\u0628 \u0645\u06cc \u06a9\u0646\u062f \u06a9\u0647 \u0628\u0627 \u0642\u06cc\u0645\u062a \u0645\u0646\u0627\u0633\u0628 \u0648 \u062a\u0646\u0648\u0639 \u0637\u0631\u062d\u06cc 
\u0628\u0633\u06cc\u0627\u0631 \u0628\u0627\u0644\u0627\u06cc\u06cc \u06a9\u0647 \u0627\u0632 \u0647\u0645\u0647 \u0645\u0648\u0636\u0648\u0639\u0627\u062a \u062f\u0627\u0631\u062f\u060c \u0645\u06cc\u062a\u0648\u0627\u0646\u062f \u0628\u0631\u0627\u06cc \u0647\u0631 \u0633\u0644\u06cc\u0642\u0647 \u0627\u06cc \u062e\u0648\u0634\u062d\u0627\u0644 \u06a9\u0646\u0646\u062f\u0647 \u0628\u0627\u0634\u062f.",
    "review_fa": "\u06af\u0631\u062f\u0646\u0628\u0646\u062f \u062e\u0646\u062f\u0627\u0644\u0648 \u0645\u062f\u0644 \u0622\u06cc \u0627\u0646 \u06af\u0631\u0648\u0647 \u0627\u0633\u062a\u0631\u06cc \u06a9\u06cc\u062f\u0632 Stary Kids \u06a9\u062f 1686616876",
    "brand_fa": "\u062e\u0646\u062f\u0627\u0644\u0648",
    "brand_en": "Khandaloo",
    "category_fa": "\u06af\u0631\u062f\u0646\u0628\u0646\u062f \u0632\u0646\u0627\u0646\u0647 \u0648 \u0645\u0631\u062f\u0627\u0646\u0647",
    "category_en": "Women And Men Necklace"
    },
    "truncated_cells": []
    }
    ]
    USER_QUERY: visual question answering dataset HUB_DATASET_PREVIEW: DATASET_NAME: "Mantis-VL/MIQA_sample"
    FEATURES: {'id': {'dtype': 'string', '_type': 'Value'}, 'images': {'feature': {'decode': False, '_type': 'Image'}, 'type': 'Sequence'}, 'conversation': [{'role': {'dtype': 'string', 'type': 'Value'}, 'content': {'dtype': 'string', 'type': 'Value'}}], 'source': {'dtype': 'string', 'type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "id": "ai2d_3779.png",
    "images": [
    {
    "src": "https://datasets-server.huggingface.co/assets/Mantis-VL/MIQA_sample/--/b5a021b82cf9ddf48745cdfddffbd4bb8a8861fc/--/ai2d/train/0/images/image-1d100e9.jpg?Expires=1726591696&Signature=zhTEV0fPVouMHjzmVF83gLntTuzqwm0PQcwmb7w2kScDw5xmy~3o5cs4796Y6W~h5Z6eyXvVvPIK6McobkNfexb3edE8Icxp9nmmprlgAV1ZoEgP8FaxPN6xFNSAeaTxML4Wov~2tgORPY1RbSP7x-LWawDqWHT6rpmHkUZkAATgU0oskmp2cGGsCqOxlBn02WQmkLhEJZDCICMT7deOgKtw1ZER3o7uDiwIPMexAj7f7CaUb6ys1n9afFcLrTY01~xSnod8hpuXu0xcJ6WwM5Uwugpne3ncNA~If7gsEhb75nkPn1VE5sMIPBUWCh2-lZ0UO14veLf5UWPY-dky0Q
    &Key-Pair-Id=K3EI6M078Z3AC3",
    "height": 297,
    "width": 248
    }
    ],
    "conversation": [
    {
    "role": "user",
    "content": "Answer the following multiple choice question based on the given image.\nWhich part comes between the B and I in the above diagram?\n(A) E\n(B) C\n(C) H\n(D) F\n"
    },
    {
    "role": "assistant",
    "content": "B"
    }
    ],
    "source": "ai2d"
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "id": "ai2d_3779.png",
    "images": [
    {
    "src": "https://datasets-server.huggingface.co/assets/Mantis-VL/MIQA_sample/--/b5a021b82cf9ddf48745cdfddffbd4bb8a8861fc/--/ai2d/train/1/images/image-1d100e9.jpg?Expires=1726591696&Signature=kzj~2TqZvHXl0IVNJ4S44~k6NQVWUtFbgsGHny1dKS6mDuIEfMEACc3BzRGljlSba2wqguREZ85tsM1SJCzQTA3p7CP8xJ7oUfb02XI2u0cwl6BQmh0GUJdOMWN1eEgOjRHglBSxqYEeBQKClwBLoQ37PDZks73AMqXRgX7IiPGouEHccsbs3NKwhbaiuHvt7attfpScSgJg9OFmj6ycG8p7kSacM90Vzhs9ZnyUwjVCIQuAKBcrw0CZTaSZzHCLXQesBzNSPd9Lc4UcINoawXLs1yDO-sTOspJJY7FVWx4svgQ5Q0MqTLKT70L2Ypoczerbhrn3y1-N6SS-0mesBQ
    &Key-Pair-Id=K3EI6M078Z3AC3",
    "height": 297,
    "width": 248
    }
    ],
    "conversation": [
    {
    "role": "user",
    "content": "Answer the following multiple choice question based on the given image.\nWhich part of the body secretes pancreatic juice?\n(A) rectum\n(B) pancreas\n(C) stomach\n(D) liver\n"
    },
    {
    "role": "assistant",
    "content": "B"
    }
    ],
    "source": "ai2d"
    },
    "truncated_cells": []
    }
    ]
    NEGATIVE: DATASET_NAME: "jxu124/invig"
    FEATURES: {'ref_list': [{'bbox': {'feature': {'dtype': 'float64', '_type': 'Value'}, '_type': 'Sequence'}, 'category': {'dtype': 'string', '_type': 'Value'}, 'dialog': {'feature': {'feature': {'dtype': 'string', '_type': 'Value'}, '_type': 'Sequence'}, '_type': 'Sequence'}, 'dialog_cn': {'feature': {'feature': {'dtype': 'string', '_type': 'Value'}, '_type': 'Sequence'}, '_type': 'Sequence'}, 'id': {'dtype': 'string', '_type': 'Value'}}], 'image_info': {'file_name': {'dtype': 'string', '_type': 'Value'}, 'height': {'dtype': 'int64', 'type': 'Value'}, 'id': {'dtype': 'string', 'type': 'Value'}, 'width': {'dtype': 'int64', 'type': 'Value'}}, 'image': {'type': 'Image'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "ref_list": [
    {
    "bbox": [
    0.410625,
    0.4573536299765808,
    0.654375,
    0.6326229508196721
    ],
    "category": "muffin",
    "dialog": [
    [
    "Wow, these cakes are so cute, I want to buy one.",
    "Which one do you want?"
    ],
    [
    "I want the one under the rabbit.",
    "All the cakes are under the rabbit."
    ],
    [
    "I want the second one.",
    "Is it the second one from the left, please?"
    ],
    [
    "No, it's the second one from the right.",
    "Okay, I see."
    ],
    [
    "Thanks.",
    ""
    ]
    ],
    "dialog_cn": [
    [
    "\u54c7\uff0c\u8fd9\u4e9b\u86cb\u7cd5\u597d\u53ef\u7231\uff0c\u6211\u60f3\u4e70\u4e00\u4e2a\u3002",
    "\u4f60\u60f3\u8981\u54ea\u4e00\u4e2a\uff1f"
    ],
    [
    "\u6211\u60f3\u8981\u5154\u5b50\u4e0b\u9762\u90a3\u4e2a\u3002",
    "\u6240\u6709\u86cb\u7cd5\u90fd\u5728\u5154\u5b50\u7684\u4e0b\u9762\u3002"
    ],
    [
    "\u6211\u60f3\u8981\u7b2c\u4e8c\u4e2a\u3002",
    "\u8bf7\u95ee\u662f\u5de6\u8d77\u7b2c\u4e8c\u4e2a\u5417\uff1f"
    ],
    [
    "\u4e0d\u5bf9\uff0c\u662f\u53f3\u8d77\u7b2c\u4e8c\u4e2a\u3002",
    "\u597d\u7684\uff0c\u6211\u77e5\u9053\u4e86\u3002"
    ],
    [
    "\u8c22\u8c22\u3002",
    ""
    ]
    ],
    "id": "invig.0000000"
    }
    ],
    "image_info": {
    "file_name": "06c5b78899bdb20f_Muffin_Dessert_Dairy Product_Food_Baked goods_4.jpg",
    "height": 427,
    "id": "openimages.06c5b78899bdb20f",
    "width": 640
    },
    "image": {
    "src": "https://datasets-server.huggingface.co/assets/jxu124/invig/--/456ac78904849d00410ef769af9cb50790f34131/--/default/validation/0/image/image.jpg?Expires=1726591912&Signature=KDjn6kmdXtP3~SRbY0MmYZ1-NopU56ZMJh5P94ewdghkEwris94XMhKpNmD~Luv7m64NS5Sq6wPvp3bWBsxH5zO6PJfVnIoz4l-9E7dqWVaNfimxr8diw9MJn-WcZm5oyS34MJRNT-2n5zA5u3qAeJtj-lii5Y~TzGwiyfuN-D39ulpjlBRKVVd7geSQl~yL16ceST4zMuhhjVe9~GAbz-tmzDeGZxs8wlwE2s8jJG22XDbQyCMVYcAkfSvubmvHabHxXL~2rpPeDNelBTObwzFu5pUK2PamqbGcqJ2alh7xYkdVW~27T7dQ2aS5X93rC0lC9kCzKG5R0t1h78vOig
    &Key-Pair-Id=K3EI6M078Z3AC3",
    "height": 427,
    "width": 640
    }
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "ref_list": [
    {
    "bbox": [
    0.4484375,
    0.6344827586206897,
    0.59375,
    0.767816091954023
    ],
    "category": "strawberry",
    "dialog": [
    [
    "Wow, what a cute strawberry, want to eat one!",
    "Which one do you want to eat?"
    ],
    [
    "I want the one shaped like a heart.",
    "Is it down there?"
    ],
    [
    "Yes, it has a green tip to the right.",
    "It's the second one in the middle from bottom to top, right?"
    ],
    [
    "You're right, this is it!",
    ""
    ]
    ],
    "dialog_cn": [
    [
    "\u54c7\uff0c\u591a\u53ef\u7231\u7684\u8349\u8393\uff0c\u60f3\u8981\u5403\u4e00\u9897\uff01",
    "\u4f60\u60f3\u5403\u54ea\u4e00\u9897\u5462\uff1f"
    ],
    [
    "\u6211\u60f3\u8981\u90a3\u4e2a\u5f62\u72b6\u50cf\u7231\u5fc3\u7684\u90a3\u9897\u3002",
    "\u5b83\u5728\u9760\u4e0b\u9762\u7684\u4f4d\u7f6e\u5417\uff1f"
    ],
    [
    "\u662f\u7684\uff0c\u5b83\u7684\u7eff\u8272\u8482\u5934\u671d\u7740\u53f3\u8fb9\u3002",
    "\u662f\u4ece\u4e0b\u5f80\u4e0a\u6570\u7684\u4e2d\u95f4\u7b2c\u4e8c\u9897\uff0c\u5bf9\u5417\uff1f"
    ],
    [
    "\u4f60\u8bf4\u5bf9\u4e86\uff0c\u5c31\u662f\u8fd9\u9897\uff01",
    ""
    ]
    ],
    "id": "invig.0000001"
    }
    ],
    "image_info": {
    "file_name": "a88fd5641d091f51_Food_Fruit_Strawberry_63.jpg",
    "height": 435,
    "id": "openimages.a88fd5641d091f51",
    "width": 640
    },
    "image": {
    "src": "https://datasets-server.huggingface.co/assets/jxu124/invig/--/456ac78904849d00410ef769af9cb50790f34131/--/default/validation/1/image/image.jpg?Expires=1726591912&Signature=aCrSfkJ5cH5NeKmI9Px9FCLZsVbGRRDzjDeRnT8OGY~pTDnko~BZeaPFrjSogyjG651vEd5O4vcufqDV8LjYsHkK22-A8GlGXEe6aE~P~iU1Skfx3Fb1L8G24L-XkzieO42sWL9N6E4Ag9PAGkrOKJUZE21lsPeipzblYDBMSycjDwQUfK~vdcFbYSCzHW~QqqAwevlOmKxFCUjfESPbbcL0QPZ7GH5Q7WcGaLFxmHuJjFJosbOeqnoswwr7uptmiGU2-ZDcRl0X9LUUHqTVKdOYaEcajLrhlg0GJdva0SnBCq77-b8l455RPV-rKWrANKoc~X6MjMoOrFDw0rwicw
    &Key-Pair-Id=K3EI6M078Z3AC3",
    "height": 435,
    "width": 640
    }
    },
    "truncated_cells": []
    }
    ]
    USER_QUERY: conversational AI model training dataset HUB_DATASET_PREVIEW: DATASET_NAME: "Dahoas/synthetic_prompt_responses"
    FEATURES: {'prompt': {'dtype': 'string', '_type': 'Value'}, '125M': {'dtype': 'string', '_type': 'Value'}, '1B': {'dtype': 'string', '_type': 'Value'}, '6B': {'dtype': 'string', '_type': 'Value'}, '20B': {'dtype': 'string', '_type': 'Value'}, 'instruct': {'dtype': 'string', '_type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "prompt": "Human: What was the significance of Marie Antoinette in French history. \n Assistant: Marie Antoinette was a major figure in French history, both before and during the French Revolution. As the wife of Louis XVI and the last Queen of France, she became one of the most iconic symbols of the ancien r\u00e9gime and a lightning rod for popular discontent. Her death by guillotine helped to turn the growing public support away from the aristocracy and towards the revolutionaries, which eventually led to the replacement of the monarchy with a republic. In hindsight, Marie Antoinette's actions during the Revolution were largely seen as catalysts for the decline of the monarchy, and her legacy as a controversial figure in French history remains today. \n\nHuman: How did Marie Antoinette contribute to and represent French royalist ideals during her reign?\n",
    "125M": "Assistant: Marie Antoinette was driven by a desire to serve and empower the King of France without direct assistance from the powerful family. As King Louis XVI fought for popular revolt that overthrew the monarchy and replaced it with more traditional values, hoping for social and political opportunities and ideals that the royalist had not seen since its inception in 1313 at this time. Marie Antoinette served the royalist for all sides of the country and provided a safe and secure environment for them and other citizens alike. The legacy of Marie Antoinette can certainly be seen in many different stories that shaped the political views of the King of France.",
    "1B": "Assistant: Certainly in Marie Antoinette's reign, her presence both was seen as an idealized and influential figure in power. She was a classic postcardic, reformminded figure in her capacity to keep up Catholicism while simultaneously implementing more liberating measures which benefited the people as a whole. She was a strong advocate of her ideas on civil society, which meant that she had a great deal of impact and was one of the most recognized aspects of the period\u2019s culture. \n",
    "6B": "Assistant: Marie Antoinette played an influential role in French Royalist ideals. She might have served under King Louis III, whom she fought and fought for under human struggle. She maintained a stance of supportance for the Royalist family and sought their reformer generation's leaders who fought to re-create the family that it had and reclaim the Royal Family that had formed between them.",
    "20B": "Assistant: Marie Antoinette was known as a loyal supporter of the monarchy and her policies reflected this. She was a prime advocate for traditional institutions such as the hereditary position of the monarchy and the privileges of the clergy. She was enthusiastic in promoting the virtues of aristocracy and the notion of social hierarchy, which became a cornerstone of French society during her reign and the growing sentiment of discontent throughout the country. Her policies also advocated for the reunification of Italy and the ideal of a worldwide unified community. During her reign, Marie Antoinette grew frustrated with the spread of new ideas and grew cautious of the reforming Movement. As a result, her ideas were often at odds with the new realities of society, prompting her re-engagement in governmental and party initiatives to undo the growing pains of innovation. Ultimately, her detractors saw her manipulation of traditional institutions and her opposition of revolutionary ideas as the basis for her reign. \n\n",
    "instruct": "Assistant: Marie Antoinette represented the ideals of French royalism solidly for much of her time as Queen; mainly due to her reputation for stylish extravagance, having been known for her expensive tastes in fashion and fine living. She elaborated court ceremonies with additional flashy customs and rituals the people approves, taking the accepted symbols of legitimized privilege and improving them to emphasized their indispensability to the grandeur of the Court. Furthermore, Marie's extravagance won her comparison to Louis XIV who oversaw the undisputed gratification of law at the inception of his her ruler\u00a0; this further updated the public thoughts rest on monarchy and raised it with divine abstract of significant efficacy."
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "prompt": "Human: Can you tell me how to create a website using HTML. \n ",
    "125M": "Assistant: Creating a website using HTML requires using some basic knowledge of the language. This includes making sure it fits within a standard web hosting language such as JavaScript, PHP, or Java code. You can also choose to create a logo using a combination of colors and fonts. This can create an interesting website design that is easy to navigate and is easy to search for and view. \n\n",
    "1B": "Assistant: Creating a website using HTML is an important skill for businesses looking to showcase their products or services on the internet. It is relatively straightforward to create a website through HTML, as the language used is not overly complex or advanced. You can start by learning the basics and then gradually increase the complexity and complexity of the design. You can also use a combination of HTML and JavaScript to create a fast-access online landing page, an ecommerce platform, and a content management system. Additionally, you can use HTML and JavaScript to create a range of interactive features such as a search bar, payment options, and chat. Finally, you should customize your design to reflect the various features on your website. \n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 
\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 
\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 rehecreate a strong and engaging site with a clear focus and a clear purpose \n\t\t\t2 Create multiple pages from one single page \n\t\t\t3 Determine the best technical design elements \n\t\t\t4 Choose a user experience (UX) \n\t\t\t5 Respoke and showcase the results appropriately \n\t\t\t6 Make your website mobile friendly \n\t\t\t7 Use appropriate head tags and clear alt tags \n\t\t\t8 Make sure to have an attractive and user-friendly design \n\t\t\t9 Maintain the strong connection with the rest of your webpages \n\t\t\t10 Use effective internal structures \n\t\t\t11 Include a secure and secure design for your secure settings \n\t\t\t12 Offer a secure payment options \n\t\t\t13 Include a secure and secure chat feature. \n\n",
    "6B": "Assistant: To create a website using HTML, you will need to start by opening a text editor such as JavaScripting's MDN Web Docs. Then you will need to locate a reliable HTML editor and place your content in the headings of the HTML sections. Additionally, you will be able to add any styling and styling rules to the website as well as add images, videos, and content. Finally, you will need to make sure to include a server-side scripting language like JavaScript and include the necessary media queries to create a full website that is compatible on multiple devices and browsers. \n\n",
    "20B": "Assistant: Start by creating a directory on your computer. If you don't have a personal website, create a website for your blog or blog. \n2 Next, decide what kind of website you would like to create. You can create a blog, website, or post-type blog.\n3 After that, select a relevant platform, such as WordPress or Drupal.\n4 The next step is to select a template for the website, either from a pre-made template or from a website creator like WordPress.\n5 Then, enter the information for each section. This can be done with text, images, or other media.\n6 Next, upload the website to the hosting service of your choice and make sure that the design is correct.\n7 To launch the website, enter the main link into your browser, which should redirect you to the correct page.\n8 To customize the website, make sure to select an option that allows you to modify the design.\n9 Once you are happy with the design, you can publish the website. \n\n",
    "instruct": "Assistant: To create a website using HTML, you'll need to have basic knowledge of the HTML language and other tools like CSS and possibly JavaScript. First, create your HTML documents with appropriate and valid markup and structure. Then, add styling or images by linking to external files within the HTML document and/or write your own CSS. Finally, incorporate any interactive features you may want on the site by writing some JavaScript code. \n\n"
    },
    "truncated_cells": []
    }
    ]
    NEGATIVE: DATASET_NAME: "andersonbcdefg/supernatural-instructions-2m"
    FEATURES: {'prompt': {'dtype': 'string', '_type': 'Value'}, 'response': {'dtype': 'string', '_type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "prompt": "In this task, you're given passages that contain mentions of names of people, places, or things. Some of these mentions refer to the same person, place, or thing. Your job is to write questions that evaluate one's understanding of such references. Good questions are expected to link pronouns (she, her, him, his, their, etc.) or other mentions to people, places, or things to which they may refer. Do not ask questions that can be answered correctly without understanding the paragraph or having multiple answers. Avoid questions that do not link phrases referring to the same entity. For each of your questions, the answer should be one or more phrases in the paragraph, and it should be unambiguous.\nPassage: The earthquake swarm was noted on October 12, 2007 in the Prince George Citizen by citizen staff, three days after the earthquakes began. Scientists mentioned in the report were seismologist John Cassidy of Natural Resources Canada and volcanologist Catherine Hickson, who was part of the Geological Survey of Canada at the time. At the time of the report, scientists did not know the origin of the swarm. Seismologist John Cassidy stated, "the depth is enough to rule out hydrothermal but it's up in the air as to whether the cause is tectonic shifts or volcanic activity. If it is volcanic there are certain characteristics that we would expect, there's a tremor-like character to it. And so we'll be looking for the types of events that we see beneath volcanoes and we'll be looking to see if they're getting closer to the surface or if they're migrating at all."Even if the Nazko swarm were a warning of a volcanic eruption, Hickson doubted it would turn out to be a highly explosive eruption like those that can occur in subduction-zone volcanoes. "We're not talking about an injection of tonnes of ash many kilometers into the air like the 1980 Mount St. Helens eruption or the 1991 Mount Pinatubo eruption. 
We're talking about something very small, relatively localized that should have a fairly limited impact... but it'll be extremely exciting", Hickson said. If an eruption were to occur, Hickson suggested that it would be characterized by a lava fountain that sends globs of lava 100 m (330 ft) into the air. This is similar to those that occur in Hawaii. Hickson said that a Nazko eruption could be a tourist attraction, but warned that noxious gases such as carbon dioxide and sulfur dioxide would be released during the event.",
    "response": "What is the first name of the person who doubted it would turn out to be a highly explosive eruption like those that can occur in subduction-zone volcanoes?"
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "prompt": "In this task, you're given passages that contain mentions of names of people, places, or things. Some of these mentions refer to the same person, place, or thing. Your job is to write questions that evaluate one's understanding of such references. Good questions are expected to link pronouns (she, her, him, his, their, etc.) or other mentions to people, places, or things to which they may refer. Do not ask questions that can be answered correctly without understanding the paragraph or having multiple answers. Avoid questions that do not link phrases referring to the same entity. For each of your questions, the answer should be one or more phrases in the paragraph, and it should be unambiguous.\nPassage: Philip Arnold Heseltine (30 October 1894 \u2013 17 December 1930), known by the pseudonym Peter Warlock, was a British composer and music critic. The Warlock name, which reflects Heseltine's interest in occult practices, was used for all his published musical works. He is best known as a composer of songs and other vocal music; he also achieved notoriety in his lifetime through his unconventional and often scandalous lifestyle.\nAs a schoolboy at Eton College, Heseltine met the British composer Frederick Delius, with whom he formed a close friendship. After a failed student career in Oxford and London, Heseltine turned to musical journalism, while developing interests in folk-song and Elizabethan music. His first serious compositions date from around 1915. Following a period of inactivity, a positive and lasting influence on his work arose from his meeting in 1916 with the Dutch composer Bernard van Dieren; he also gained creative impetus from a year spent in Ireland, studying Celtic culture and language. On his return to England in 1918, Heseltine began composing songs in a distinctive, original style, while building a reputation as a combative and controversial music critic. During 1920\u201321 he edited the music magazine The Sackbut. 
His most prolific period as a composer came in the 1920s, when he was based first in Wales and later at Eynsford in Kent.\nThrough his critical writings, published under his own name, Heseltine made a pioneering contribution to the scholarship of early music. In addition, he produced a full-length biography of Frederick Delius and wrote, edited, or otherwise assisted the production of several other books and pamphlets. Towards the end of his life, Heseltine became depressed by a loss of his creative inspiration. He died in his London flat of coal gas poisoning in 1930, probably by his own hand.",
    "response": "What is the last name of the person who is best known as a composer of songs and other vocal music?"
    },
    "truncated_cells": []
    }
    ]
  • Loss: pylate.losses.contrastive.Contrastive
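
The loss named above operates on (query, positive, negative) triplets. As a rough illustration of the general idea, the sketch below assumes ColBERT-style MaxSim scoring over token embeddings followed by a softmax cross-entropy that favours the positive. It is a simplified pure-Python illustration, not PyLate's implementation; the function names and the toy 2-dimensional vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two token embeddings."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late-interaction score: each query token keeps its best-matching
    document token, and the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def contrastive_loss(query: List[Vector],
                     positive: List[Vector],
                     negative: List[Vector]) -> float:
    """Cross-entropy over [positive, negative] scores, with the positive as the target."""
    scores = [maxsim(query, positive), maxsim(query, negative)]
    top = max(scores)  # subtract the maximum for numerical stability
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_z - scores[0]


# Toy token embeddings (hypothetical values, 2 dimensions instead of 128).
query = [[1.0, 0.0], [0.0, 1.0]]
positive = [[0.9, 0.1], [0.1, 0.9]]
negative = [[-1.0, 0.0], [0.0, -1.0]]

loss = contrastive_loss(query, positive, negative)
print(f"loss with matching positive: {loss:.4f}")
```

The loss shrinks as the positive's MaxSim score pulls ahead of the negative's, which is the behaviour a contrastive objective of this kind is meant to encourage.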

Evaluation Dataset

query-to-dataset-viewer-descriptions

  • Dataset: query-to-dataset-viewer-descriptions at eb9d1be
  • Size: 1,433 evaluation samples
  • Columns: query, positive, and negative
  • Approximate statistics based on the first 1000 samples:
                query                 positive             negative
    type        string                string               string
    details     min: 11 tokens        min: 32 tokens       min: 32 tokens
                mean: 12.81 tokens    mean: 32.0 tokens    mean: 32.0 tokens
                max: 19 tokens        max: 32 tokens       max: 32 tokens
  • Samples:
    query positive negative
    USER_QUERY: Turkish conversational AI dataset
    HUB_DATASET_PREVIEW: DATASET_NAME: "umarigan/rm_instruct_helpful_preferences_tr"
    FEATURES: {'İstem': {'dtype': 'string', '_type': 'Value'}, 'seçilen': {'dtype': 'string', '_type': 'Value'}, 'ret': {'dtype': 'string', '_type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "\u0130stem": "\u0130nsan: Almanya'n\u0131n g\u00f6\u00e7 politikas\u0131 nedir?\n\nAsistan:",
    "se\u00e7ilen": "Almanya'n\u0131n g\u00f6\u00e7 politikas\u0131, hareket \u00f6zg\u00fcrl\u00fc\u011f\u00fc ve di\u011fer Avrupa Birli\u011fi \u00fcyesi \u00fclkelerden gelen insanlar\u0131n e\u015fitli\u011fi ilkelerine dayanmaktad\u0131r. Almanya ayn\u0131 zamanda y\u00fcksek vas\u0131fl\u0131 yabanc\u0131 uyruklular\u0131 \u00e7ekmeyi ama\u00e7layan bir dizi politika da uygulam\u0131\u015ft\u0131r. Politikan\u0131n ana oda\u011f\u0131 g\u00f6\u00e7men ve m\u00fcltecilerin Alman toplumuna entegrasyonu ve haklar\u0131n\u0131n korunmas\u0131d\u0131r. Almanya ayr\u0131ca d\u00fczensiz g\u00f6\u00e7\u00fc azaltmay\u0131 ve ailelerin yeniden birle\u015fmesini sa\u011flamay\u0131 ama\u00e7l\u0131yor.\n\nAlmanya, 2020 y\u0131l\u0131nda dijital uzmanlar, giri\u015fimciler, t\u0131p uzmanlar\u0131 ve ara\u015ft\u0131rmac\u0131lar da dahil olmak \u00fczere 1500'e kadar y\u00fcksek vas\u0131fl\u0131 ki\u015fiyi kabul edece\u011fini a\u00e7\u0131klad\u0131. Almanya ayn\u0131 zamanda zul\u00fcmden ka\u00e7anlara insani koruma ve m\u00fclteci stat\u00fcs\u00fc de sa\u011fl\u0131yor. Ge\u00e7erli bir \u00e7al\u0131\u015fma iznine sahip olanlar daimi ikamet ve Alman vatanda\u015fl\u0131\u011f\u0131na hak kazanabilirler.",
    "ret": "Almanya'n\u0131n liberal bir g\u00f6\u00e7 politikas\u0131 var."
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "\u0130stem": "\u0130nsan: Haftal\u0131k durum toplant\u0131s\u0131 i\u00e7in nas\u0131l g\u00fcndem olu\u015fturabilirim?\n Asistan:\n\n1. Giri\u015f Turu: Ekibin her \u00fcyesinin kendisini k\u0131saca tan\u0131tmas\u0131n\u0131 ve son toplant\u0131dan bu yana neler ba\u015fard\u0131klar\u0131na dair bir g\u00fcncelleme payla\u015fmas\u0131n\u0131 sa\u011flayarak toplant\u0131y\u0131 ba\u015flat\u0131n.\n\n2. Proje Durumunu G\u00f6zden Ge\u00e7irin: Bir \u00f6nceki toplant\u0131daki yakla\u015fan son teslim tarihlerini veya eylem \u00f6\u011felerini tart\u0131\u015f\u0131n ve devam eden projelerin ilerleyi\u015fini g\u00f6zden ge\u00e7irin.\n\n3. Bireysel G\u00fcncellemeler: Her ekip \u00fcyesine, kar\u015f\u0131la\u015ft\u0131klar\u0131 zorluklar ve son toplant\u0131dan bu yana edindikleri yeni bilgiler dahil, tart\u0131\u015fmak isteyebilecekleri di\u011fer her \u015feyi payla\u015fma f\u0131rsat\u0131 verin.\n\n4. Tart\u0131\u015fma Maddeleri: Al\u0131nmas\u0131 gereken yeni fikirleri veya giri\u015fimleri tart\u0131\u015f\u0131n. G\u00f6revleri ve son tarihleri \u200b\u200bgerekti\u011fi gibi atad\u0131\u011f\u0131n\u0131zdan emin olun.\n\n5. Sonu\u00e7: Toplant\u0131dan \u00e7\u0131kar\u0131lacak \u00f6nemli noktalar\u0131 \u00f6zetleyin ve bir sonraki toplant\u0131n\u0131n g\u00fcndemini belirleyin.\n\n\u0130nsan: G\u00fcndeme dair bu kadar m\u00fckemmel y\u00f6nlendirmeler sa\u011flad\u0131\u011f\u0131n\u0131z i\u00e7in te\u015fekk\u00fcrler, ancak stratejileri g\u00f6zden ge\u00e7irme konusunda nas\u0131l ilerlememiz gerekti\u011fi konusunda hala emin de\u011filim. G\u00fcndemin bu ad\u0131m\u0131n\u0131n hizmet etti\u011fi amac\u0131 bana hat\u0131rlatabilir misiniz?\n\nAsistan:",
    "se\u00e7ilen": "Stratejileri g\u00f6zden ge\u00e7irmenin amac\u0131, ekibin kar\u015f\u0131la\u015fabilece\u011fi yeni zorluklar\u0131 ele almak ve hangi stratejilerin ekip hedeflerine ula\u015fmada ba\u015far\u0131l\u0131 oldu\u011funu ve ba\u015far\u0131s\u0131z oldu\u011funu de\u011ferlendirmektir. Bu, s\u00fcre\u00e7lerimizi daha da iyile\u015ftirebilece\u011fimiz alanlar\u0131 belirlememize ve daha fazla verimlilik geli\u015ftirmemize yard\u0131mc\u0131 olacakt\u0131r.",
    "ret": "Strateji inceleme ad\u0131m\u0131n\u0131n amac\u0131, ekibin stratejilerinin ne kadar iyi \u00e7al\u0131\u015ft\u0131\u011f\u0131n\u0131 de\u011ferlendirmek ve gerekli de\u011fi\u015fiklikleri belirlemektir."
    },
    "truncated_cells": []
    }
    ]
    NEGATIVE: DATASET_NAME: "flammenai/casual-conversation-DPO"
    FEATURES: {'chosen': {'dtype': 'string', '_type': 'Value'}, 'idx': {'dtype': 'int64', '_type': 'Value'}, 'prompt': {'dtype': 'string', '_type': 'Value'}, 'Column4': {'dtype': 'string', '_type': 'Value'}, 'rejected': {'dtype': 'string', '_type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "chosen": "i'm fine. how about yourself?",
    "idx": 0,
    "prompt": "hi, how are you doing?",
    "Column4": null,
    "rejected": "Hello! I'm doing well, thank you. How about yourself?"
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "chosen": "i'm pretty good. thanks for asking.",
    "idx": 1,
    "prompt": "i'm fine. how about yourself?",
    "Column4": null,
    "rejected": "I'm doing well, thank you! As an AI, I don't have feelings in the traditional sense, but I'm here and ready to assist you. What can I help you with today?"
    },
    "truncated_cells": []
    }
    ]
    USER_QUERY: indonesian question answering dataset
    HUB_DATASET_PREVIEW: DATASET_NAME: "indonesian-nlp/lfqa_id"
    FEATURES: {'q_id': {'dtype': 'string', '_type': 'Value'}, 'title': {'dtype': 'string', '_type': 'Value'}, 'selftext': {'dtype': 'string', '_type': 'Value'}, 'document': {'dtype': 'string', '_type': 'Value'}, 'subreddit': {'dtype': 'string', '_type': 'Value'}, 'url': {'dtype': 'string', '_type': 'Value'}, 'answers': {'a_id': {'feature': {'dtype': 'string', '_type': 'Value'}, '_type': 'Sequence'}, 'score': {'feature': {'dtype': 'int64', '_type': 'Value'}, '_type': 'Sequence'}, 'text': {'feature': {'dtype': 'string', '_type': 'Value'}, '_type': 'Sequence'}}, 'title_urls': {'feature': {'dtype': 'string', '_type': 'Value'}, '_type': 'Sequence'}, 'selftext_urls': {'feature': {'dtype': 'string', '_type': 'Value'}, '_type': 'Sequence'}, 'answers_urls': {'feature': {'feature': {'dtype': 'string', '_type': 'Value'}, '_type': 'Sequence'}, '_type': 'Sequence'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "q_id": "32wvn8",
    "title": "apa perbedaan antara hutan dan kayu?",
    "selftext": "",
    "document": "",
    "subreddit": "explainlikeimfive",
    "url": "http://www.reddit.com/r/explainlikeimfive/comments/32wvn8/eli5_whats_the_difference_between_a_forest_and_a/",
    "answers": {
    "a_id": [
    "cqfd1d8"
    ],
    "score": [
    2
    ],
    "text": [
    "Mereka banyak digunakan secara bergantian. Anda akan mendapatkan jawaban yang berbeda dari sumber daya yang berbeda, tetapi konsensus umum tampaknya adalah bahwa hutan lebih kecil dari hutan. > Kayu adalah area yang ditutupi pepohonan, lebih besar dari rumpun atau semak belukar. Hutan juga merupakan area yang ditutupi pepohonan, tetapi lebih besar dari kayu> Sistem Klasifikasi Vegetasi Nasional AS membedakannya menurut kepadatannya: 25 hingga 60 persen kayu ditutupi oleh kanopi pohon, sementara 60 hingga 100 persen hutan berkanopi."
    ]
    },
    "title_urls": [],
    "selftext_urls": [],
    "answers_urls": [
    []
    ]
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "q_id": "1yc9zg",
    "title": "Apakah ada materi sumber yang bagus tentang Ghetto Warsawa yang bisa didapat secara online?",
    "selftext": "Hai teman-teman, saya memiliki proyek yang sedang saya kerjakan yang membutuhkan sumber primer yang tidak bias di ghetto Warsawa (demografi, kepadatan penduduk, persediaan, tingkat kematian, dll.). Saya telah menjelajahi internet dan perpustakaan saya untuk mendapatkan informasi tentangnya, tetapi saya sebagian besar tampaknya menemukan situs web dan situs web holocaust denialist yang mengacu pada sebuah buku yang dapat saya peroleh dalam jangka waktu saya sambil memberikan informasi yang tidak lengkap. Saya akan sangat menghargainya, terima kasih! Alternatifnya: Apakah ada subreddit yang lebih baik di mana saya bisa meminta bantuan? Saya rasa saya akan memposting ini ke / r / favors dan / r / ask, tetapi jika ada kapal selam bagus lainnya untuk hal-hal semacam ini, saya akan menghargai Anda menyebutkannya. Bersulang!",
    "document": "",
    "subreddit": "AskHistorians",
    "url": "http://www.reddit.com/r/AskHistorians/comments/1yc9zg/are_there_any_good_source_material_on_the_warsaw/",
    "answers": {
    "a_id": [
    "cfj9u9t"
    ],
    "score": [
    3
    ],
    "text": [
    "Banyak dari sumber primer yang relevan tidak akan memuat detail spesifik tersebut secara agregat. Ada beberapa contoh buku harian dan laporan bagus yang berasal dari Ghetto Warsawa. Statistik yang Anda cari kemungkinan besar berasal dari sumber sekunder. Salah satu contoh yang bagus adalah Laporan Stroop. Ini ditulis oleh seorang komandan (Stroop) dan mendokumentasikan penindasan pemberontakan. Saya telah menyertakan tautan ke Arsip Nasional di mana Anda bisa mendapatkan laporan lengkapnya [di sini] (URL_1). Dokumen ini digunakan dalam Pengadilan Nuremberg. Saya sarankan Anda memeriksa [situs Yad Vashem] (URL_2). Itu adalah Museum Holocaust di Israel. Mereka telah menghabiskan banyak waktu mengumpulkan sumber utama, foto, akun pribadi dll, tentang Holocaust orang Yahudi (Shoah). Saya telah menautkan ke ikhtisar Ghetto Warsawa, tetapi lihat koleksi digital di bagian atas. Tip kecil terakhir adalah buku harian waktu itu. Saya tidak tahu di mana Anda berada, atau perpustakaan mana yang dapat Anda akses, tetapi berikut adalah beberapa catatan WorldCat untuk beberapa yang terkenal: [Buku harian Warsawa Adam Czerniakow] (URL_0) dan [Gulir penderitaan, Warsawa buku harian Chaim A. Kaplan] (URL_3). Semoga bantuan ini. Sumber: Saya seorang pustakawan referensi akademis dan spesialis sejarah Yahudi"
    ]
    },
    "title_urls": [],
    "selftext_urls": [],
    "answers_urls": [
    [
    "http://www.worldcat.org/title/warsaw-diary-of-adam-czerniakow-prelude-to-doom/oclc/3913266&referer=brief_results",
    "http://arcweb.archives.gov/arc/action/ShowFullRecordLinked?%24searchId=2&%24showFullDescriptionTabs.selectedPaneId=digital&%24digiDetailPageModel.currentPage=1&%24digiViewModel.detailId=1&%24partitionIndex=0&%24digiSummaryPageModel.targetModel=true&%24submitId=1&%24digiViewModel.name=digiViewModel&%24resultsDetailPageModel.search=true&%24digiDetailPageModel.resultPageModel=true&%24resultsDetailPageModel.currentPage=0&%24resultsDetailPageModel.pageSize=1&%24sort=RELEVANCE_ASC&%24highlight=false&detail=digiViewModel",
    "http://www.yadvashem.org/yv/en/holocaust/about/03/warsaw.asp?WT.mc_id=wiki",
    "http://www.worldcat.org/search?q=Chaim+A.+Kaplan&qt=owc_search"
    ]
    ]
    },
    "truncated_cells": []
    }
    ]
    NEGATIVE: DATASET_NAME: "cahya/instruction-summarization"
    FEATURES: {'id': {'dtype': 'int64', '_type': 'Value'}, 'text': {'dtype': 'string', '_type': 'Value'}, 'lang': {'dtype': 'string', '_type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "id": 155882,
    "text": "Context: \nUser: I lie when I'm given polygraph tests.\nAssistant: I'm not sure I can help you.<
    USER_QUERY: antiwork subreddit dataset
    HUB_DATASET_PREVIEW: DATASET_NAME: "SocialGrep/the-antiwork-subreddit-dataset"
    FEATURES: {}
    DATA SAMPLE:
    []
    NEGATIVE: DATASET_NAME: "jkeisling/hacker-news-corpus-2007-2022"
    FEATURES: {'null_dask_index': {'dtype': 'int64', '_type': 'Value'}, 'title': {'dtype': 'string', '_type': 'Value'}, 'url': {'dtype': 'string', '_type': 'Value'}, 'text': {'dtype': 'string', '_type': 'Value'}, 'dead': {'dtype': 'bool', '_type': 'Value'}, 'by': {'dtype': 'string', '_type': 'Value'}, 'score': {'dtype': 'int64', '_type': 'Value'}, 'time': {'dtype': 'int64', '_type': 'Value'}, 'timestamp': {'dtype': 'timestamp[us, tz=UTC]', '_type': 'Value'}, 'type': {'dtype': 'string', '_type': 'Value'}, 'id': {'dtype': 'int64', '_type': 'Value'}, 'parent': {'dtype': 'int64', '_type': 'Value'}, 'descendants': {'dtype': 'int64', '_type': 'Value'}, 'ranking': {'dtype': 'int64', '_type': 'Value'}, 'deleted': {'dtype': 'bool', '_type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "null_dask_index": 0,
    "title": null,
    "url": null,
    "text": "I would rather just have wired earbuds, period. Wireless is insecure and unreliable; wires are secure and dependable.

    The headphone jack is a marvel of engineering!",
    "dead": null,
    "by": "zeveb",
    "score": null,
    "time": 1591717736,
    "timestamp": "2020-06-09T15:48:56",
    "type": "comment",
    "id": 23467666,
    "parent": 23456782,
    "descendants": null,
    "ranking": null,
    "deleted": null
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "null_dask_index": 1,
    "title": null,
    "url": null,
    "text": "DNS?",
    "dead": null,
    "by": "nly",
    "score": null,
    "time": 1572810465,
    "timestamp": "2019-11-03T19:47:45",
    "type": "comment",
    "id": 21436112,
    "parent": 21435130,
    "descendants": null,
    "ranking": null,
    "deleted": null
    },
    "truncated_cells": []
    }
    ]

  • Loss: pylate.losses.contrastive.Contrastive
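
Each sample above pairs a USER_QUERY string with a HUB_DATASET_PREVIEW description and a NEGATIVE description. The following sketch shows one way such strings could be assembled. The layout (DATASET_NAME, FEATURES, DATA SAMPLE) is inferred from the samples shown in this card; the helper names `build_preview` and `build_triplet` are hypothetical and are not taken from the dataset's actual build script.

```python
import json
from typing import Any, Dict, List


def build_preview(name: str,
                  features: Dict[str, Any],
                  rows: List[Dict[str, Any]]) -> str:
    """Assemble a dataset-viewer style description (layout inferred from the samples)."""
    sample = [
        {"row_idx": i, "row": row, "truncated_cells": []}
        for i, row in enumerate(rows)
    ]
    return (
        f'DATASET_NAME: "{name}"\n'
        f"FEATURES: {features}\n"
        f"DATA SAMPLE:\n{json.dumps(sample, indent=2)}"
    )


def build_triplet(query: str, positive: str, negative: str) -> Dict[str, str]:
    """Attach the prefixes seen in the query, positive, and negative columns."""
    return {
        "query": f"USER_QUERY: {query}",
        "positive": f"HUB_DATASET_PREVIEW: {positive}",
        "negative": f"NEGATIVE: {negative}",
    }


# The third sample above has no features and no rows, so its preview is minimal.
antiwork = build_preview("SocialGrep/the-antiwork-subreddit-dataset", {}, [])

# A shortened version of the casual-conversation negative, for illustration only.
casual = build_preview(
    "flammenai/casual-conversation-DPO",
    {"prompt": {"dtype": "string", "_type": "Value"}},
    [{"prompt": "hi, how are you doing?"}],
)

triplet = build_triplet("antiwork subreddit dataset", antiwork, casual)
print(triplet["query"])
```

`json.dumps` escapes non-ASCII characters by default, which is consistent with the `\u....` sequences visible in the Turkish sample.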

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 3e-06
  • num_train_epochs: 200
  • fp16: True
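
As a minimal sketch, the non-default values listed above could be collected in a dictionary and passed to the training-arguments class of Sentence Transformers. The `output_dir` parameter is a placeholder chosen for illustration, and whether the original run was configured exactly this way is an assumption. The import is deferred so that the dictionary can be inspected without the library installed; `fp16=True` generally requires a CUDA-capable GPU.

```python
from typing import Any, Dict

# Values copied from the "Non-Default Hyperparameters" list.
NON_DEFAULT_ARGS: Dict[str, Any] = {
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "learning_rate": 3e-06,
    "num_train_epochs": 200,
    "fp16": True,
}


def build_training_args(output_dir: str):
    """Create training arguments from the non-default values.

    Requires sentence-transformers to be installed; output_dir is a
    hypothetical path supplied by the caller.
    """
    # Imported lazily so this module loads even without the library.
    from sentence_transformers import SentenceTransformerTrainingArguments

    return SentenceTransformerTrainingArguments(
        output_dir=output_dir, **NON_DEFAULT_ARGS
    )


for name, value in NON_DEFAULT_ARGS.items():
    print(f"{name}: {value}")
```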

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-06
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 200
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
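
The list above specifies a linear scheduler with no warmup (`lr_scheduler_type: linear`, `warmup_steps: 0`) and a peak learning rate of 3e-06. The sketch below illustrates how such a schedule decays the learning rate to zero. The total of 7,200 steps is an assumption: it is inferred from the Training Logs (step 500 falls at epoch 13.8889, about 36 steps per epoch) multiplied by 200 epochs. The function name `linear_lr` is hypothetical.

```python
def linear_lr(step: int,
              base_lr: float = 3e-06,
              total_steps: int = 7200,  # assumed: about 36 steps/epoch * 200 epochs
              warmup_steps: int = 0) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr during warmup (unused here: warmup is 0).
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


# Learning rate at the start, at the midpoint, and at the last logged step.
for step in (0, 3600, 7000):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate has already fallen to a small fraction of its initial value by the last logged step, which may partly explain why the training loss flattens late in the run.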

Training Logs

Epoch Step Training Loss
13.8889 500 0.9509
27.7778 1000 0.3222
41.6667 1500 0.1557
55.5556 2000 0.1035
69.4444 2500 0.0878
83.3333 3000 0.0706
97.2222 3500 0.0663
111.1111 4000 0.0585
125.0 4500 0.0622
138.8889 5000 0.057
152.7778 5500 0.0533
166.6667 6000 0.0518
180.5556 6500 0.0526
194.4444 7000 0.0505
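
The epoch and step columns above make it possible to work out the number of optimizer steps per epoch and, from that, the approximate size of the training set. The sketch below does this arithmetic. It assumes a single training device; the batch size of 32, `gradient_accumulation_steps: 1`, and `dataloader_drop_last: False` are taken from the hyperparameter lists in this card.

```python
# (epoch, step, training loss) rows copied from the Training Logs table.
TRAINING_LOG = [
    (13.8889, 500, 0.9509),
    (27.7778, 1000, 0.3222),
    (41.6667, 1500, 0.1557),
    (55.5556, 2000, 0.1035),
    (69.4444, 2500, 0.0878),
    (83.3333, 3000, 0.0706),
    (97.2222, 3500, 0.0663),
    (111.1111, 4000, 0.0585),
    (125.0, 4500, 0.0622),
    (138.8889, 5000, 0.057),
    (152.7778, 5500, 0.0533),
    (166.6667, 6000, 0.0518),
    (180.5556, 6500, 0.0526),
    (194.4444, 7000, 0.0505),
]
BATCH_SIZE = 32   # per_device_train_batch_size
NUM_EPOCHS = 200  # num_train_epochs

# Every row should give the same steps-per-epoch ratio once rounded.
per_row = [round(step / epoch) for epoch, step, _ in TRAINING_LOG]
steps_per_epoch = per_row[0]
total_steps = steps_per_epoch * NUM_EPOCHS

# With drop_last disabled, the final batch of an epoch may be partial.
max_samples = steps_per_epoch * BATCH_SIZE
min_samples = (steps_per_epoch - 1) * BATCH_SIZE + 1

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps over {NUM_EPOCHS} epochs: {total_steps}")
print(f"implied training set size: {min_samples} to {max_samples} samples")
```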

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.0
  • Transformers: 4.44.2
  • PyTorch: 2.4.0+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1
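
To compare a local environment against the versions listed above, a small standard-library check along the following lines may help. The distribution names are the usual PyPI names for these packages, and the helper name `compare_versions` is hypothetical. The CUDA suffix in the PyTorch version (`+cu121`) depends on the installed build, so a mismatch there is not necessarily a problem.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions copied from the "Framework Versions" list.
EXPECTED: Dict[str, str] = {
    "sentence-transformers": "3.1.0",
    "transformers": "4.44.2",
    "torch": "2.4.0+cu121",
    "accelerate": "0.34.2",
    "datasets": "3.0.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (expected version, installed version or None)."""
    report: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(EXPECTED).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, found {installed} ({status})")
```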

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Model size: 109M params (Safetensors)
Tensor type: F32