---
language: []
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:665
  - loss:CoSENTLoss
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
datasets: []
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
widget:
  - source_sentence: Is there a free return policy?
    sentences:
      - general query
      - faq query
      - product query
  - source_sentence: Quiero reservar un vuelo a Madrid
    sentences:
      - faq query
      - general query
      - product query
  - source_sentence: Bestell mir einen Bluetooth-Lautsprecher
    sentences:
      - faq query
      - general query
      - general query
  - source_sentence: Kann ich meinen Account auf Premium upgraden?
    sentences:
      - faq query
      - product query
      - faq query
  - source_sentence: Was kostet der Versand nach Kanada?
    sentences:
      - product query
      - faq query
      - faq query
pipeline_tag: sentence-similarity
model-index:
  - name: >-
      SentenceTransformer based on
      sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: MiniLM dev
          type: MiniLM-dev
        metrics:
          - type: pearson_cosine
            value: 0.7060858093148971
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7122657953703283
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.5850353380261794
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.6010204119883696
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.5997691394008732
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.6079117189235353
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7251159526734934
            name: Pearson Dot
          - type: spearman_dot
            value: 0.732939716175825
            name: Spearman Dot
          - type: pearson_max
            value: 0.7251159526734934
            name: Pearson Max
          - type: spearman_max
            value: 0.732939716175825
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: MiniLM test
          type: MiniLM-test
        metrics:
          - type: pearson_cosine
            value: 0.8232712880664017
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.822196399839697
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7831863345453927
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8000293400400974
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.792921493930252
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.80506730817637
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.8011854727667188
            name: Pearson Dot
          - type: spearman_dot
            value: 0.8151432444489153
            name: Spearman Dot
          - type: pearson_max
            value: 0.8232712880664017
            name: Pearson Max
          - type: spearman_max
            value: 0.822196399839697
            name: Spearman Max
---

SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

This is a sentence-transformers model finetuned from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
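
For intuition, the Pooling module above computes a mean over token embeddings, masked so that padding tokens are ignored. A minimal sketch of that computation (random tensors used as hypothetical inputs; this is not the library's internal code):

import torch

# Hypothetical inputs: token embeddings (batch, seq_len, 384) and an attention mask
token_embeddings = torch.randn(2, 128, 384)
attention_mask = torch.ones(2, 128)

mask = attention_mask.unsqueeze(-1)            # (batch, seq_len, 1)
summed = (token_embeddings * mask).sum(dim=1)  # sum over real (unmasked) tokens only
counts = mask.sum(dim=1).clamp(min=1e-9)       # number of real tokens per sentence
sentence_embeddings = summed / counts          # mean pooling -> (batch, 384)
print(sentence_embeddings.shape)               # torch.Size([2, 384])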

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("philipp-zettl/MiniLM-similarity-small")
# Run inference
sentences = [
    'Was kostet der Versand nach Kanada?',
    'faq query',
    'product query',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
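
Because the training pairs score a user message against a candidate query type (see the widget examples above), the similarity scores can be used directly to route a message to its most likely label. A small usage sketch; the label set is taken from the examples in this card:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("philipp-zettl/MiniLM-similarity-small")

labels = ["faq query", "product query", "general query"]
query = "Was kostet der Versand nach Kanada?"  # "What does shipping to Canada cost?"

query_emb = model.encode([query])
label_emb = model.encode(labels)

# Pick the label whose embedding is most similar to the query embedding
scores = model.similarity(query_emb, label_emb)[0]
best = labels[int(scores.argmax())]
print(best, scores.tolist())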

Evaluation

Metrics

Semantic Similarity (dataset: MiniLM-dev)

Metric                Value
pearson_cosine        0.7061
spearman_cosine       0.7123
pearson_manhattan     0.5850
spearman_manhattan    0.6010
pearson_euclidean     0.5998
spearman_euclidean    0.6079
pearson_dot           0.7251
spearman_dot          0.7329
pearson_max           0.7251
spearman_max          0.7329

Semantic Similarity (dataset: MiniLM-test)

Metric                Value
pearson_cosine        0.8233
spearman_cosine       0.8222
pearson_manhattan     0.7832
spearman_manhattan    0.8000
pearson_euclidean     0.7929
spearman_euclidean    0.8051
pearson_dot           0.8012
spearman_dot          0.8151
pearson_max           0.8233
spearman_max          0.8222
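
Both tables come from the standard embedding-similarity evaluation in Sentence Transformers. A minimal sketch of running such an evaluation yourself with EmbeddingSimilarityEvaluator; the three pairs below are sample rows from this card's training data, standing in for a real eval split:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("philipp-zettl/MiniLM-similarity-small")

# Hypothetical eval pairs in the same (sentence1, sentence2, score) format
sentences1 = [
    "Send me deals on gaming accessories",
    "Aidez-moi à synchroniser mes contacts sur mon téléphone",
    "Какие у вас есть предложения по ноутбукам?",
]
sentences2 = ["product query", "faq query", "faq query"]
scores = [1.0, 0.0, 0.0]  # gold similarity in [0, 1]

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, scores, name="MiniLM-dev")
results = evaluator(model)
print(results)  # pearson/spearman for cosine, manhattan, euclidean, and dot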

Training Details

Training Dataset

Unnamed Dataset

  • Size: 665 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:

              sentence1         sentence2       score
    type      string            string          float
    min       7 tokens          5 tokens        0.0
    mean      11.29 tokens      5.31 tokens     0.5
    max       19 tokens         6 tokens        1.0

  • Samples:

    sentence1                                                  sentence2        score
    Send me deals on gaming accessories                        product query    1.0
    Aidez-moi à synchroniser mes contacts sur mon téléphone    faq query        0.0
    Какие у вас есть предложения по ноутбукам?                 faq query        0.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Evaluation Dataset

Unnamed Dataset

  • Size: 84 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:

              sentence1         sentence2       score
    type      string            string          float
    min       7 tokens          5 tokens        0.0
    mean      11.32 tokens      5.42 tokens     0.46
    max       17 tokens         6 tokens        1.0

  • Samples:

    sentence1                                                          sentence2        score
    كيف يمكنني تتبع شحنتي؟                                             support query    0.0
    Aidez-moi à configurer une nouvelle adresse e-mail                 support query    1.0
    Envoyez-moi les dernières promotions sur les montres connectées    product query    1.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 2e-05
  • num_train_epochs: 8
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 8
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
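
Taken together, a fine-tuning run with these settings can be sketched as follows. This is a minimal reconstruction under the hyperparameters listed above, not the author's training script; the one-row datasets are hypothetical placeholders for the 665-sample train and 84-sample eval splits:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Hypothetical stand-ins for the real splits (columns: sentence1, sentence2, score)
train_dataset = Dataset.from_dict({
    "sentence1": ["Send me deals on gaming accessories"],
    "sentence2": ["product query"],
    "score": [1.0],
})
eval_dataset = Dataset.from_dict({
    "sentence1": ["كيف يمكنني تتبع شحنتي؟"],
    "sentence2": ["support query"],
    "score": [0.0],
})

# CoSENTLoss with the parameters from this card; pairwise_cos_sim is its default similarity_fct
loss = CoSENTLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="MiniLM-similarity-small",
    num_train_epochs=8,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,  # requires a CUDA device
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()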

Training Logs

Epoch     Step    Training Loss    Validation Loss    MiniLM-dev_spearman_cosine    MiniLM-test_spearman_cosine
0.4762    10      1.3639           0.8946             0.0665                        -
0.9524    20      0.8488           0.7608             0.2318                        -
1.4286    30      0.6629           1.0463             0.3736                        -
1.9048    40      1.1413           1.1547             0.4159                        -
2.3810    50      1.8156           1.2059             0.4760                        -
2.8571    60      2.0179           0.8129             0.5794                        -
3.3333    70      0.3202           0.6236             0.6217                        -
3.8095    80      0.1437           0.6061             0.6404                        -
4.2857    90      1.1623           0.7312             0.6424                        -
4.7619    100     0.4376           0.5987             0.6621                        -
5.2381    110     0.5832           0.4848             0.6837                        -
5.7143    120     0.1749           0.3367             0.6896                        -
6.1905    130     0.0192           0.2607             0.6936                        -
6.6667    140     0.2047           0.2564             0.6995                        -
7.1429    150     0.4040           0.2747             0.7103                        -
7.6190    160     0.0008           0.2764             0.7123                        -
8.0       168     -                -                  -                             0.8222

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.33.0
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}