metadata
base_model: sentence-transformers/all-mpnet-base-v2
datasets:
  - sentence-transformers/stsb
language:
  - en
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:5749
  - loss:CosineSimilarityLoss
widget:
  - source_sentence: The man talked to a girl over the internet camera.
    sentences:
      - A group of elderly people pose around a dining table.
      - A teenager talks to a girl over a webcam.
      - There is no 'still' that is not relative to some other object.
  - source_sentence: A woman is writing something.
    sentences:
      - Two eagles are perched on a branch.
      - >-
        It refers to the maximum f-stop (which is defined as the ratio of focal
        length to effective aperture diameter).
      - A woman is chopping green onions.
  - source_sentence: The player shoots the winning points.
    sentences:
      - Minimum wage laws hurt the least skilled, least productive the most.
      - The basketball player is about to score points for his team.
      - Sheep are grazing in the field in front of a line of trees.
  - source_sentence: >-
      Stars form in star-formation regions, which itself develop from molecular
      clouds.
    sentences:
      - >-
        Although I believe Searle is mistaken, I don't think you have found the
        problem.
      - >-
        It may be possible for a solar system like ours to exist outside of a
        galaxy.
      - >-
        A blond-haired child performing on the trumpet in front of a house while
        his younger brother watches.
  - source_sentence: >-
      While Queen may refer to both Queen regent (sovereign) or Queen consort,
      the King has always been the sovereign.
    sentences:
      - At first, I thought this is a bit of a tricky question.
      - A man sitting on the floor in a room is strumming a guitar.
      - >-
        There is a very good reason not to refer to the Queen's spouse as "King"
        - because they aren't the King.
model-index:
  - name: SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.911749655195313
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.9110202401316706
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.901061533618126
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.9103533589206381
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.9015395669225589
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.9110202401316706
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.9117496546702127
            name: Pearson Dot
          - type: spearman_dot
            value: 0.9110202401316706
            name: Spearman Dot
          - type: pearson_max
            value: 0.911749655195313
            name: Pearson Max
          - type: spearman_max
            value: 0.9110202401316706
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.8782978405660395
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8761303083125416
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8698483409314474
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8757041012701375
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8701603623246028
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8761303083125416
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.8782978433249164
            name: Pearson Dot
          - type: spearman_dot
            value: 0.8761303083125416
            name: Spearman Dot
          - type: pearson_max
            value: 0.8782978433249164
            name: Pearson Max
          - type: spearman_max
            value: 0.8761303083125416
            name: Spearman Max

SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-mpnet-base-v2 on the sentence-transformers/stsb dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768
  • Training Dataset: sentence-transformers/stsb
  • Language: en

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
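
The Pooling and Normalize stages listed above can be illustrated with a small NumPy sketch. It is not the library's implementation: the function name `mean_pool_and_normalize` and the random token vectors are assumptions made for illustration, standing in for the output of the MPNet transformer. It shows mean pooling over non-padding tokens followed by scaling to unit length.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then scale to unit length."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # guard against empty rows
    pooled = summed / counts                                         # mean over real tokens
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)                      # unit-length rows

# Toy input (random stand-ins, not real model output):
# 2 sentences, 4 token slots, 768-dimensional token vectors.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 4, 768))
attention_mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])  # 0 marks padding

sentence_embeddings = mean_pool_and_normalize(token_embeddings, attention_mask)
print(sentence_embeddings.shape)
print(np.linalg.norm(sentence_embeddings, axis=1))
```

Because padding positions are masked out before averaging, the values stored in padded slots do not affect the resulting sentence embedding.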

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Romaniox/all-mpnet-base-v2-sts")
# Run inference
sentences = [
    'While Queen may refer to both Queen regent (sovereign) or Queen consort, the King has always been the sovereign.',
    'There is a very good reason not to refer to the Queen\'s spouse as "King" - because they aren\'t the King.',
    'A man sitting on the floor in a room is strumming a guitar.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
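
A similarity matrix like the one returned above is often used to rank the other sentences for each input. The following is a minimal sketch of that step, assuming the matrix has been converted to a NumPy array. The helper name `rank_neighbors` is hypothetical, and the matrix values are hand-written placeholders, not output from this model.

```python
import numpy as np

def rank_neighbors(similarities):
    """For each row, return the indices of the other rows, most similar first."""
    sims = np.array(similarities, dtype=float)  # work on a copy
    np.fill_diagonal(sims, -np.inf)             # push self-matches to the end
    order = np.argsort(-sims, axis=1)           # descending similarity
    return order[:, :-1]                        # drop the final column (self)

# Placeholder 3x3 similarity matrix (illustrative values only).
example_similarities = np.array([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])

neighbors = rank_neighbors(example_similarities)
print(neighbors)
```

Each row of `neighbors` lists the indices of the remaining sentences in order of decreasing similarity to the sentence at that row.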

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-dev

Metric Value
pearson_cosine 0.9117
spearman_cosine 0.911
pearson_manhattan 0.9011
spearman_manhattan 0.9104
pearson_euclidean 0.9015
spearman_euclidean 0.911
pearson_dot 0.9117
spearman_dot 0.911
pearson_max 0.9117
spearman_max 0.911

Semantic Similarity

  • Dataset: sts-test

Metric Value
pearson_cosine 0.8783
spearman_cosine 0.8761
pearson_manhattan 0.8698
spearman_manhattan 0.8757
pearson_euclidean 0.8702
spearman_euclidean 0.8761
pearson_dot 0.8783
spearman_dot 0.8761
pearson_max 0.8783
spearman_max 0.8761
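
In both tables the Spearman values for cosine, Euclidean, and dot product coincide. This is consistent with the Normalize() layer in the architecture: for unit-length vectors the dot product equals the cosine similarity, and the squared Euclidean distance equals 2 - 2·cosine, so all three produce the same ranking. The sketch below checks these identities on random unit vectors, which are stand-ins rather than real embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two batches of random vectors, scaled to unit length like the model's output.
a = rng.normal(size=(50, 768))
b = rng.normal(size=(50, 768))
a /= np.linalg.norm(a, axis=1, keepdims=True)
b /= np.linalg.norm(b, axis=1, keepdims=True)

dot = np.sum(a * b, axis=1)
cosine = dot / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
euclidean = np.linalg.norm(a - b, axis=1)

# Dot product equals cosine similarity for unit vectors.
print(np.allclose(cosine, dot))
# Squared Euclidean distance is a decreasing function of cosine similarity.
print(np.allclose(euclidean ** 2, 2 - 2 * cosine))
# Ranking pairs by cosine matches ranking them by negative distance.
print(np.array_equal(np.argsort(cosine), np.argsort(-euclidean)))
```

The Pearson values for Euclidean distance can still differ from the cosine ones, since the relationship between the two is monotonic but not linear.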

Training Details

Training Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at ab7a5ac
  • Size: 5,749 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    column | type | min | mean | max
    sentence1 | string | 6 tokens | 10.0 tokens | 28 tokens
    sentence2 | string | 5 tokens | 9.95 tokens | 25 tokens
    score | float | 0.0 | 0.54 | 1.0
  • Samples:
    sentence1 | sentence2 | score
    A plane is taking off. | An air plane is taking off. | 1.0
    A man is playing a large flute. | A man is playing a flute. | 0.76
    A man is spreading shreded cheese on a pizza. | A man is spreading shredded cheese on an uncooked pizza. | 0.76
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
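
As a rough illustration of the loss named above, the sketch below computes the cosine similarity of each sentence pair and takes the mean squared error against the gold score. It is a NumPy approximation for intuition, not the library's code. The function name `cosine_similarity_mse` and the toy two-dimensional embeddings are assumptions.

```python
import numpy as np

def cosine_similarity_mse(emb1, emb2, gold_scores):
    """Mean squared error between pairwise cosine similarity and gold scores."""
    norms = np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1)
    cosine = np.sum(emb1 * emb2, axis=1) / np.clip(norms, 1e-12, None)
    return float(np.mean((cosine - gold_scores) ** 2))

# Toy pairs: the first pair is identical, the second is orthogonal.
emb1 = np.array([[1.0, 0.0], [1.0, 0.0]])
emb2 = np.array([[1.0, 0.0], [0.0, 1.0]])

# Gold scores that agree with the cosine similarities give zero loss.
print(cosine_similarity_mse(emb1, emb2, np.array([1.0, 0.0])))
# Labelling the orthogonal pair as identical is penalised.
print(cosine_similarity_mse(emb1, emb2, np.array([1.0, 1.0])))
```

The score column of the dataset lies in the range 0.0 to 1.0, which is why it can be compared directly with a cosine similarity.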
    

Evaluation Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at ab7a5ac
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    column | type | min | mean | max
    sentence1 | string | 5 tokens | 15.1 tokens | 45 tokens
    sentence2 | string | 6 tokens | 15.11 tokens | 53 tokens
    score | float | 0.0 | 0.47 | 1.0
  • Samples:
    sentence1 | sentence2 | score
    A man with a hard hat is dancing. | A man wearing a hard hat is dancing. | 1.0
    A young child is riding a horse. | A child is riding a horse. | 0.95
    A man is feeding a mouse to a snake. | The man is feeding a mouse to the snake. | 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • fp16: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
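
To make the learning-rate settings above concrete, the following sketch computes a linear warmup followed by linear decay from `learning_rate`, `warmup_ratio`, and the final step count of 1440 reported in the training logs. It assumes the usual shape of a linear schedule with warmup. The trainer's actual scheduler code is not shown in this card, so treat the function `linear_lr` as a hypothetical approximation.

```python
import math

LEARNING_RATE = 5e-05  # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1     # warmup_ratio from the hyperparameter list
TOTAL_STEPS = 1440     # final step reported in the training logs

# Assumption: warmup length is the ratio applied to the total step count.
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)

def linear_lr(step):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - warmup_steps)

print(warmup_steps)
for step in (0, 72, 144, 792, 1440):
    print(step, linear_lr(step))
```

Under these assumptions the rate rises from zero to its peak over the first tenth of training and then falls linearly back to zero at the final step.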

Training Logs

Epoch | Step | Training Loss | loss | sts-dev_spearman_cosine | sts-test_spearman_cosine
0.2778 | 100 | 0.0218 | 0.0208 | 0.8965 | -
0.5556 | 200 | 0.0205 | 0.0198 | 0.8978 | -
0.8333 | 300 | 0.0193 | 0.0185 | 0.9002 | -
1.1111 | 400 | 0.0153 | 0.0191 | 0.9026 | -
1.3889 | 500 | 0.0091 | 0.0192 | 0.9041 | -
1.6667 | 600 | 0.0089 | 0.0178 | 0.9054 | -
1.9444 | 700 | 0.0093 | 0.0178 | 0.9088 | -
2.2222 | 800 | 0.0059 | 0.0175 | 0.9102 | -
2.5 | 900 | 0.0047 | 0.0176 | 0.9103 | -
2.7778 | 1000 | 0.0047 | 0.0175 | 0.9098 | -
3.0556 | 1100 | 0.0043 | 0.0176 | 0.9121 | -
3.3333 | 1200 | 0.003 | 0.0174 | 0.9113 | -
3.6111 | 1300 | 0.0031 | 0.0175 | 0.9109 | -
3.8889 | 1400 | 0.003 | 0.0174 | 0.9110 | -
4.0 | 1440 | - | - | - | 0.8761
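
The Epoch and Step columns can be reproduced from the dataset size and batch size reported earlier. The sketch below assumes training on a single device, so that the per-device batch size of 16 is the effective batch size. It also relies on gradient_accumulation_steps being 1 and dataloader_drop_last being False, as listed in the hyperparameters.

```python
import math

TRAIN_SAMPLES = 5749  # size of the training dataset
BATCH_SIZE = 16       # per_device_train_batch_size (single device assumed)
EPOCHS = 4            # num_train_epochs

# The last, partial batch is kept, so round up.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

def epoch_at(step):
    """Fractional epoch reached after a given number of optimizer steps."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch, total_steps)
for step in (100, 900, 1400, 1440):
    print(step, epoch_at(step))
```

With these inputs there are 360 steps per epoch and 1440 steps in total, matching the final row of the log.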

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.0.dev0
  • Transformers: 4.42.2
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}