metadata
language:
  - en
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:3000
  - loss:MultipleNegativesRankingLoss
base_model: distilbert/distilroberta-base
datasets:
  - sentence-transformers/all-nli
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
widget:
  - source_sentence: >-
      An Indian woman is washing and cleaning dirty laundry at a lake and in the
      background is a kid who appears to have jumped into the lake.
    sentences:
      - An Indian woman is doing her laundry in a lake.
      - An Indian woman is putting her laundry into the machine.
      - A girl is playing with a Slinky.
  - source_sentence: Nine women in white robes with hoods walk on plush, green grass.
    sentences:
      - The women each have one head.
      - Two friends sitting on step at their job.
      - The woman is alone and asleep in her bedroom.
  - source_sentence: >-
      Under a blue sky with white clouds, a child reaches up to touch the
      propeller of a plane standing parked on a field of grass.
    sentences:
      - A child is reaching to touch the propeller of a plane.
      - The boy is sitting
      - A child is playing with a ball.
  - source_sentence: A man and a woman are talking in a park
    sentences:
      - A man is heading to his house of worship.
      - A pair of people are talking outdoors.
      - A man and woman are talking in the aquarium.
  - source_sentence: A man running a marathon talks to his friend.
    sentences:
      - People watching hot air balloons inflating.
      - There is a man running.
      - There are people canoeing down a river.
pipeline_tag: sentence-similarity
model-index:
  - name: SentenceTransformer based on distilbert/distilroberta-base
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.7444932434233196
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7769282355085634
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7502489213535852
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7574428535049513
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.752089041601621
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7583983155030144
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.49365896310259416
            name: Pearson Dot
          - type: spearman_dot
            value: 0.49513705166832495
            name: Spearman Dot
          - type: pearson_max
            value: 0.752089041601621
            name: Pearson Max
          - type: spearman_max
            value: 0.7769282355085634
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.7101248020205797
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7072744861979087
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7133109440593921
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.6966728374126535
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7142547715068376
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.6959833440145297
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.4503698330540162
            name: Pearson Dot
          - type: spearman_dot
            value: 0.43425556993054526
            name: Spearman Dot
          - type: pearson_max
            value: 0.7142547715068376
            name: Pearson Max
          - type: spearman_max
            value: 0.7072744861979087
            name: Spearman Max

SentenceTransformer based on distilbert/distilroberta-base

This is a sentence-transformers model fine-tuned from distilbert/distilroberta-base on the sentence-transformers/all-nli dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
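The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence vector is the average of the token vectors. A minimal pure-Python sketch of that idea follows, assuming padding tokens are excluded through an attention mask; the function name mean_pool and the toy inputs are hypothetical and not part of the library.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of the tokens whose mask value is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy input: three tokens of dimension 2; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

In the real model the token vectors have 768 dimensions and come from the RobertaModel in module (0).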

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("trbeers/distilroberta-base-nli-v2")
# Run inference
sentences = [
    'A man running a marathon talks to his friend.',
    'There is a man running.',
    'There are people canoeing down a river.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
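The similarity call above returns one score per pair of input sentences. Assuming the model uses cosine similarity (the library default when no other similarity function is configured), the computation can be sketched in plain Python as below. The helper names and the 2-dimensional toy vectors are hypothetical stand-ins for real 768-dimensional embeddings.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarity between every pair of rows."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]


# Toy embeddings standing in for the output of model.encode.
toy_embeddings = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(score, 4) for score in row])
```

The matrix is symmetric with values close to 1 on the diagonal, since each vector is compared with itself there.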

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-dev

Metric Value
pearson_cosine 0.7445
spearman_cosine 0.7769
pearson_manhattan 0.7502
spearman_manhattan 0.7574
pearson_euclidean 0.7521
spearman_euclidean 0.7584
pearson_dot 0.4937
spearman_dot 0.4951
pearson_max 0.7521
spearman_max 0.7769

Semantic Similarity

  • Dataset: sts-test

Metric Value
pearson_cosine 0.7101
spearman_cosine 0.7073
pearson_manhattan 0.7133
spearman_manhattan 0.6967
pearson_euclidean 0.7143
spearman_euclidean 0.6960
pearson_dot 0.4504
spearman_dot 0.4343
pearson_max 0.7143
spearman_max 0.7073
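The metrics in the two tables are correlations between the model's similarity scores and human similarity labels. The sketch below shows, under stated assumptions, how a Pearson and a Spearman coefficient can be computed by hand. The gold and predicted values are made-up toy numbers, and the helper functions are hypothetical rather than the evaluator's own code. The last lines check that the reported pearson_max for sts-dev equals the largest of the four reported Pearson values.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data (hypothetical): gold similarity labels and model cosine scores.
gold = [0.0, 1.5, 3.0, 4.2, 5.0]
predicted = [0.10, 0.35, 0.30, 0.80, 0.95]
print(round(spearman(gold, predicted), 4))  # 0.9

# Reported sts-dev Pearson values from the first table.
dev_pearson = {"cosine": 0.7445, "manhattan": 0.7502, "euclidean": 0.7521, "dot": 0.4937}
print(max(dev_pearson.values()))  # 0.7521, matching pearson_max
```

Spearman depends only on the ordering of the scores, which is why a single swapped pair in the toy data lowers it from 1.0 to 0.9.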

Training Details

Training Dataset

sentence-transformers/all-nli

  • Dataset: sentence-transformers/all-nli at d482672
  • Size: 3,000 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 7 tokens, mean 10.38 tokens, max 45 tokens
    • positive (string): min 6 tokens, mean 12.8 tokens, max 39 tokens
    • negative (string): min 6 tokens, mean 13.4 tokens, max 50 tokens
  • Samples:
    • anchor: A person on a horse jumps over a broken down airplane.
      positive: A person is outdoors, on a horse.
      negative: A person is at a diner, ordering an omelette.
    • anchor: Children smiling and waving at camera
      positive: There are children present
      negative: The kids are frowning
    • anchor: A boy is jumping on skateboard in the middle of a red bridge.
      positive: The boy does a skateboarding trick.
      negative: The boy skates down the sidewalk.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
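The parameters above (scale 20.0, cos_sim) can be read as follows: each anchor is scored against every positive and negative in the batch with cosine similarity, the scores are multiplied by the scale, and a cross-entropy loss pushes the anchor's own positive to rank first. Below is a minimal pure-Python sketch of that idea on toy 2-dimensional vectors. The function names are hypothetical, and this is an illustration of the technique rather than the library's implementation.

```python
import math
from typing import List

Vector = List[float]


def cos_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(anchors: List[Vector], positives: List[Vector],
             negatives: List[Vector], scale: float = 20.0) -> float:
    """Mean cross-entropy where anchor i must pick positive i among all candidates."""
    candidates = positives + negatives  # in-batch positives plus hard negatives
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, candidate) for candidate in candidates]
        # Numerically stable log-sum-exp for the softmax denominator.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(value - top) for value in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Toy batch (hypothetical): positives aligned with anchors, negatives opposite.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]
aligned_loss = mnr_loss(anchors, positives, negatives)

# Swapping the positives makes every anchor prefer the wrong candidate.
swapped_loss = mnr_loss(anchors, positives[::-1], negatives)
print(aligned_loss < swapped_loss)  # True
```

A larger scale sharpens the softmax, so small differences in cosine similarity translate into larger differences in loss.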
    

Evaluation Dataset

sentence-transformers/all-nli

  • Dataset: sentence-transformers/all-nli at d482672
  • Size: 300 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 6 tokens, mean 18.06 tokens, max 53 tokens
    • positive (string): min 5 tokens, mean 9.8 tokens, max 28 tokens
    • negative (string): min 5 tokens, mean 10.44 tokens, max 29 tokens
  • Samples:
    • anchor: Two women are embracing while holding to go packages.
      positive: Two woman are holding packages.
      negative: The men are fighting outside a deli.
    • anchor: Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.
      positive: Two kids in numbered jerseys wash their hands.
      negative: Two kids in jackets walk to school.
    • anchor: A man selling donuts to a customer during a world exhibition event held in the city of Angeles
      positive: A man selling donuts to a customer.
      negative: A woman drinks her coffee in a small cafe.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
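The combination of learning_rate 5e-05, lr_scheduler_type linear, and warmup_ratio 0.1 describes a learning rate that ramps up linearly and then decays linearly to zero. The sketch below illustrates that shape, assuming 24 total steps (the last step in the training logs) and 3 warmup steps (10% of 24, rounded up). The function linear_lr is a hypothetical helper, not the scheduler used during training.

```python
def linear_lr(step: int, base_lr: float = 5e-05,
              warmup_steps: int = 3, total_steps: int = 24) -> float:
    """Learning rate at a given step for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay from base_lr at the end of warmup down to 0 at total_steps.
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


schedule = [linear_lr(step) for step in range(25)]
print(schedule[0], schedule[3], schedule[24])
```

The peak is reached at the end of warmup, and the rate reaches zero at the final step.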

Training Logs

Epoch Step loss sts-dev_spearman_cosine sts-test_spearman_cosine
0 0 - 0.6375 -
0.4167 10 2.2687 0.7713 -
0.8333 20 1.8101 0.7769 -
1.0 24 - - 0.7073
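The Epoch column in the log above is the fraction of the single training epoch completed at each logged step. A short sketch of that relationship, assuming 24 steps per epoch as implied by the last row:

```python
total_steps = 24  # final step in the training log, reached at epoch 1.0
logged_steps = [0, 10, 20, 24]

# Epoch value = completed steps divided by steps per epoch.
epochs = [round(step / total_steps, 4) for step in logged_steps]
print(epochs)  # [0.0, 0.4167, 0.8333, 1.0]
```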

Framework Versions

  • Python: 3.10.11
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.3.1
  • Accelerate: 0.31.0
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}