metadata
language:
  - en
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dataset_size:100K<n<1M
  - loss:MSELoss
base_model: w601sxs/b1ade-embed
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
  - negative_mse
widget:
  - source_sentence: A man is jumping.
    sentences:
      - The man is jumping off something.
      - Two people are posing for a photograph.
      - two women sing opera
  - source_sentence: The wave is huge.
    sentences:
      - A person is surfing on a large wave.
      - People are competing in figure skating.
      - Cats are sleeping inside the room.
  - source_sentence: The man is short.
    sentences:
      - There is a man vaucuming
      - The man did a self portrait of himself.
      - The boys are asleep in their beds.
  - source_sentence: A boy is bowling.
    sentences:
      - A boy is rolling a ball in a hotel hallway.
      - PHS enrolls approximately 750 students.
      - The older men are talking about their wives.
  - source_sentence: A man is walking
    sentences:
      - The man is going for a walk.
      - The station opened on 1 December 1896.
      - The woman is alone and asleep in the car on the moon.
pipeline_tag: sentence-similarity
model-index:
  - name: SentenceTransformer based on w601sxs/b1ade-embed
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.6737565660591995
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7346594963661589
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.700631080294873
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7089388326911368
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7016605503100202
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7101559719602629
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7336031520397918
            name: Pearson Dot
          - type: spearman_dot
            value: 0.7509506568007358
            name: Spearman Dot
          - type: pearson_max
            value: 0.7336031520397918
            name: Pearson Max
          - type: spearman_max
            value: 0.7509506568007358
            name: Spearman Max
      - task:
          type: knowledge-distillation
          name: Knowledge Distillation
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: negative_mse
            value: -21.545076370239258
            name: Negative Mse
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.677225151823628
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7310810412009605
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7076654744568199
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7120808159972457
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7070890827522099
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7115055158750536
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7026111016442886
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6949199269988278
            name: Spearman Dot
          - type: pearson_max
            value: 0.7076654744568199
            name: Pearson Max
          - type: spearman_max
            value: 0.7310810412009605
            name: Spearman Max

SentenceTransformer based on w601sxs/b1ade-embed

This is a sentence-transformers model finetuned from w601sxs/b1ade-embed on the sentence-transformers/wikipedia-en-sentences dataset with MSELoss, i.e. by knowledge distillation: each training sentence is paired with a 1024-dimensional target embedding, and the model learns to reproduce it. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

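Model Description

  • Model Type: Sentence Transformer
  • Base model: w601sxs/b1ade-embed
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024
  • Training Dataset: sentence-transformers/wikipedia-en-sentences
  • Language: en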

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
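The Pooling module above has only pooling_mode_mean_tokens enabled. Each sentence embedding is therefore the average of the BertModel token embeddings over the non-padding positions. Below is a minimal NumPy sketch of that masked mean. It assumes token embeddings of shape (batch, seq_len, dim) and a 0/1 attention mask. The function name mean_pool is illustrative and is not part of the library.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over positions where attention_mask == 1.

    token_embeddings: (batch, seq_len, dim) output of the transformer
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    returns:          (batch, dim) sentence embeddings
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # guard against empty masks
    return summed / counts

# Toy batch: 2 sentences, 3 token positions, 4-dim vectors (the real model uses 1024)
tokens = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 1],
                 [1, 1, 0]])  # the second sentence has one padding position
pooled = mean_pool(tokens, mask)
print(pooled.shape)  # (2, 4)
```

Because padding positions are masked out, the second sentence's vector averages only its first two token vectors.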

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("w601sxs/b1ade-embed-distilled-from-gte-large-en-v1.5")
# Run inference
sentences = [
    'A man is walking',
    'The man is going for a walk.',
    'The station opened on 1 December 1896.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
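model.similarity returns a matrix of pairwise scores. The card does not state the similarity function. The sketch below assumes it is cosine similarity, which is the Sentence Transformers default. Under that assumption, NumPy can reproduce the same kind of matrix and a simple top-k semantic search. The vectors are toy stand-ins for model.encode output, and cosine_matrix and top_k are hypothetical helper names.

```python
import numpy as np

def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and b (m, d) -> (n, m)."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 2) -> list[tuple[int, float]]:
    """Return (corpus_index, score) pairs for the k corpus rows most similar to query."""
    scores = cosine_matrix(query[None, :], corpus)[0]
    best = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in best]

# Toy 3-dim vectors standing in for 1024-dim sentence embeddings
corpus = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.6, 0.0],
])
query = np.array([1.0, 0.1, 0.0])

sims = cosine_matrix(corpus, corpus)
print(sims.shape)  # (3, 3)
print(top_k(query, corpus))
```

Normalizing the rows first turns cosine similarity into a plain matrix product, which also scales well to large corpora.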

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-dev

Metric Value
pearson_cosine 0.6738
spearman_cosine 0.7347
pearson_manhattan 0.7006
spearman_manhattan 0.7089
pearson_euclidean 0.7017
spearman_euclidean 0.7102
pearson_dot 0.7336
spearman_dot 0.751
pearson_max 0.7336
spearman_max 0.751
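Each row above correlates the model's similarity scores for sentence pairs with human gold scores. Pearson works on the raw values and Spearman on their ranks. The _max rows are the maximum over the four score types; for example, pearson_max 0.7336 equals pearson_dot. The sketch below assumes the _manhattan and _euclidean rows use negated distances as scores and the _dot rows use the raw dot product. The helper names and toy data are hypothetical.

```python
import numpy as np

def average_ranks(x) -> np.ndarray:
    """1-based ranks; tied values share the mean of their ranks (as Spearman requires)."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def pearson(a, b) -> float:
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b) -> float:
    return pearson(average_ranks(a), average_ranks(b))

def sts_metrics(emb1: np.ndarray, emb2: np.ndarray, gold: np.ndarray) -> dict:
    """Correlate four pairwise similarity scores with gold similarity scores."""
    n1 = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    n2 = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    scores = {
        "cosine": (n1 * n2).sum(axis=1),
        "manhattan": -np.abs(emb1 - emb2).sum(axis=1),     # negated distance: closer = higher
        "euclidean": -np.linalg.norm(emb1 - emb2, axis=1),  # negated distance
        "dot": (emb1 * emb2).sum(axis=1),
    }
    metrics = {}
    for name, s in scores.items():
        metrics[f"pearson_{name}"] = pearson(s, gold)
        metrics[f"spearman_{name}"] = spearman(s, gold)
    metrics["pearson_max"] = max(metrics[f"pearson_{n}"] for n in scores)
    metrics["spearman_max"] = max(metrics[f"spearman_{n}"] for n in scores)
    return metrics

# Toy data: 4 sentence pairs, 2-dim embeddings, gold scores on a 0-5 scale
emb1 = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
emb2 = np.array([[1.0, 0.1], [1.0, 0.5], [0.5, 1.0], [0.0, 1.0]])
gold = np.array([5.0, 4.0, 2.0, 0.0])
m = sts_metrics(emb1, emb2, gold)
print(round(m["spearman_cosine"], 4))  # 1.0
```

In this toy data the dot product ties on the first two pairs, so spearman_dot falls below 1 while the other three score types rank the pairs perfectly. This is one way the score types can diverge, as they do in the table.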

Knowledge Distillation

Metric Value
negative_mse -21.5451
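The negative_mse value appears to follow the Sentence Transformers MSEEvaluator convention. Under that convention, the metric is the mean squared error between student and target embeddings, multiplied by 100 and negated so that higher is better. The training log supports this reading: at step 500 it records loss 0.2155 alongside negative_mse -21.5451. Below is a sketch with toy arrays; the helper name is hypothetical.

```python
import numpy as np

def negative_mse(student: np.ndarray, teacher: np.ndarray) -> float:
    """-100 * mean squared error over all embedding entries (higher is better)."""
    mse = float(np.mean((student - teacher) ** 2))
    return -100.0 * mse

student = np.array([[0.1, 0.2], [0.3, 0.4]])
teacher = np.array([[0.1, 0.0], [0.5, 0.4]])
print(negative_mse(student, teacher))  # approx. -2.0
```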

Semantic Similarity

  • Dataset: sts-test

Metric Value
pearson_cosine 0.6772
spearman_cosine 0.7311
pearson_manhattan 0.7077
spearman_manhattan 0.7121
pearson_euclidean 0.7071
spearman_euclidean 0.7115
pearson_dot 0.7026
spearman_dot 0.6949
pearson_max 0.7077
spearman_max 0.7311

Training Details

Training Dataset

sentence-transformers/wikipedia-en-sentences

  • Dataset: sentence-transformers/wikipedia-en-sentences at 4a0972d
  • Size: 200,000 training samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:
    • sentence (string): min 4 tokens, mean 12.24 tokens, max 52 tokens
    • label (list): 1024 elements
  • Samples:
    sentence label
    A person on a horse jumps over a broken down airplane. [-0.5300068259239197, 0.07807248830795288, 0.304331511259079, 0.3473575711250305, 0.3993019461631775, ...]
    Children smiling and waving at camera [-0.3918086886405945, 0.514893114566803, 0.38178062438964844, -0.29475438594818115, -0.07984668761491776, ...]
    A boy is jumping on skateboard in the middle of a red bridge. [-0.7029106020927429, 0.08336036652326584, 0.7830113768577576, -0.7898964285850525, 0.27573251724243164, ...]
  • Loss: MSELoss
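With MSELoss, each training row pairs a sentence with a precomputed 1024-dimensional target vector in the label column. That vector is presumably a teacher model's embedding; the repository name suggests gte-large-en-v1.5, but the card does not state this. The student is trained to minimize the mean squared error between its own embedding and that target.

The sketch below shows only the objective, using one gradient step on a hypothetical toy linear "student". The real training runs through Sentence Transformers, and every name here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 8, 5, 4                  # toy sizes; the real label vectors have 1024 entries
features = rng.normal(size=(n, d_in))     # stand-in for the student's internal representation
teacher = rng.normal(size=(n, d_out))     # stand-in for the precomputed "label" vectors
W = np.zeros((d_in, d_out))               # toy linear student: embedding = features @ W

def mse_loss(W: np.ndarray) -> float:
    """Mean over all batch x dimension entries, like torch.nn.MSELoss(reduction='mean')."""
    return float(np.mean((features @ W - teacher) ** 2))

def mse_grad(W: np.ndarray) -> np.ndarray:
    """Gradient of mse_loss with respect to W."""
    residual = features @ W - teacher
    return 2.0 * features.T @ residual / residual.size

before = mse_loss(W)
W -= 0.1 * mse_grad(W)                    # one plain gradient-descent step
after = mse_loss(W)
print(after < before)                     # True
```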

Evaluation Dataset

sentence-transformers/wikipedia-en-sentences

  • Dataset: sentence-transformers/wikipedia-en-sentences at 4a0972d
  • Size: 10,000 evaluation samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:
    • sentence (string): min 5 tokens, mean 13.23 tokens, max 57 tokens
    • label (list): 1024 elements
  • Samples:
    sentence label
    Two women are embracing while holding to go packages. [-0.5707114338874817, -0.5041555762290955, -1.3100334405899048, 0.5848354697227478, -0.3452240526676178, ...]
    Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink. [-0.4810343384742737, 0.034435614943504333, -0.669406533241272, -0.16233624517917633, 0.5214978456497192, ...]
    A man selling donuts to a customer during a world exhibition event held in the city of Angeles [-0.2572114169597626, 0.19592943787574768, -0.6243088841438293, -0.4749126136302948, -0.6737443804740906, ...]
  • Loss: MSELoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 0.0001
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 0.0001
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch  | Step | Training Loss | Validation Loss | negative_mse | sts-dev_spearman_cosine | sts-test_spearman_cosine
0.1279 | 100  | 0.4302        | -               | -            | -                       | -
0.2558 | 200  | 0.2398        | -               | -            | -                       | -
0.3836 | 300  | 0.1918        | -               | -            | -                       | -
0.5115 | 400  | 0.1683        | -               | -            | -                       | -
0.6394 | 500  | 0.1539        | 0.2155          | -21.5451     | 0.7347                  | -
0.7673 | 600  | 0.1456        | -               | -            | -                       | -
0.8951 | 700  | 0.1393        | -               | -            | -                       | -
1.0    | 782  | -             | -               | -            | -                       | 0.7311
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.6
  • Sentence Transformers: 3.0.0
  • Transformers: 4.41.1
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.30.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MSELoss

@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}