base_model: BAAI/bge-base-en-v1.5
datasets: []
language:
  - en
library_name: sentence-transformers
license: apache-2.0
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:6300
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      As of December 31, 2023, deferred revenues for unsatisfied performance
      obligations consisted of $769 million related to Hilton Honors that will
      be recognized as revenue over approximately the next two years.
    sentences:
      - How many shares of common stock were issued in both 2022 and 2023?
      - >-
        What is the projected timeline for recognizing revenue from deferred
        revenues related to Hilton Honors as of December 31, 2023?
      - >-
        What acquisitions did CVS Health Corporation complete in 2023 to enhance
        their care delivery strategy?
  - source_sentence: >-
      If a good or service does not qualify as distinct, it is combined with the
      other non-distinct goods or services within the arrangement and these
      combined goods or services are treated as a single performance obligation
      for accounting purposes. The arrangement's transaction price is then
      allocated to each performance obligation based on the relative standalone
      selling price of each performance obligation.
    sentences:
      - >-
        What does the summary table indicate about the company's activities at
        the end of 2023?
      - >-
        What governs the treatment of goods or services that are not distinct
        within a contractual arrangement?
      - >-
        What is the basis for the Company to determine the Standalone Selling
        Price (SSP) for each distinct performance obligation in contracts with
        multiple performance obligations?
  - source_sentence: >-
      As of January 2023, the maximum daily borrowing capacity under the
      commercial paper program was approximately $2.75 billion.
    sentences:
      - >-
        What is the maximum daily borrowing capacity under the commercial paper
        program as of January 2023?
      - When does the Company's fiscal year end?
      - How much cash did acquisition activities use in 2023?
  - source_sentence: >-
      Federal Home Loan Bank borrowings had an interest rate of 4.59% in 2022,
      which increased to 5.14% in 2023.
    sentences:
      - >-
        By what percentage did the company's capital expenditures increase in
        fiscal 2023 compared to fiscal 2022?
      - >-
        What is the significance of Note 13 in the context of legal proceedings
        described in the Annual Report on Form 10-K?
      - >-
        How much did the Federal Home Loan Bank borrowings increase in terms of
        interest rates from 2022 to 2023?
  - source_sentence: >-
      The design of the Annual Report, with the consolidated financial
      statements placed immediately after Part IV, enhances the integration of
      financial data by maintaining a coherent structure.
    sentences:
      - >-
        How does the structure of the Annual Report on Form 10-K facilitate the
        integration of the consolidated financial statements?
      - Where can one find the Glossary of Terms and Acronyms in Item 8?
      - >-
        What part of the annual report contains the consolidated financial
        statements and accompanying notes?
model-index:
  - name: BGE base Financial Matryoshka
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.6957142857142857
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8171428571428572
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8628571428571429
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6957142857142857
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.2723809523809524
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.17257142857142854
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.08999999999999998
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6957142857142857
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8171428571428572
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8628571428571429
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.7971144469297426
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7641831065759639
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7681728985040082
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.6942857142857143
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.81
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8514285714285714
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6942857142857143
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.27
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.17028571428571426
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6942857142857143
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.81
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8514285714285714
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.7951260604161544
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7617998866213151
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7658003405075238
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0.7014285714285714
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.7971428571428572
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.85
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8885714285714286
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7014285714285714
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.26571428571428574
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.16999999999999998
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.08885714285714284
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.7014285714285714
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.7971428571428572
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.85
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.8885714285714286
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.793266992460996
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7629580498866213
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7678096436855835
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.6957142857142857
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8014285714285714
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8357142857142857
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8842857142857142
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6957142857142857
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.2671428571428571
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.16714285714285712
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.08842857142857141
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6957142857142857
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8014285714285714
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8357142857142857
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.8842857142857142
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.787378246207931
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7566984126984126
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7613545312565108
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.6571428571428571
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.7871428571428571
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8285714285714286
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8757142857142857
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6571428571428571
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.2623809523809524
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.1657142857142857
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.08757142857142856
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6571428571428571
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.7871428571428571
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8285714285714286
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.8757142857142857
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.7655516319615892
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7303951247165531
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7349875161463472
            name: Cosine Map@100

BGE base Financial Matryoshka

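This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. It was trained with MatryoshkaLoss at dimensions 768, 512, 256, 128, and 64, and the Evaluation section reports retrieval metrics at each of those sizes.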

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
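
The architecture above ends with CLS pooling followed by L2 normalization. As a rough illustration of what those two stages do to the transformer's per-token outputs, here is a minimal pure-Python sketch. The function name and the toy 4-dimensional vectors are made up for the example; the real model produces 768-dimensional hidden states and performs these steps inside the library.

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Take the first ([CLS]) token vector and scale it to unit L2 norm."""
    cls_vector = token_embeddings[0]
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        # Nothing to scale; return a copy unchanged.
        return list(cls_vector)
    return [x / norm for x in cls_vector]

# Toy input: 3 tokens with 4-dimensional hidden states (hypothetical values).
tokens = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 0.5, 0.0],
]
sentence_embedding = cls_pool_and_normalize(tokens)
print(sentence_embedding)
```

Because the output has unit length, the dot product of two such embeddings is equal to their cosine similarity.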

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("rbhatia46/bge-base-financial-nvidia-matryoshka")
# Run inference
sentences = [
    'The design of the Annual Report, with the consolidated financial statements placed immediately after Part IV, enhances the integration of financial data by maintaining a coherent structure.',
    'How does the structure of the Annual Report on Form 10-K facilitate the integration of the consolidated financial statements?',
    'Where can one find the Glossary of Terms and Acronyms in Item 8?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
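
Since the model was trained with MatryoshkaLoss, a common way to use it is to keep only the first N values of each embedding (N being one of 768, 512, 256, 128, or 64) and re-normalize before computing cosine similarity. The sketch below shows that step in pure Python on toy vectors; `truncate_and_normalize` and `cosine_similarity` are hypothetical helper names, not part of the library. Recent Sentence Transformers releases also accept a `truncate_dim` argument when loading a model, so check the documentation of your installed version before rolling your own.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit L2 norm."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 8-dimensional "embeddings" (hypothetical values, not model output).
query = [0.6, 0.2, 0.1, 0.3, 0.5, 0.1, 0.4, 0.2]
passage = [0.5, 0.3, 0.2, 0.2, 0.4, 0.2, 0.5, 0.1]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine_similarity(q, p), 4))

# With real model output, the same helper could be applied row by row, e.g.:
# small = [truncate_and_normalize(row.tolist(), 256) for row in embeddings]
```

Smaller dimensions reduce storage and search cost; the Evaluation section shows how much retrieval quality changes at each size.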

Evaluation

Metrics

Information Retrieval (dataset: dim_768)

Metric Value
cosine_accuracy@1 0.6957
cosine_accuracy@3 0.8171
cosine_accuracy@5 0.8629
cosine_accuracy@10 0.9
cosine_precision@1 0.6957
cosine_precision@3 0.2724
cosine_precision@5 0.1726
cosine_precision@10 0.09
cosine_recall@1 0.6957
cosine_recall@3 0.8171
cosine_recall@5 0.8629
cosine_recall@10 0.9
cosine_ndcg@10 0.7971
cosine_mrr@10 0.7642
cosine_map@100 0.7682

Information Retrieval (dataset: dim_512)

Metric Value
cosine_accuracy@1 0.6943
cosine_accuracy@3 0.81
cosine_accuracy@5 0.8514
cosine_accuracy@10 0.9
cosine_precision@1 0.6943
cosine_precision@3 0.27
cosine_precision@5 0.1703
cosine_precision@10 0.09
cosine_recall@1 0.6943
cosine_recall@3 0.81
cosine_recall@5 0.8514
cosine_recall@10 0.9
cosine_ndcg@10 0.7951
cosine_mrr@10 0.7618
cosine_map@100 0.7658

Information Retrieval (dataset: dim_256)

Metric Value
cosine_accuracy@1 0.7014
cosine_accuracy@3 0.7971
cosine_accuracy@5 0.85
cosine_accuracy@10 0.8886
cosine_precision@1 0.7014
cosine_precision@3 0.2657
cosine_precision@5 0.17
cosine_precision@10 0.0889
cosine_recall@1 0.7014
cosine_recall@3 0.7971
cosine_recall@5 0.85
cosine_recall@10 0.8886
cosine_ndcg@10 0.7933
cosine_mrr@10 0.763
cosine_map@100 0.7678

Information Retrieval (dataset: dim_128)

Metric Value
cosine_accuracy@1 0.6957
cosine_accuracy@3 0.8014
cosine_accuracy@5 0.8357
cosine_accuracy@10 0.8843
cosine_precision@1 0.6957
cosine_precision@3 0.2671
cosine_precision@5 0.1671
cosine_precision@10 0.0884
cosine_recall@1 0.6957
cosine_recall@3 0.8014
cosine_recall@5 0.8357
cosine_recall@10 0.8843
cosine_ndcg@10 0.7874
cosine_mrr@10 0.7567
cosine_map@100 0.7614

Information Retrieval (dataset: dim_64)

Metric Value
cosine_accuracy@1 0.6571
cosine_accuracy@3 0.7871
cosine_accuracy@5 0.8286
cosine_accuracy@10 0.8757
cosine_precision@1 0.6571
cosine_precision@3 0.2624
cosine_precision@5 0.1657
cosine_precision@10 0.0876
cosine_recall@1 0.6571
cosine_recall@3 0.7871
cosine_recall@5 0.8286
cosine_recall@10 0.8757
cosine_ndcg@10 0.7656
cosine_mrr@10 0.7304
cosine_map@100 0.735
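
To make the metric names in the tables above concrete, the following sketch computes accuracy@k, precision@k, recall@k, MRR@k, and NDCG@k for a single query, using binary relevance. It is a simplified illustration, not the evaluator's code; the function name and document IDs are hypothetical. In the tables, accuracy@k equals recall@k and precision@k equals recall@k divided by k, which is the pattern you would expect if each query has exactly one relevant passage (an inference from the numbers, not something the card states).

```python
import math

def retrieval_metrics(ranked_ids, relevant_ids, k):
    """Per-query retrieval metrics at cutoff k (relevant_ids must be non-empty)."""
    hits = [doc_id in relevant_ids for doc_id in ranked_ids[:k]]
    num_hits = sum(hits)

    # Reciprocal rank of the first relevant document within the top k.
    mrr = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            mrr = 1.0 / rank
            break

    # Binary-relevance DCG, normalized by the best achievable DCG at k.
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, hit in enumerate(hits, start=1) if hit)
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))

    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant_ids),
        "mrr": mrr,
        "ndcg": dcg / idcg,
    }

# One query whose single relevant passage is ranked second.
metrics = retrieval_metrics(["d7", "d2", "d9"], {"d2"}, k=3)
print(metrics)
```

The reported values are averages of these per-query numbers over the evaluation queries.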

Training Details

Training Dataset

Unnamed Dataset

  • Size: 6,300 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    positive (type: string)
    • min: 6 tokens
    • mean: 45.53 tokens
    • max: 222 tokens
    anchor (type: string)
    • min: 8 tokens
    • mean: 20.3 tokens
    • max: 45 tokens
  • Samples:
    • positive: Acquisition activity used cash of $765 million in 2023, primarily related to a Beauty acquisition.
      anchor: How much cash did acquisition activities use in 2023?
    • positive: In a financial report, Part IV Item 15 includes Exhibits and Financial Statement Schedules as mentioned.
      anchor: What content can be expected under Part IV Item 15 in a financial report?
    • positive: we had more than 8.3 million fiber consumer wireline broadband customers, adding 1.1 million during the year.
      anchor: How many fiber consumer wireline broadband customers did the company have at the end of the year?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
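
The configuration above wraps MultipleNegativesRankingLoss in MatryoshkaLoss with five dimensions and equal weights. The pure-Python sketch below illustrates the idea: compute an in-batch-negatives cross-entropy on embeddings truncated to each dimension, then take the weighted sum. It is a simplified illustration and not the library's implementation; the function names, the toy vectors, and the similarity scale of 20 are assumptions for the example.

```python
import math

def _normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives: anchor i should score highest against positive i."""
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        # Scaled cosine similarity of this anchor to every positive in the batch.
        logits = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in p]
        # Numerically stable log-sum-exp for the softmax cross-entropy.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the inner loss over embeddings truncated to each dim."""
    return sum(
        w * mnr_loss([v[:d] for v in anchors], [v[:d] for v in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch of two (anchor, positive) pairs in 4 dimensions (hypothetical values).
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.9, 0.0, 0.2]]
loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(round(loss, 6))
```

Training on every truncation at once is what encourages the leading dimensions of the embedding to stay useful on their own.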
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
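
As a worked example of how these settings interact, the arithmetic below derives the effective batch size and the number of optimizer steps from the 6,300 training samples. It assumes a single training device and that a partial final batch is kept and leftover micro-batches are rounded down per epoch; neither assumption is stated in the card, but the resulting step counts and epoch fractions agree with the Training Logs table.

```python
import math

num_samples = 6300
per_device_train_batch_size = 32
gradient_accumulation_steps = 16
num_train_epochs = 4
warmup_ratio = 0.1

# Samples contributing to each optimizer update (single device assumed).
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

# Micro-batches per epoch, keeping the final partial batch.
micro_batches_per_epoch = math.ceil(num_samples / per_device_train_batch_size)

# Optimizer updates per epoch and in total.
updates_per_epoch = max(micro_batches_per_epoch // gradient_accumulation_steps, 1)
total_updates = updates_per_epoch * num_train_epochs

# Approximate number of warmup updates for the cosine schedule.
warmup_updates = math.ceil(total_updates * warmup_ratio)

def logged_epoch(step):
    """Fraction of the dataset seen after `step` optimizer updates."""
    return round(step * gradient_accumulation_steps / micro_batches_per_epoch, 4)

print(effective_batch_size)     # 512
print(micro_batches_per_epoch)  # 197
print(updates_per_epoch)        # 12
print(total_updates)            # 48
print(logged_epoch(12), logged_epoch(48))  # 0.9746 3.8985
```

With only 48 updates in total, each evaluation at the end of an epoch corresponds to 12 optimizer steps.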

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.8122 10 1.5751 - - - - -
0.9746 12 - - - - - 0.7580
0.8122 10 0.6362 - - - - -
0.9746 12 - 0.7503 0.7576 0.7653 0.7282 0.7638
1.6244 20 0.4426 - - - - -
1.9492 24 - 0.7544 0.7662 0.7640 0.7311 0.7676
2.4365 30 0.3217 - - - - -
2.9239 36 - 0.7608 0.7684 0.7662 0.7341 0.7686
3.2487 40 0.2761 - - - - -
3.8985 48 - 0.7614 0.7678 0.7658 0.735 0.7682
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.6
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}