language:
  - multilingual
  - zh
  - ja
  - ar
  - ko
  - de
  - fr
  - es
  - pt
  - hi
  - id
  - it
  - tr
  - ru
  - bn
  - ur
  - mr
  - ta
  - vi
  - fa
  - pl
  - uk
  - nl
  - sv
  - he
  - sw
  - ps
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dataset_size:10K<n<100K
  - loss:CoSENTLoss
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
widget:
  - source_sentence: Bottomless Mug
    sentences:
      - You are always safe.
      - That trend isn't very known yet
      - Eleanor Clift göreve koşuyor.
  - source_sentence: Tripp has a job.
    sentences:
      - They are having money problems.
      - Malignite aniden ortaya çıkar.
      - Mezarlar derin ormanlarda saklandı.
  - source_sentence: There are rules
    sentences:
      - There are more villians than heros.
      - The directions should be read.
      - Mezarlar derin ormanlarda saklandı.
  - source_sentence: K is a musician.
    sentences:
      - Klimt draws hotdogs.
      - Ed Wood hiç mahkemeye çıkmadı.
      - Çeçen Rusya yönetimi ele geçirdi.
  - source_sentence: We moved closer.
    sentences:
      - Clinton is unaware of the process.
      - Nesil deneyimleri anlamsızdır.
      - Hormonların etkileri vardır.
pipeline_tag: sentence-similarity
model-index:
  - name: >-
      SentenceTransformer based on
      sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: tr ling
          type: tr_ling
        metrics:
          - type: pearson_cosine
            value: 0.058743115070889876
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.059526247945378225
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.04582145815494953
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.04331287037397966
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.04709170917685587
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.04407504959649961
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.08477622619519222
            name: Pearson Dot
          - type: spearman_dot
            value: 0.08243745050110735
            name: Spearman Dot
          - type: pearson_max
            value: 0.08477622619519222
            name: Pearson Max
          - type: spearman_max
            value: 0.08243745050110735
            name: Spearman Max

SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

This is a sentence-transformers model finetuned from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 on the MoritzLaurer/multilingual-nli-26lang-2mil7 dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: MoritzLaurer/multilingual-nli-26lang-2mil7

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
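
For reference, here is a minimal sketch of the mean-pooling step configured above (pooling_mode_mean_tokens), written against the raw transformers model; the SentenceTransformer pipeline performs this internally when you call model.encode:

import torch
from transformers import AutoTokenizer, AutoModel

model_name = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
bert = AutoModel.from_pretrained(model_name)  # loads as a BertModel with hidden size 384

encoded = tokenizer(
    ["We moved closer."],
    padding=True, truncation=True, max_length=128, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = bert(**encoded).last_hidden_state   # (batch, seq_len, 384)

# Mean pooling: average the token embeddings, masking out padding positions
mask = encoded["attention_mask"].unsqueeze(-1).float()      # (batch, seq_len, 1)
sentence_embedding = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
print(sentence_embedding.shape)                              # torch.Size([1, 384])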

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'We moved closer.',
    'Clinton is unaware of the process.',
    'Nesil deneyimleri anlamsızdır.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
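
The model can also be used for simple semantic search by ranking candidate sentences against a query; a short sketch (the query and candidates below are illustrative, and the model id is the same placeholder as above):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")

query = "Is Tripp employed?"                      # illustrative query
candidates = [
    "Tripp has a job.",
    "They are having money problems.",
    "Mezarlar derin ormanlarda saklandı.",
]
query_embedding = model.encode([query])
candidate_embeddings = model.encode(candidates)

# Rank candidates by similarity to the query (cosine by default)
scores = model.similarity(query_embedding, candidate_embeddings)  # shape [1, 3]
best = scores.argmax().item()
print(candidates[best], scores[0, best].item())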

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.0587
spearman_cosine 0.0595
pearson_manhattan 0.0458
spearman_manhattan 0.0433
pearson_euclidean 0.0471
spearman_euclidean 0.0441
pearson_dot 0.0848
spearman_dot 0.0824
pearson_max 0.0848
spearman_max 0.0824
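
These metric names match the output of the Sentence Transformers EmbeddingSimilarityEvaluator on the tr_ling evaluation split. A minimal sketch of running it yourself follows; the sentence pairs and gold scores below are placeholders, not the actual evaluation data:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("sentence_transformers_model_id")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["We moved closer.", "Tripp has a job.", "There are rules"],   # placeholder pairs
    sentences2=["Clinton is unaware of the process.",
                "They are having money problems.",
                "The directions should be read."],
    scores=[0.0, 0.5, 1.0],                                                   # placeholder gold similarities
    name="tr_ling",
)
results = evaluator(model)
print(results)  # dict of Pearson/Spearman correlations for cosine, dot, Euclidean, and Manhattan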

Training Details

Training Dataset

MoritzLaurer/multilingual-nli-26lang-2mil7

  • Dataset: MoritzLaurer/multilingual-nli-26lang-2mil7 at 510a233
  • Size: 25,000 training samples
  • Columns: premise_original, hypothesis_original, score, sentence1, and sentence2
  • Approximate statistics based on the first 1000 samples:
    • premise_original: string; min 4 tokens, mean 29.3 tokens, max 107 tokens
    • hypothesis_original: string; min 4 tokens, mean 15.62 tokens, max 40 tokens
    • score: int; 0: ~34.50%, 1: ~33.30%, 2: ~32.20%
    • sentence1: string; min 4 tokens, mean 28.28 tokens, max 101 tokens
    • sentence2: string; min 4 tokens, mean 15.39 tokens, max 38 tokens
  • Samples:
    • Sample 1:
      • premise_original: N, the total number of LC50 values used in calculating the CV(%) varied with organism and toxicant because some data were rejected due to water hardness, lack of concentration measurements, and/or because some of the LC50s were not calculable.
      • hypothesis_original: Most discarded data was rejected due to water hardness.
      • score: 1
      • sentence1: N, CV'nin hesaplanmasında kullanılan LC50 değerlerinin toplam sayısı (%) organizma ve toksik madde ile çeşitlidir, çünkü bazı veriler su sertliği, konsantrasyon ölçümlerinin eksikliği ve / veya LC50'lerin bazıları hesaplanamaz olduğu için reddedilmiştir.
      • sentence2: Atılan verilerin çoğu su sertliği nedeniyle reddedildi.
    • Sample 2:
      • premise_original: As the home of the Venus de Milo and Mona Lisa, the Louvre drew almost unmanageable crowds until President Mitterrand ordered its re-organization in the 1980s.
      • hypothesis_original: The Louvre is home of the Venus de Milo and Mona Lisa.
      • score: 0
      • sentence1: Venus de Milo ve Mona Lisa'nın evi olarak Louvre, Başkan Mitterrand'ın 1980'lerde yeniden düzenlenmesini emredene kadar neredeyse yönetilemez kalabalıklar çekti.
      • sentence2: Louvre, Venus de Milo ve Mona Lisa'nın evidir.
    • Sample 3:
      • premise_original: A year ago, the wife of the Oxford don noticed that the pattern on Kleenex quilted tissue uncannily resembled the Penrose Arrowed Rhombi tilings pattern, which Sir Roger had invented--and copyrighted--in 1974.
      • hypothesis_original: It has been recently found out a similarity between the pattern on the recent Kleenex quilted tissue and the one of the Penrose Arrowed Rhombi tilings.
      • score: 0
      • sentence1: Bir yıl önce Oxford'un karısı, Kleenex kapitone dokudaki desenin 1974'te Sir Roger'ın icat ettiği -ve telif hakkı olan - Penrose Arrowed Rhombi tilings desenine benzediğini fark etti.
      • sentence2: Yakın zamanda, son Kleenex kapitone dokudaki desen ile Penrose Arrowed Rhombi döşemelerinden biri arasında bir benzerlik bulunmuştur.
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
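
A minimal sketch of constructing this loss with the parameters above; the model shown is the base checkpoint, and the data wiring is covered by the training sketch under Training Hyperparameters further below:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CoSENTLoss
from sentence_transformers.util import pairwise_cos_sim

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# CoSENTLoss expects (sentence1, sentence2) pairs plus a float similarity score per pair
loss = CoSENTLoss(model=model, scale=20.0, similarity_fct=pairwise_cos_sim)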
    

Evaluation Dataset

MoritzLaurer/multilingual-nli-26lang-2mil7

  • Dataset: MoritzLaurer/multilingual-nli-26lang-2mil7 at 510a233
  • Size: 5,000 evaluation samples
  • Columns: premise_original, hypothesis_original, score, sentence1, and sentence2
  • Approximate statistics based on the first 1000 samples:
    • premise_original: string; min 5 tokens, mean 30.3 tokens, max 99 tokens
    • hypothesis_original: string; min 6 tokens, mean 15.11 tokens, max 56 tokens
    • score: int; 0: ~34.50%, 1: ~29.90%, 2: ~35.60%
    • sentence1: string; min 6 tokens, mean 29.94 tokens, max 106 tokens
    • sentence2: string; min 5 tokens, mean 15.29 tokens, max 52 tokens
  • Samples:
    • Sample 1:
      • premise_original: But the racism charge isn't quirky or wacky--it's demagogy.
      • hypothesis_original: The accusation of prejudice based on a pedestrian kind of hatred.
      • score: 0
      • sentence1: Ama ırkçılık suçlaması tuhaf ya da tuhaf değil, bu bir demagoji.
      • sentence2: Yaya nefretine dayanan önyargı suçlaması.
    • Sample 2:
      • premise_original: Why would Gates allow the publication of such a book with his byline and photo on the dust jacket?
      • hypothesis_original: Gates' byline and photo are on the dust jacket
      • score: 0
      • sentence1: Gates neden böyle bir kitabın basılmasına izin versin ki?
      • sentence2: Gates'in çizgisi ve fotoğrafı toz ceketin üzerinde.
    • Sample 3:
      • premise_original: I am a nonsmoker and allergic to cigarette smoke.
      • hypothesis_original: I do not smoke.
      • score: 0
      • sentence1: Sigara içmeyen biriyim ve sigara dumanına alerjim var.
      • sentence2: Sigara içmiyorum.
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 64
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
  • ddp_find_unused_parameters: False
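
A sketch of a training run wired up with these non-default hyperparameters; the output path, the tiny in-memory dataset, and save_strategy are placeholders and assumptions rather than the actual setup, which used the 25,000 pairs described above:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
loss = CoSENTLoss(model)  # default scale 20.0 and pairwise cosine similarity, as above

# Placeholder (sentence1, sentence2, score) rows standing in for the real dataset
train_dataset = Dataset.from_dict({
    "sentence1": ["We moved closer.", "Tripp has a job.", "There are rules"],
    "sentence2": ["Clinton is unaware of the process.",
                  "They are having money problems.",
                  "The directions should be read."],
    "score": [0.0, 0.5, 1.0],
})
eval_dataset = train_dataset

args = SentenceTransformerTrainingArguments(
    output_dir="output",                  # hypothetical output path
    eval_strategy="epoch",
    save_strategy="epoch",                # assumed, so load_best_model_at_end can reload the best checkpoint
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    learning_rate=2e-5,
    num_train_epochs=5,
    warmup_ratio=0.1,
    fp16=True,                            # matches the run above; requires a CUDA GPU
    load_best_model_at_end=True,
    ddp_find_unused_parameters=False,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()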

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: False
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss tr_ling_spearman_max
0.0320 25 17.17 - -
0.0639 50 16.4932 - -
0.0959 75 16.5976 - -
0.1279 100 15.6991 - -
0.1598 125 14.876 - -
0.1918 150 14.4828 - -
0.2238 175 12.7061 - -
0.2558 200 10.8687 - -
0.2877 225 8.3797 - -
0.3197 250 6.2029 - -
0.3517 275 5.8228 - -
0.3836 300 5.811 - -
0.4156 325 5.8079 - -
0.4476 350 5.8077 - -
0.4795 375 5.8035 - -
0.5115 400 5.8072 - -
0.5435 425 5.8033 - -
0.5754 450 5.8086 - -
0.6074 475 5.81 - -
0.6394 500 5.7949 - -
0.6714 525 5.8079 - -
0.7033 550 5.8057 - -
0.7353 575 5.8097 - -
0.7673 600 5.7986 - -
0.7992 625 5.8051 - -
0.8312 650 5.8041 - -
0.8632 675 5.7907 - -
0.8951 700 5.7991 - -
0.9271 725 5.8035 - -
0.9591 750 5.7945 - -
0.9910 775 5.8077 - -
1.0 782 - 5.8024 0.0330
1.0230 800 5.6703 - -
1.0550 825 5.8052 - -
1.0870 850 5.7936 - -
1.1189 875 5.7924 - -
1.1509 900 5.7806 - -
1.1829 925 5.7835 - -
1.2148 950 5.7619 - -
1.2468 975 5.8038 - -
1.2788 1000 5.779 - -
1.3107 1025 5.7904 - -
1.3427 1050 5.7696 - -
1.3747 1075 5.7919 - -
1.4066 1100 5.7785 - -
1.4386 1125 5.7862 - -
1.4706 1150 5.7703 - -
1.5026 1175 5.773 - -
1.5345 1200 5.7627 - -
1.5665 1225 5.7596 - -
1.5985 1250 5.7882 - -
1.6304 1275 5.7828 - -
1.6624 1300 5.771 - -
1.6944 1325 5.788 - -
1.7263 1350 5.7719 - -
1.7583 1375 5.7846 - -
1.7903 1400 5.7838 - -
1.8223 1425 5.7912 - -
1.8542 1450 5.7686 - -
1.8862 1475 5.7938 - -
1.9182 1500 5.7847 - -
1.9501 1525 5.7952 - -
1.9821 1550 5.7528 - -
2.0 1564 - 5.7933 0.0682
2.0141 1575 5.65 - -
2.0460 1600 5.7537 - -
2.0780 1625 5.7098 - -
2.1100 1650 5.7149 - -
2.1419 1675 5.7585 - -
2.1739 1700 5.7277 - -
2.2059 1725 5.7482 - -
2.2379 1750 5.7115 - -
2.2698 1775 5.6895 - -
2.3018 1800 5.7389 - -
2.3338 1825 5.7161 - -
2.3657 1850 5.7123 - -
2.3977 1875 5.7322 - -
2.4297 1900 5.7421 - -
2.4616 1925 5.7615 - -
2.4936 1950 5.7493 - -
2.5256 1975 5.7298 - -
2.5575 2000 5.7529 - -
2.5895 2025 5.7318 - -
2.6215 2050 5.7036 - -
2.6535 2075 5.7158 - -
2.6854 2100 5.7209 - -
2.7174 2125 5.738 - -
2.7494 2150 5.7337 - -
2.7813 2175 5.713 - -
2.8133 2200 5.7257 - -
2.8453 2225 5.6958 - -
2.8772 2250 5.7053 - -
2.9092 2275 5.7246 - -
2.9412 2300 5.7291 - -
2.9731 2325 5.7139 - -
3.0 2346 - 5.8510 0.0837
3.0051 2350 5.5715 - -
3.0371 2375 5.6558 - -
3.0691 2400 5.6441 - -
3.1010 2425 5.6569 - -
3.1330 2450 5.669 - -
3.1650 2475 5.6361 - -
3.1969 2500 5.6524 - -
3.2289 2525 5.6773 - -
3.2609 2550 5.6552 - -
3.2928 2575 5.6807 - -
3.3248 2600 5.6638 - -
3.3568 2625 5.6582 - -
3.3887 2650 5.658 - -
3.4207 2675 5.6626 - -
3.4527 2700 5.6802 - -
3.4847 2725 5.6377 - -
3.5166 2750 5.6752 - -
3.5486 2775 5.6573 - -
3.5806 2800 5.6963 - -
3.6125 2825 5.7007 - -
3.6445 2850 5.6746 - -
3.6765 2875 5.6312 - -
3.7084 2900 5.5596 - -
3.7404 2925 5.7003 - -
3.7724 2950 5.6739 - -
3.8043 2975 5.655 - -
3.8363 3000 5.6787 - -
3.8683 3025 5.643 - -
3.9003 3050 5.6412 - -
3.9322 3075 5.758 - -
3.9642 3100 5.6769 - -
3.9962 3125 5.7206 - -
4.0 3128 - 5.9125 0.0824

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.0
  • Transformers: 4.41.0
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.30.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1
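
To approximate this environment, a pinned install along these lines should work (standard PyPI package names; the exact CUDA build of PyTorch may differ):

pip install "sentence-transformers==3.0.0" "transformers==4.41.0" "torch==2.3.0" "accelerate==0.30.1" "datasets==2.19.1" "tokenizers==0.19.1"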

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}