|
--- |
|
tags: |
|
- mteb |
|
model-index: |
|
- name: checkpoint-1431 |
|
results: |
|
- task: |
|
type: STS |
|
dataset: |
|
type: C-MTEB/AFQMC |
|
name: MTEB AFQMC |
|
config: default |
|
split: validation |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 56.306314279047875 |
|
- type: cos_sim_spearman |
|
value: 61.020227685004016 |
|
- type: euclidean_pearson |
|
value: 58.61821670933433 |
|
- type: euclidean_spearman |
|
value: 60.131457106640674 |
|
- type: manhattan_pearson |
|
value: 58.6189460369694 |
|
- type: manhattan_spearman |
|
value: 60.126350618526224 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: C-MTEB/ATEC |
|
name: MTEB ATEC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 55.8612958476143 |
|
- type: cos_sim_spearman |
|
value: 59.01977664864512 |
|
- type: euclidean_pearson |
|
value: 62.028094897243655 |
|
- type: euclidean_spearman |
|
value: 58.6046814257705 |
|
- type: manhattan_pearson |
|
value: 62.02580042431887 |
|
- type: manhattan_spearman |
|
value: 58.60626890004892 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: mteb/amazon_reviews_multi |
|
name: MTEB AmazonReviewsClassification (zh) |
|
config: zh |
|
split: test |
|
revision: 1399c76144fd37290681b995c656ef9b2e06e26d |
|
metrics: |
|
- type: accuracy |
|
value: 49.496 |
|
- type: f1 |
|
value: 46.673963383873065 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: C-MTEB/BQ |
|
name: MTEB BQ |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 70.73971622592535 |
|
- type: cos_sim_spearman |
|
value: 72.76102992060764 |
|
- type: euclidean_pearson |
|
value: 71.04525865868672 |
|
- type: euclidean_spearman |
|
value: 72.4032852155075 |
|
- type: manhattan_pearson |
|
value: 71.03693009336658 |
|
- type: manhattan_spearman |
|
value: 72.39635701224252 |
|
- task: |
|
type: Clustering |
|
dataset: |
|
type: C-MTEB/CLSClusteringP2P |
|
name: MTEB CLSClusteringP2P |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: v_measure |
|
value: 56.34751074520767 |
|
- task: |
|
type: Clustering |
|
dataset: |
|
type: C-MTEB/CLSClusteringS2S |
|
name: MTEB CLSClusteringS2S |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: v_measure |
|
value: 48.4856662121073 |
|
- task: |
|
type: Reranking |
|
dataset: |
|
type: C-MTEB/CMedQAv1-reranking |
|
name: MTEB CMedQAv1 |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map |
|
value: 89.26384109024997 |
|
- type: mrr |
|
value: 91.27261904761905 |
|
- task: |
|
type: Reranking |
|
dataset: |
|
type: C-MTEB/CMedQAv2-reranking |
|
name: MTEB CMedQAv2 |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map |
|
value: 90.0464058154547 |
|
- type: mrr |
|
value: 92.06480158730159 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: C-MTEB/CmedqaRetrieval |
|
name: MTEB CmedqaRetrieval |
|
config: default |
|
split: dev |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 27.236 |
|
- type: map_at_10 |
|
value: 40.778 |
|
- type: map_at_100 |
|
value: 42.692 |
|
- type: map_at_1000 |
|
value: 42.787 |
|
- type: map_at_3 |
|
value: 36.362 |
|
- type: map_at_5 |
|
value: 38.839 |
|
- type: mrr_at_1 |
|
value: 41.335 |
|
- type: mrr_at_10 |
|
value: 49.867 |
|
- type: mrr_at_100 |
|
value: 50.812999999999995 |
|
- type: mrr_at_1000 |
|
value: 50.848000000000006 |
|
- type: mrr_at_3 |
|
value: 47.354 |
|
- type: mrr_at_5 |
|
value: 48.718 |
|
- type: ndcg_at_1 |
|
value: 41.335 |
|
- type: ndcg_at_10 |
|
value: 47.642 |
|
- type: ndcg_at_100 |
|
value: 54.855 |
|
- type: ndcg_at_1000 |
|
value: 56.449000000000005 |
|
- type: ndcg_at_3 |
|
value: 42.203 |
|
- type: ndcg_at_5 |
|
value: 44.416 |
|
- type: precision_at_1 |
|
value: 41.335 |
|
- type: precision_at_10 |
|
value: 10.568 |
|
- type: precision_at_100 |
|
value: 1.6400000000000001 |
|
- type: precision_at_1000 |
|
value: 0.184 |
|
- type: precision_at_3 |
|
value: 23.998 |
|
- type: precision_at_5 |
|
value: 17.389 |
|
- type: recall_at_1 |
|
value: 27.236 |
|
- type: recall_at_10 |
|
value: 58.80800000000001 |
|
- type: recall_at_100 |
|
value: 88.411 |
|
- type: recall_at_1000 |
|
value: 99.032 |
|
- type: recall_at_3 |
|
value: 42.253 |
|
- type: recall_at_5 |
|
value: 49.118 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: C-MTEB/CMNLI |
|
name: MTEB Cmnli |
|
config: default |
|
split: validation |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 86.03728202044498 |
|
- type: cos_sim_ap |
|
value: 92.49469583272597 |
|
- type: cos_sim_f1 |
|
value: 86.74095974528088 |
|
- type: cos_sim_precision |
|
value: 84.43657294664601 |
|
- type: cos_sim_recall |
|
value: 89.17465513210195 |
|
- type: dot_accuracy |
|
value: 72.21888153938664 |
|
- type: dot_ap |
|
value: 80.59377163340332 |
|
- type: dot_f1 |
|
value: 74.96686040583258 |
|
- type: dot_precision |
|
value: 66.4737793851718 |
|
- type: dot_recall |
|
value: 85.94809445873275 |
|
- type: euclidean_accuracy |
|
value: 85.47203848466627 |
|
- type: euclidean_ap |
|
value: 91.89152584749868 |
|
- type: euclidean_f1 |
|
value: 86.38105975197294 |
|
- type: euclidean_precision |
|
value: 83.40953625081646 |
|
- type: euclidean_recall |
|
value: 89.5721299976619 |
|
- type: manhattan_accuracy |
|
value: 85.3758268190018 |
|
- type: manhattan_ap |
|
value: 91.88989707722311 |
|
- type: manhattan_f1 |
|
value: 86.39767519839052 |
|
- type: manhattan_precision |
|
value: 82.76231263383298 |
|
- type: manhattan_recall |
|
value: 90.36707972878185 |
|
- type: max_accuracy |
|
value: 86.03728202044498 |
|
- type: max_ap |
|
value: 92.49469583272597 |
|
- type: max_f1 |
|
value: 86.74095974528088 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: C-MTEB/CovidRetrieval |
|
name: MTEB CovidRetrieval |
|
config: default |
|
split: dev |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 74.34100000000001 |
|
- type: map_at_10 |
|
value: 82.49499999999999 |
|
- type: map_at_100 |
|
value: 82.64200000000001 |
|
- type: map_at_1000 |
|
value: 82.643 |
|
- type: map_at_3 |
|
value: 81.142 |
|
- type: map_at_5 |
|
value: 81.95400000000001 |
|
- type: mrr_at_1 |
|
value: 74.71 |
|
- type: mrr_at_10 |
|
value: 82.553 |
|
- type: mrr_at_100 |
|
value: 82.699 |
|
- type: mrr_at_1000 |
|
value: 82.70100000000001 |
|
- type: mrr_at_3 |
|
value: 81.279 |
|
- type: mrr_at_5 |
|
value: 82.069 |
|
- type: ndcg_at_1 |
|
value: 74.605 |
|
- type: ndcg_at_10 |
|
value: 85.946 |
|
- type: ndcg_at_100 |
|
value: 86.607 |
|
- type: ndcg_at_1000 |
|
value: 86.669 |
|
- type: ndcg_at_3 |
|
value: 83.263 |
|
- type: ndcg_at_5 |
|
value: 84.71600000000001 |
|
- type: precision_at_1 |
|
value: 74.605 |
|
- type: precision_at_10 |
|
value: 9.758 |
|
- type: precision_at_100 |
|
value: 1.005 |
|
- type: precision_at_1000 |
|
value: 0.101 |
|
- type: precision_at_3 |
|
value: 29.996000000000002 |
|
- type: precision_at_5 |
|
value: 18.736 |
|
- type: recall_at_1 |
|
value: 74.34100000000001 |
|
- type: recall_at_10 |
|
value: 96.523 |
|
- type: recall_at_100 |
|
value: 99.473 |
|
- type: recall_at_1000 |
|
value: 100.0 |
|
- type: recall_at_3 |
|
value: 89.278 |
|
- type: recall_at_5 |
|
value: 92.83500000000001 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: C-MTEB/DuRetrieval |
|
name: MTEB DuRetrieval |
|
config: default |
|
split: dev |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 26.950000000000003 |
|
- type: map_at_10 |
|
value: 82.408 |
|
- type: map_at_100 |
|
value: 85.057 |
|
- type: map_at_1000 |
|
value: 85.09100000000001 |
|
- type: map_at_3 |
|
value: 57.635999999999996 |
|
- type: map_at_5 |
|
value: 72.48 |
|
- type: mrr_at_1 |
|
value: 92.15 |
|
- type: mrr_at_10 |
|
value: 94.554 |
|
- type: mrr_at_100 |
|
value: 94.608 |
|
- type: mrr_at_1000 |
|
value: 94.61 |
|
- type: mrr_at_3 |
|
value: 94.292 |
|
- type: mrr_at_5 |
|
value: 94.459 |
|
- type: ndcg_at_1 |
|
value: 92.15 |
|
- type: ndcg_at_10 |
|
value: 89.108 |
|
- type: ndcg_at_100 |
|
value: 91.525 |
|
- type: ndcg_at_1000 |
|
value: 91.82900000000001 |
|
- type: ndcg_at_3 |
|
value: 88.44 |
|
- type: ndcg_at_5 |
|
value: 87.271 |
|
- type: precision_at_1 |
|
value: 92.15 |
|
- type: precision_at_10 |
|
value: 42.29 |
|
- type: precision_at_100 |
|
value: 4.812 |
|
- type: precision_at_1000 |
|
value: 0.48900000000000005 |
|
- type: precision_at_3 |
|
value: 79.14999999999999 |
|
- type: precision_at_5 |
|
value: 66.64 |
|
- type: recall_at_1 |
|
value: 26.950000000000003 |
|
- type: recall_at_10 |
|
value: 89.832 |
|
- type: recall_at_100 |
|
value: 97.921 |
|
- type: recall_at_1000 |
|
value: 99.471 |
|
- type: recall_at_3 |
|
value: 59.562000000000005 |
|
- type: recall_at_5 |
|
value: 76.533 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: C-MTEB/EcomRetrieval |
|
name: MTEB EcomRetrieval |
|
config: default |
|
split: dev |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 53.5 |
|
- type: map_at_10 |
|
value: 63.105999999999995 |
|
- type: map_at_100 |
|
value: 63.63100000000001 |
|
- type: map_at_1000 |
|
value: 63.641999999999996 |
|
- type: map_at_3 |
|
value: 60.617 |
|
- type: map_at_5 |
|
value: 62.132 |
|
- type: mrr_at_1 |
|
value: 53.5 |
|
- type: mrr_at_10 |
|
value: 63.105999999999995 |
|
- type: mrr_at_100 |
|
value: 63.63100000000001 |
|
- type: mrr_at_1000 |
|
value: 63.641999999999996 |
|
- type: mrr_at_3 |
|
value: 60.617 |
|
- type: mrr_at_5 |
|
value: 62.132 |
|
- type: ndcg_at_1 |
|
value: 53.5 |
|
- type: ndcg_at_10 |
|
value: 67.92200000000001 |
|
- type: ndcg_at_100 |
|
value: 70.486 |
|
- type: ndcg_at_1000 |
|
value: 70.777 |
|
- type: ndcg_at_3 |
|
value: 62.853 |
|
- type: ndcg_at_5 |
|
value: 65.59899999999999 |
|
- type: precision_at_1 |
|
value: 53.5 |
|
- type: precision_at_10 |
|
value: 8.309999999999999 |
|
- type: precision_at_100 |
|
value: 0.951 |
|
- type: precision_at_1000 |
|
value: 0.097 |
|
- type: precision_at_3 |
|
value: 23.1 |
|
- type: precision_at_5 |
|
value: 15.2 |
|
- type: recall_at_1 |
|
value: 53.5 |
|
- type: recall_at_10 |
|
value: 83.1 |
|
- type: recall_at_100 |
|
value: 95.1 |
|
- type: recall_at_1000 |
|
value: 97.39999999999999 |
|
- type: recall_at_3 |
|
value: 69.3 |
|
- type: recall_at_5 |
|
value: 76.0 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: C-MTEB/IFlyTek-classification |
|
name: MTEB IFlyTek |
|
config: default |
|
split: validation |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 51.773759138130046 |
|
- type: f1 |
|
value: 40.38600802756481 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: C-MTEB/JDReview-classification |
|
name: MTEB JDReview |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 88.48030018761726 |
|
- type: ap |
|
value: 59.2732541555627 |
|
- type: f1 |
|
value: 83.58836007358619 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: C-MTEB/LCQMC |
|
name: MTEB LCQMC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 73.67511194245922 |
|
- type: cos_sim_spearman |
|
value: 79.43347759067298 |
|
- type: euclidean_pearson |
|
value: 79.04491504318766 |
|
- type: euclidean_spearman |
|
value: 79.14478545356785 |
|
- type: manhattan_pearson |
|
value: 79.03365022867428 |
|
- type: manhattan_spearman |
|
value: 79.13172717619908 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: C-MTEB/MMarcoRetrieval |
|
name: MTEB MMarcoRetrieval |
|
config: default |
|
split: dev |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 67.184 |
|
- type: map_at_10 |
|
value: 76.24600000000001 |
|
- type: map_at_100 |
|
value: 76.563 |
|
- type: map_at_1000 |
|
value: 76.575 |
|
- type: map_at_3 |
|
value: 74.522 |
|
- type: map_at_5 |
|
value: 75.598 |
|
- type: mrr_at_1 |
|
value: 69.47 |
|
- type: mrr_at_10 |
|
value: 76.8 |
|
- type: mrr_at_100 |
|
value: 77.082 |
|
- type: mrr_at_1000 |
|
value: 77.093 |
|
- type: mrr_at_3 |
|
value: 75.29400000000001 |
|
- type: mrr_at_5 |
|
value: 76.24 |
|
- type: ndcg_at_1 |
|
value: 69.47 |
|
- type: ndcg_at_10 |
|
value: 79.81099999999999 |
|
- type: ndcg_at_100 |
|
value: 81.187 |
|
- type: ndcg_at_1000 |
|
value: 81.492 |
|
- type: ndcg_at_3 |
|
value: 76.536 |
|
- type: ndcg_at_5 |
|
value: 78.367 |
|
- type: precision_at_1 |
|
value: 69.47 |
|
- type: precision_at_10 |
|
value: 9.599 |
|
- type: precision_at_100 |
|
value: 1.026 |
|
- type: precision_at_1000 |
|
value: 0.105 |
|
- type: precision_at_3 |
|
value: 28.777 |
|
- type: precision_at_5 |
|
value: 18.232 |
|
- type: recall_at_1 |
|
value: 67.184 |
|
- type: recall_at_10 |
|
value: 90.211 |
|
- type: recall_at_100 |
|
value: 96.322 |
|
- type: recall_at_1000 |
|
value: 98.699 |
|
- type: recall_at_3 |
|
value: 81.556 |
|
- type: recall_at_5 |
|
value: 85.931 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: mteb/amazon_massive_intent |
|
name: MTEB MassiveIntentClassification (zh-CN) |
|
config: zh-CN |
|
split: test |
|
revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 |
|
metrics: |
|
- type: accuracy |
|
value: 76.96032279757901 |
|
- type: f1 |
|
value: 73.48052314033545 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: mteb/amazon_massive_scenario |
|
name: MTEB MassiveScenarioClassification (zh-CN) |
|
config: zh-CN |
|
split: test |
|
revision: 7d571f92784cd94a019292a1f45445077d0ef634 |
|
metrics: |
|
- type: accuracy |
|
value: 84.64357767316744 |
|
- type: f1 |
|
value: 83.58250539497922 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: C-MTEB/MedicalRetrieval |
|
name: MTEB MedicalRetrieval |
|
config: default |
|
split: dev |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 56.00000000000001 |
|
- type: map_at_10 |
|
value: 62.066 |
|
- type: map_at_100 |
|
value: 62.553000000000004 |
|
- type: map_at_1000 |
|
value: 62.598 |
|
- type: map_at_3 |
|
value: 60.4 |
|
- type: map_at_5 |
|
value: 61.370000000000005 |
|
- type: mrr_at_1 |
|
value: 56.2 |
|
- type: mrr_at_10 |
|
value: 62.166 |
|
- type: mrr_at_100 |
|
value: 62.653000000000006 |
|
- type: mrr_at_1000 |
|
value: 62.699000000000005 |
|
- type: mrr_at_3 |
|
value: 60.5 |
|
- type: mrr_at_5 |
|
value: 61.47 |
|
- type: ndcg_at_1 |
|
value: 56.00000000000001 |
|
- type: ndcg_at_10 |
|
value: 65.199 |
|
- type: ndcg_at_100 |
|
value: 67.79899999999999 |
|
- type: ndcg_at_1000 |
|
value: 69.056 |
|
- type: ndcg_at_3 |
|
value: 61.814 |
|
- type: ndcg_at_5 |
|
value: 63.553000000000004 |
|
- type: precision_at_1 |
|
value: 56.00000000000001 |
|
- type: precision_at_10 |
|
value: 7.51 |
|
- type: precision_at_100 |
|
value: 0.878 |
|
- type: precision_at_1000 |
|
value: 0.098 |
|
- type: precision_at_3 |
|
value: 21.967 |
|
- type: precision_at_5 |
|
value: 14.02 |
|
- type: recall_at_1 |
|
value: 56.00000000000001 |
|
- type: recall_at_10 |
|
value: 75.1 |
|
- type: recall_at_100 |
|
value: 87.8 |
|
- type: recall_at_1000 |
|
value: 97.7 |
|
- type: recall_at_3 |
|
value: 65.9 |
|
- type: recall_at_5 |
|
value: 70.1 |
|
- task: |
|
type: Reranking |
|
dataset: |
|
type: C-MTEB/Mmarco-reranking |
|
name: MTEB MMarcoReranking |
|
config: default |
|
split: dev |
|
revision: None |
|
metrics: |
|
- type: map |
|
value: 32.74158258279793 |
|
- type: mrr |
|
value: 31.56071428571428 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: C-MTEB/MultilingualSentiment-classification |
|
name: MTEB MultilingualSentiment |
|
config: default |
|
split: validation |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 78.96666666666667 |
|
- type: f1 |
|
value: 78.82528563818045 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: C-MTEB/OCNLI |
|
name: MTEB Ocnli |
|
config: default |
|
split: validation |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 83.54087709799674 |
|
- type: cos_sim_ap |
|
value: 87.26170197077586 |
|
- type: cos_sim_f1 |
|
value: 84.7609561752988 |
|
- type: cos_sim_precision |
|
value: 80.20735155513667 |
|
- type: cos_sim_recall |
|
value: 89.86272439281943 |
|
- type: dot_accuracy |
|
value: 72.22523010286952 |
|
- type: dot_ap |
|
value: 79.51975358187732 |
|
- type: dot_f1 |
|
value: 76.32183908045977 |
|
- type: dot_precision |
|
value: 67.58957654723126 |
|
- type: dot_recall |
|
value: 87.64519535374869 |
|
- type: euclidean_accuracy |
|
value: 82.0249052517596 |
|
- type: euclidean_ap |
|
value: 85.32829948726406 |
|
- type: euclidean_f1 |
|
value: 83.24924318869829 |
|
- type: euclidean_precision |
|
value: 79.71014492753623 |
|
- type: euclidean_recall |
|
value: 87.11721224920802 |
|
- type: manhattan_accuracy |
|
value: 82.13318895506227 |
|
- type: manhattan_ap |
|
value: 85.28856869288006 |
|
- type: manhattan_f1 |
|
value: 83.34946757018393 |
|
- type: manhattan_precision |
|
value: 76.94369973190348 |
|
- type: manhattan_recall |
|
value: 90.91869060190075 |
|
- type: max_accuracy |
|
value: 83.54087709799674 |
|
- type: max_ap |
|
value: 87.26170197077586 |
|
- type: max_f1 |
|
value: 84.7609561752988 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: C-MTEB/OnlineShopping-classification |
|
name: MTEB OnlineShopping |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 94.56 |
|
- type: ap |
|
value: 92.80848436710805 |
|
- type: f1 |
|
value: 94.54951966576111 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: C-MTEB/PAWSX |
|
name: MTEB PAWSX |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 39.0866558287863 |
|
- type: cos_sim_spearman |
|
value: 45.9211126233312 |
|
- type: euclidean_pearson |
|
value: 44.86568743222145 |
|
- type: euclidean_spearman |
|
value: 45.63882757207507 |
|
- type: manhattan_pearson |
|
value: 44.89480036909126 |
|
- type: manhattan_spearman |
|
value: 45.65929449046206 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: C-MTEB/QBQTC |
|
name: MTEB QBQTC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 43.04701793979569 |
|
- type: cos_sim_spearman |
|
value: 44.87491033760315 |
|
- type: euclidean_pearson |
|
value: 36.2004061032567 |
|
- type: euclidean_spearman |
|
value: 41.44823909683865 |
|
- type: manhattan_pearson |
|
value: 36.136113427955095 |
|
- type: manhattan_spearman |
|
value: 41.39225495993949 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: mteb/sts22-crosslingual-sts |
|
name: MTEB STS22 (zh) |
|
config: zh |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 61.65611315777857 |
|
- type: cos_sim_spearman |
|
value: 64.4067673105648 |
|
- type: euclidean_pearson |
|
value: 61.814977248797184 |
|
- type: euclidean_spearman |
|
value: 63.99473350700169 |
|
- type: manhattan_pearson |
|
value: 61.684304629588624 |
|
- type: manhattan_spearman |
|
value: 63.97831213239316 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: C-MTEB/STSB |
|
name: MTEB STSB |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 76.57324933064379 |
|
- type: cos_sim_spearman |
|
value: 79.23602286949782 |
|
- type: euclidean_pearson |
|
value: 80.28226284310948 |
|
- type: euclidean_spearman |
|
value: 80.32210477608423 |
|
- type: manhattan_pearson |
|
value: 80.27262188617811 |
|
- type: manhattan_spearman |
|
value: 80.31619185039723 |
|
- task: |
|
type: Reranking |
|
dataset: |
|
type: C-MTEB/T2Reranking |
|
name: MTEB T2Reranking |
|
config: default |
|
split: dev |
|
revision: None |
|
metrics: |
|
- type: map |
|
value: 67.05266891356277 |
|
- type: mrr |
|
value: 77.1906333623497 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: C-MTEB/T2Retrieval |
|
name: MTEB T2Retrieval |
|
config: default |
|
split: dev |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 28.212 |
|
- type: map_at_10 |
|
value: 78.932 |
|
- type: map_at_100 |
|
value: 82.51899999999999 |
|
- type: map_at_1000 |
|
value: 82.575 |
|
- type: map_at_3 |
|
value: 55.614 |
|
- type: map_at_5 |
|
value: 68.304 |
|
- type: mrr_at_1 |
|
value: 91.211 |
|
- type: mrr_at_10 |
|
value: 93.589 |
|
- type: mrr_at_100 |
|
value: 93.659 |
|
- type: mrr_at_1000 |
|
value: 93.662 |
|
- type: mrr_at_3 |
|
value: 93.218 |
|
- type: mrr_at_5 |
|
value: 93.453 |
|
- type: ndcg_at_1 |
|
value: 91.211 |
|
- type: ndcg_at_10 |
|
value: 86.24000000000001 |
|
- type: ndcg_at_100 |
|
value: 89.614 |
|
- type: ndcg_at_1000 |
|
value: 90.14 |
|
- type: ndcg_at_3 |
|
value: 87.589 |
|
- type: ndcg_at_5 |
|
value: 86.265 |
|
- type: precision_at_1 |
|
value: 91.211 |
|
- type: precision_at_10 |
|
value: 42.626 |
|
- type: precision_at_100 |
|
value: 5.043 |
|
- type: precision_at_1000 |
|
value: 0.517 |
|
- type: precision_at_3 |
|
value: 76.42 |
|
- type: precision_at_5 |
|
value: 64.045 |
|
- type: recall_at_1 |
|
value: 28.212 |
|
- type: recall_at_10 |
|
value: 85.223 |
|
- type: recall_at_100 |
|
value: 96.229 |
|
- type: recall_at_1000 |
|
value: 98.849 |
|
- type: recall_at_3 |
|
value: 57.30800000000001 |
|
- type: recall_at_5 |
|
value: 71.661 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: C-MTEB/TNews-classification |
|
name: MTEB TNews |
|
config: default |
|
split: validation |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 54.385000000000005 |
|
- type: f1 |
|
value: 52.38762400903556 |
|
- task: |
|
type: Clustering |
|
dataset: |
|
type: C-MTEB/ThuNewsClusteringP2P |
|
name: MTEB ThuNewsClusteringP2P |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: v_measure |
|
value: 74.55283855942916 |
|
- task: |
|
type: Clustering |
|
dataset: |
|
type: C-MTEB/ThuNewsClusteringS2S |
|
name: MTEB ThuNewsClusteringS2S |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: v_measure |
|
value: 68.55115316700493 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: C-MTEB/VideoRetrieval |
|
name: MTEB VideoRetrieval |
|
config: default |
|
split: dev |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 58.8 |
|
- type: map_at_10 |
|
value: 69.035 |
|
- type: map_at_100 |
|
value: 69.52000000000001 |
|
- type: map_at_1000 |
|
value: 69.529 |
|
- type: map_at_3 |
|
value: 67.417 |
|
- type: map_at_5 |
|
value: 68.407 |
|
- type: mrr_at_1 |
|
value: 58.8 |
|
- type: mrr_at_10 |
|
value: 69.035 |
|
- type: mrr_at_100 |
|
value: 69.52000000000001 |
|
- type: mrr_at_1000 |
|
value: 69.529 |
|
- type: mrr_at_3 |
|
value: 67.417 |
|
- type: mrr_at_5 |
|
value: 68.407 |
|
- type: ndcg_at_1 |
|
value: 58.8 |
|
- type: ndcg_at_10 |
|
value: 73.395 |
|
- type: ndcg_at_100 |
|
value: 75.62 |
|
- type: ndcg_at_1000 |
|
value: 75.90299999999999 |
|
- type: ndcg_at_3 |
|
value: 70.11800000000001 |
|
- type: ndcg_at_5 |
|
value: 71.87400000000001 |
|
- type: precision_at_1 |
|
value: 58.8 |
|
- type: precision_at_10 |
|
value: 8.68 |
|
- type: precision_at_100 |
|
value: 0.9690000000000001 |
|
- type: precision_at_1000 |
|
value: 0.099 |
|
- type: precision_at_3 |
|
value: 25.967000000000002 |
|
- type: precision_at_5 |
|
value: 16.42 |
|
- type: recall_at_1 |
|
value: 58.8 |
|
- type: recall_at_10 |
|
value: 86.8 |
|
- type: recall_at_100 |
|
value: 96.89999999999999 |
|
- type: recall_at_1000 |
|
value: 99.2 |
|
- type: recall_at_3 |
|
value: 77.9 |
|
- type: recall_at_5 |
|
value: 82.1 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: C-MTEB/waimai-classification |
|
name: MTEB Waimai |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 89.42 |
|
- type: ap |
|
value: 75.35978503182068 |
|
- type: f1 |
|
value: 88.01006394348263 |
|
--- |

## Yinka

The Yinka embedding model was obtained by continuing training from the open-source model [stella-v3.5-mrl](https://huggingface.co/infgrad/stella-mrl-large-zh-v3.5-1792d), using the multi-task hybrid loss training described for [piccolo2](https://huggingface.co/sensenova/piccolo-large-zh-v2). Like its base model, Yinka supports variable embedding dimensions.

## Usage

The model is used in the same way as [stella-v3.5-mrl](https://huggingface.co/infgrad/stella-mrl-large-zh-v3.5-1792d); no prefix is required.

```python
from sentence_transformers import SentenceTransformer
from sklearn.preprocessing import normalize

model = SentenceTransformer("Classical/Yinka")
# Do not normalize yet! Take the first n dimensions, then normalize.
vectors = model.encode(["text1", "text2"], normalize_embeddings=False)
print(vectors.shape)  # shape is [2,1792]
n_dims = 768
# Truncate to the first n_dims dimensions, then L2-normalize each row.
cut_vecs = normalize(vectors[:, :n_dims])
```
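
The reason for truncating before normalizing can be seen in a small sketch. It uses random NumPy vectors as a stand-in for the output of `model.encode(..., normalize_embeddings=False)`, so no model download is needed. The helper name `truncate_and_normalize` is hypothetical and is not part of any library.

```python
import numpy as np


def truncate_and_normalize(vectors: np.ndarray, n_dims: int) -> np.ndarray:
    """Keep the first n_dims columns, then L2-normalize each row."""
    cut = vectors[:, :n_dims]
    norms = np.linalg.norm(cut, axis=1, keepdims=True)
    # Guard against division by zero for an all-zero row.
    return cut / np.clip(norms, 1e-12, None)


# Assumption: random data standing in for real, unnormalized 1792-d embeddings.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(2, 1792)).astype(np.float32)

# Recommended order: truncate first, then normalize.
cut_vecs = truncate_and_normalize(vectors, 768)
cut_norms = np.linalg.norm(cut_vecs, axis=1)

# Rows are unit length, so a plain dot product is the cosine similarity.
similarity = float(cut_vecs[0] @ cut_vecs[1])

# Reversed order: normalize the full vector, then truncate.
# The truncated rows are no longer unit length.
full_norm = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
wrong_order_norms = np.linalg.norm(full_norm[:, :768], axis=1)

print(cut_vecs.shape, cut_norms, wrong_order_norms)
```

With the recommended order, every truncated row has unit length, so dot products can be used directly as cosine similarities. With the reversed order the rows come out shorter than unit length and would need a second normalization.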

## Results

| Model Name | Model Size (GB) | Dimension | Sequence Length | Classification (9) | Clustering (4) | Pair Classification (2) | Reranking (4) | Retrieval (8) | STS (8) | Average (35) |
|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| [Yinka](https://huggingface.co/Classical/Yinka) | 1.21 | 1792 | 512 | 74.30 | 61.99 | 89.87 | 69.77 | 74.40 | 63.30 | 70.79 |
| [stella-v3.5-mrl](https://huggingface.co/infgrad/stella-mrl-large-zh-v3.5-1792d) | 1.21 | 1792 | 512 | 71.56 | 54.39 | 88.09 | 68.45 | 73.51 | 62.48 | 68.56 |
| [piccolo-large-zh-v2](https://huggingface.co/sensenova/piccolo-large-zh-v2) | 1.21 | 1792 | 512 | 74.59 | 62.17 | 90.24 | 70.00 | 74.36 | 63.50 | 70.95 |

## Training Details

TODO

## Licence

This model is released under the MIT licence.