--- base_model: BAAI/bge-small-en datasets: [] language: [] library_name: sentence-transformers metrics: - cosine_accuracy@1 - cosine_accuracy@3 - cosine_accuracy@5 - cosine_accuracy@10 - cosine_precision@1 - cosine_precision@3 - cosine_precision@5 - cosine_precision@10 - cosine_recall@1 - cosine_recall@3 - cosine_recall@5 - cosine_recall@10 - cosine_ndcg@10 - cosine_mrr@10 - cosine_map@100 - dot_accuracy@1 - dot_accuracy@3 - dot_accuracy@5 - dot_accuracy@10 - dot_precision@1 - dot_precision@3 - dot_precision@5 - dot_precision@10 - dot_recall@1 - dot_recall@3 - dot_recall@5 - dot_recall@10 - dot_ndcg@10 - dot_mrr@10 - dot_map@100 pipeline_tag: sentence-similarity tags: - sentence-transformers - sentence-similarity - feature-extraction - generated_from_trainer - dataset_size:1010 - loss:MultipleNegativesRankingLoss widget: - source_sentence: How does Prompt-RAG differ from traditional vector embedding-based methodologies? sentences: - Prompt-RAG differs from traditional vector embedding-based methodologies by adopting a more direct and flexible retrieval process based on natural language prompts, eliminating the need for a vector database or an algorithm for indexing and selecting vectors. - By introducing a pre-aligned phrase prior to the standard SFT stage, LLMs are guided to concentrate on the aligned knowledge, thereby unlocking their internal alignment abilities and improving their performance. - The accuracy of GPT 3.5 on 2500 overall TeleQnA questions related to 3GPP documents is 60.1, while the accuracy of GPT 3.5 + Telco-RAG is 6.9 points higher. - source_sentence: Explain the concept of in-context learning as described in the paper 'An explanation of in-context learning as implicit Bayesian inference'. sentences: - The main theme of the paper is that language models can learn to perform many tasks in a zero-shot setting, without any explicit supervision. 
- In-context learning, as explained in the paper, is a process where a language model uses the context provided in the input to make predictions or generate outputs without explicit training on the specific task. The paper argues that this process can be understood as an implicit form of Bayesian inference. - The paper was presented in the 55th Annual Meeting of the Association for Computational Linguistics. - source_sentence: What is the purpose of the survey conducted by Huang et al. (2023)? sentences: - The purpose of the survey conducted by Huang et al. (2023) is to provide a comprehensive overview of hallucination in large language models, including its principles, taxonomy, challenges, and open questions. - The study of Human and American Translation Learning contributes to language development by understanding the cognitive processes involved in translating between languages, which can lead to improved teaching methods and translation technology. - Using profile data, triplet examples are constructed in the format of (x_i, x_i^-, x_i^+). The anchor example x_i is constructed as the combination of the content c_i and the corresponding label l_i. - source_sentence: Who is the first author of the paper and what is their last name? sentences: - The key findings are that Vul-RAG achieves the highest accuracy and pairwise accuracy among all baselines, substantially outperforming the best baseline LLMAO. It also achieves the best trade-off between recall and precision. - The first author of the paper is Nandan Thakur. Their last name is Thakur. - The paper was presented at the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP). - source_sentence: Compare the top-5 retrieval accuracy of BM25 + MQ and SERM + BF for the NQ Dataset and HotpotQA. sentences: - For the NQ Dataset, SERM + BF has a top-5 retrieval accuracy of 88.22, which is significantly higher than BM25 + MQ's accuracy of 25.19.
For HotpotQA, SERM + BF was not tested, but BM25 + MQ has a top-5 retrieval accuracy of 49.52. - The paper was presented at the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. - The proof for Equation 5 progresses from Equation 20 to Equation 22 by applying the transformation motivated by Xie et al. [2021] and introducing the term p(R, x1:i-1|z) to the equation. model-index: - name: SentenceTransformer based on BAAI/bge-small-en results: - task: type: information-retrieval name: Information Retrieval dataset: name: Unknown type: unknown metrics: - type: cosine_accuracy@1 value: 0.01782178217821782 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.04356435643564356 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.06534653465346535 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.12475247524752475 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.01782178217821782 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.015841584158415842 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.016039603960396043 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.015841584158415842 name: Cosine Precision@10 - type: cosine_recall@1 value: 1.839902956558168e-05 name: Cosine Recall@1 - type: cosine_recall@3 value: 4.498766525563503e-05 name: Cosine Recall@3 - type: cosine_recall@5 value: 7.262670252004521e-05 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.00015079859335392304 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.016300874257683427 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.04234598459845988 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.0018766020656866668 name: Cosine Map@100 - type: dot_accuracy@1 value: 0.01782178217821782 name: Dot Accuracy@1 - type: dot_accuracy@3 value: 0.04356435643564356 name: Dot Accuracy@3 - type: dot_accuracy@5 value: 0.06534653465346535 name: Dot Accuracy@5 - type: dot_accuracy@10 value:
0.12475247524752475 name: Dot Accuracy@10 - type: dot_precision@1 value: 0.01782178217821782 name: Dot Precision@1 - type: dot_precision@3 value: 0.015841584158415842 name: Dot Precision@3 - type: dot_precision@5 value: 0.016039603960396043 name: Dot Precision@5 - type: dot_precision@10 value: 0.015841584158415842 name: Dot Precision@10 - type: dot_recall@1 value: 1.839902956558168e-05 name: Dot Recall@1 - type: dot_recall@3 value: 4.498766525563503e-05 name: Dot Recall@3 - type: dot_recall@5 value: 7.262670252004521e-05 name: Dot Recall@5 - type: dot_recall@10 value: 0.00015079859335392304 name: Dot Recall@10 - type: dot_ndcg@10 value: 0.016300874257683427 name: Dot Ndcg@10 - type: dot_mrr@10 value: 0.04234598459845988 name: Dot Mrr@10 - type: dot_map@100 value: 0.0018766020656866668 name: Dot Map@100 - type: cosine_accuracy@1 value: 0.019801980198019802 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.040594059405940595 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.06534653465346535 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.12673267326732673 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.019801980198019802 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.01485148514851485 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.014851485148514853 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.016831683168316833 name: Cosine Precision@10 - type: cosine_recall@1 value: 1.9670857914229207e-05 name: Cosine Recall@1 - type: cosine_recall@3 value: 3.554268094376118e-05 name: Cosine Recall@3 - type: cosine_recall@5 value: 6.67664165823309e-05 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.0001670844654494185 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.01679069935920913 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.04252396668238257 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.002057887757857092 name: Cosine Map@100 - type: dot_accuracy@1 value: 
0.019801980198019802 name: Dot Accuracy@1 - type: dot_accuracy@3 value: 0.040594059405940595 name: Dot Accuracy@3 - type: dot_accuracy@5 value: 0.06534653465346535 name: Dot Accuracy@5 - type: dot_accuracy@10 value: 0.12673267326732673 name: Dot Accuracy@10 - type: dot_precision@1 value: 0.019801980198019802 name: Dot Precision@1 - type: dot_precision@3 value: 0.01485148514851485 name: Dot Precision@3 - type: dot_precision@5 value: 0.014851485148514853 name: Dot Precision@5 - type: dot_precision@10 value: 0.016831683168316833 name: Dot Precision@10 - type: dot_recall@1 value: 1.9670857914229207e-05 name: Dot Recall@1 - type: dot_recall@3 value: 3.554268094376118e-05 name: Dot Recall@3 - type: dot_recall@5 value: 6.67664165823309e-05 name: Dot Recall@5 - type: dot_recall@10 value: 0.0001670844654494185 name: Dot Recall@10 - type: dot_ndcg@10 value: 0.01679069935920913 name: Dot Ndcg@10 - type: dot_mrr@10 value: 0.04252396668238257 name: Dot Mrr@10 - type: dot_map@100 value: 0.002057887757857092 name: Dot Map@100 - type: cosine_accuracy@1 value: 0.01881188118811881 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.03762376237623762 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.06435643564356436 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.1306930693069307 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.01881188118811881 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.013861386138613862 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.015841584158415842 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.01722772277227723 name: Cosine Precision@10 - type: cosine_recall@1 value: 1.8836739119030395e-05 name: Cosine Recall@1 - type: cosine_recall@3 value: 3.852282962664283e-05 name: Cosine Recall@3 - type: cosine_recall@5 value: 7.907232140954174e-05 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.00018073758516299118 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 
0.01704492626324548 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.04188786735816444 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.002251865468050825 name: Cosine Map@100 - type: dot_accuracy@1 value: 0.01881188118811881 name: Dot Accuracy@1 - type: dot_accuracy@3 value: 0.03762376237623762 name: Dot Accuracy@3 - type: dot_accuracy@5 value: 0.06435643564356436 name: Dot Accuracy@5 - type: dot_accuracy@10 value: 0.1306930693069307 name: Dot Accuracy@10 - type: dot_precision@1 value: 0.01881188118811881 name: Dot Precision@1 - type: dot_precision@3 value: 0.013861386138613862 name: Dot Precision@3 - type: dot_precision@5 value: 0.015841584158415842 name: Dot Precision@5 - type: dot_precision@10 value: 0.01722772277227723 name: Dot Precision@10 - type: dot_recall@1 value: 1.8836739119030395e-05 name: Dot Recall@1 - type: dot_recall@3 value: 3.852282962664283e-05 name: Dot Recall@3 - type: dot_recall@5 value: 7.907232140954174e-05 name: Dot Recall@5 - type: dot_recall@10 value: 0.00018073758516299118 name: Dot Recall@10 - type: dot_ndcg@10 value: 0.01704492626324548 name: Dot Ndcg@10 - type: dot_mrr@10 value: 0.04188786735816444 name: Dot Mrr@10 - type: dot_map@100 value: 0.002251865468050825 name: Dot Map@100 - type: cosine_accuracy@1 value: 0.01881188118811881 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.03663366336633663 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.06435643564356436 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.1306930693069307 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.01881188118811881 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.013531353135313529 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.015643564356435644 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.01722772277227723 name: Cosine Precision@10 - type: cosine_recall@1 value: 1.8836739119030395e-05 name: Cosine Recall@1 - type: cosine_recall@3 value: 3.715905688573237e-05 
name: Cosine Recall@3 - type: cosine_recall@5 value: 7.929088142504806e-05 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.0001757722267344924 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.01701867523723249 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.0418477919220494 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.0022453604762727357 name: Cosine Map@100 - type: dot_accuracy@1 value: 0.01881188118811881 name: Dot Accuracy@1 - type: dot_accuracy@3 value: 0.03663366336633663 name: Dot Accuracy@3 - type: dot_accuracy@5 value: 0.06435643564356436 name: Dot Accuracy@5 - type: dot_accuracy@10 value: 0.1306930693069307 name: Dot Accuracy@10 - type: dot_precision@1 value: 0.01881188118811881 name: Dot Precision@1 - type: dot_precision@3 value: 0.013531353135313529 name: Dot Precision@3 - type: dot_precision@5 value: 0.015643564356435644 name: Dot Precision@5 - type: dot_precision@10 value: 0.01722772277227723 name: Dot Precision@10 - type: dot_recall@1 value: 1.8836739119030395e-05 name: Dot Recall@1 - type: dot_recall@3 value: 3.715905688573237e-05 name: Dot Recall@3 - type: dot_recall@5 value: 7.929088142504806e-05 name: Dot Recall@5 - type: dot_recall@10 value: 0.0001757722267344924 name: Dot Recall@10 - type: dot_ndcg@10 value: 0.01701867523723249 name: Dot Ndcg@10 - type: dot_mrr@10 value: 0.0418477919220494 name: Dot Mrr@10 - type: dot_map@100 value: 0.0022453604762727357 name: Dot Map@100 --- # SentenceTransformer based on BAAI/bge-small-en This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. 
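For the semantic-search use case mentioned above, retrieval amounts to ranking corpus embeddings by cosine similarity against a query embedding. A minimal sketch with NumPy (random vectors stand in for this model's 384-dimensional embeddings so the snippet is self-contained; with real text you would obtain them via `model.encode`, and `sentence_transformers.util.semantic_search` performs the same ranking at scale):

```python
import numpy as np

def top_k_cosine(query_emb, corpus_embs, k=3):
    """Return the indices and cosine scores of the k best-matching corpus rows."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q                   # cosine similarity of each corpus row vs. the query
    top = np.argsort(-scores)[:k]    # best first
    return [(int(i), float(scores[i])) for i in top]

rng = np.random.default_rng(0)
corpus = rng.normal(size=(5, 384))               # stand-ins for document embeddings
query = corpus[2] + 0.01 * rng.normal(size=384)  # near-duplicate of document 2
hits = top_k_cosine(query, corpus)
print(hits[0][0])  # document 2 ranks first
```

Because this model ends with a `Normalize()` module, its embeddings are unit-length, which is why cosine and dot-product metrics in the tables below coincide.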
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Areeb-02/bge-small-en-MultiplrRankingLoss-30-Rag-paper-dataset")
# Run inference
sentences = [
    'Compare the top-5 retrieval accuracy of BM25 + MQ and SERM + BF for the NQ Dataset and HotpotQA.',
    "For the NQ Dataset, SERM + BF has a top-5 retrieval accuracy of 88.22, which is significantly higher than BM25 + MQ's accuracy of 25.19. For HotpotQA, SERM + BF was not tested, but BM25 + MQ has a top-5 retrieval accuracy of 49.52.",
    'The proof for Equation 5 progresses from Equation 20 to Equation 22 by applying the transformation motivated by Xie et al.
[2021] and introducing the term p(R, x1:i-1|z) to the equation.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Evaluation

### Metrics

#### Information Retrieval

* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.0178     |
| cosine_accuracy@3   | 0.0436     |
| cosine_accuracy@5   | 0.0653     |
| cosine_accuracy@10  | 0.1248     |
| cosine_precision@1  | 0.0178     |
| cosine_precision@3  | 0.0158     |
| cosine_precision@5  | 0.016      |
| cosine_precision@10 | 0.0158     |
| cosine_recall@1     | 0.0        |
| cosine_recall@3     | 0.0        |
| cosine_recall@5     | 0.0001     |
| cosine_recall@10    | 0.0002     |
| cosine_ndcg@10      | 0.0163     |
| cosine_mrr@10       | 0.0423     |
| **cosine_map@100**  | **0.0019** |
| dot_accuracy@1      | 0.0178     |
| dot_accuracy@3      | 0.0436     |
| dot_accuracy@5      | 0.0653     |
| dot_accuracy@10     | 0.1248     |
| dot_precision@1     | 0.0178     |
| dot_precision@3     | 0.0158     |
| dot_precision@5     | 0.016      |
| dot_precision@10    | 0.0158     |
| dot_recall@1        | 0.0        |
| dot_recall@3        | 0.0        |
| dot_recall@5        | 0.0001     |
| dot_recall@10       | 0.0002     |
| dot_ndcg@10         | 0.0163     |
| dot_mrr@10          | 0.0423     |
| dot_map@100         | 0.0019     |

#### Information Retrieval

* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.0198     |
| cosine_accuracy@3   | 0.0406     |
| cosine_accuracy@5   | 0.0653     |
| cosine_accuracy@10  | 0.1267     |
| cosine_precision@1  | 0.0198     |
| cosine_precision@3  | 0.0149     |
| cosine_precision@5  | 0.0149     |
| cosine_precision@10 | 0.0168     |
| cosine_recall@1     | 0.0        |
|
cosine_recall@3 | 0.0 | | cosine_recall@5 | 0.0001 | | cosine_recall@10 | 0.0002 | | cosine_ndcg@10 | 0.0168 | | cosine_mrr@10 | 0.0425 | | **cosine_map@100** | **0.0021** | | dot_accuracy@1 | 0.0198 | | dot_accuracy@3 | 0.0406 | | dot_accuracy@5 | 0.0653 | | dot_accuracy@10 | 0.1267 | | dot_precision@1 | 0.0198 | | dot_precision@3 | 0.0149 | | dot_precision@5 | 0.0149 | | dot_precision@10 | 0.0168 | | dot_recall@1 | 0.0 | | dot_recall@3 | 0.0 | | dot_recall@5 | 0.0001 | | dot_recall@10 | 0.0002 | | dot_ndcg@10 | 0.0168 | | dot_mrr@10 | 0.0425 | | dot_map@100 | 0.0021 | #### Information Retrieval * Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) | Metric | Value | |:--------------------|:-----------| | cosine_accuracy@1 | 0.0188 | | cosine_accuracy@3 | 0.0376 | | cosine_accuracy@5 | 0.0644 | | cosine_accuracy@10 | 0.1307 | | cosine_precision@1 | 0.0188 | | cosine_precision@3 | 0.0139 | | cosine_precision@5 | 0.0158 | | cosine_precision@10 | 0.0172 | | cosine_recall@1 | 0.0 | | cosine_recall@3 | 0.0 | | cosine_recall@5 | 0.0001 | | cosine_recall@10 | 0.0002 | | cosine_ndcg@10 | 0.017 | | cosine_mrr@10 | 0.0419 | | **cosine_map@100** | **0.0023** | | dot_accuracy@1 | 0.0188 | | dot_accuracy@3 | 0.0376 | | dot_accuracy@5 | 0.0644 | | dot_accuracy@10 | 0.1307 | | dot_precision@1 | 0.0188 | | dot_precision@3 | 0.0139 | | dot_precision@5 | 0.0158 | | dot_precision@10 | 0.0172 | | dot_recall@1 | 0.0 | | dot_recall@3 | 0.0 | | dot_recall@5 | 0.0001 | | dot_recall@10 | 0.0002 | | dot_ndcg@10 | 0.017 | | dot_mrr@10 | 0.0419 | | dot_map@100 | 0.0023 | #### Information Retrieval * Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) | Metric | Value | |:--------------------|:-----------| | 
cosine_accuracy@1 | 0.0188 | | cosine_accuracy@3 | 0.0366 | | cosine_accuracy@5 | 0.0644 | | cosine_accuracy@10 | 0.1307 | | cosine_precision@1 | 0.0188 | | cosine_precision@3 | 0.0135 | | cosine_precision@5 | 0.0156 | | cosine_precision@10 | 0.0172 | | cosine_recall@1 | 0.0 | | cosine_recall@3 | 0.0 | | cosine_recall@5 | 0.0001 | | cosine_recall@10 | 0.0002 | | cosine_ndcg@10 | 0.017 | | cosine_mrr@10 | 0.0418 | | **cosine_map@100** | **0.0022** | | dot_accuracy@1 | 0.0188 | | dot_accuracy@3 | 0.0366 | | dot_accuracy@5 | 0.0644 | | dot_accuracy@10 | 0.1307 | | dot_precision@1 | 0.0188 | | dot_precision@3 | 0.0135 | | dot_precision@5 | 0.0156 | | dot_precision@10 | 0.0172 | | dot_recall@1 | 0.0 | | dot_recall@3 | 0.0 | | dot_recall@5 | 0.0001 | | dot_recall@10 | 0.0002 | | dot_ndcg@10 | 0.017 | | dot_mrr@10 | 0.0418 | | dot_map@100 | 0.0022 | ## Training Details ### Training Dataset #### Unnamed Dataset * Size: 1,010 training samples * Columns: anchor and positive * Approximate statistics based on the first 1000 samples: | | anchor | positive | |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------| | type | string | string | | details | | | * Samples: | anchor | positive | |:---------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | What is the purpose of the MultiHop-RAG dataset and what does it consist of? 
| The MultiHop-RAG dataset is developed to benchmark Retrieval-Augmented Generation (RAG) for multi-hop queries. It consists of a knowledge base, a large collection of multi-hop queries, their ground-truth answers, and the associated supporting evidence. The dataset is built using an English news article dataset as the underlying RAG knowledge base. | | Among Google, Apple, and Nvidia, which company reported the largest profit margins in their third-quarter reports for the fiscal year 2023? | Apple reported the largest profit margins in their third-quarter reports for the fiscal year 2023. | | Under what circumstances should the LLM answer the questions? | The LLM should answer the questions based solely on the information provided in the paragraphs, and it should not use any other information. | * Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters: ```json { "scale": 20.0, "similarity_fct": "cos_sim" } ``` ### Training Hyperparameters #### Non-Default Hyperparameters - `eval_strategy`: steps - `per_device_train_batch_size`: 16 - `per_device_eval_batch_size`: 16 - `num_train_epochs`: 10 - `warmup_ratio`: 0.1 - `fp16`: True #### All Hyperparameters
Click to expand - `overwrite_output_dir`: False - `do_predict`: False - `eval_strategy`: steps - `prediction_loss_only`: True - `per_device_train_batch_size`: 16 - `per_device_eval_batch_size`: 16 - `per_gpu_train_batch_size`: None - `per_gpu_eval_batch_size`: None - `gradient_accumulation_steps`: 1 - `eval_accumulation_steps`: None - `learning_rate`: 5e-05 - `weight_decay`: 0.0 - `adam_beta1`: 0.9 - `adam_beta2`: 0.999 - `adam_epsilon`: 1e-08 - `max_grad_norm`: 1.0 - `num_train_epochs`: 10 - `max_steps`: -1 - `lr_scheduler_type`: linear - `lr_scheduler_kwargs`: {} - `warmup_ratio`: 0.1 - `warmup_steps`: 0 - `log_level`: passive - `log_level_replica`: warning - `log_on_each_node`: True - `logging_nan_inf_filter`: True - `save_safetensors`: True - `save_on_each_node`: False - `save_only_model`: False - `restore_callback_states_from_checkpoint`: False - `no_cuda`: False - `use_cpu`: False - `use_mps_device`: False - `seed`: 42 - `data_seed`: None - `jit_mode_eval`: False - `use_ipex`: False - `bf16`: False - `fp16`: True - `fp16_opt_level`: O1 - `half_precision_backend`: auto - `bf16_full_eval`: False - `fp16_full_eval`: False - `tf32`: None - `local_rank`: 0 - `ddp_backend`: None - `tpu_num_cores`: None - `tpu_metrics_debug`: False - `debug`: [] - `dataloader_drop_last`: False - `dataloader_num_workers`: 0 - `dataloader_prefetch_factor`: None - `past_index`: -1 - `disable_tqdm`: False - `remove_unused_columns`: True - `label_names`: None - `load_best_model_at_end`: False - `ignore_data_skip`: False - `fsdp`: [] - `fsdp_min_num_params`: 0 - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} - `fsdp_transformer_layer_cls_to_wrap`: None - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} - `deepspeed`: None - `label_smoothing_factor`: 0.0 - `optim`: adamw_torch - `optim_args`: None - 
`adafactor`: False - `group_by_length`: False - `length_column_name`: length - `ddp_find_unused_parameters`: None - `ddp_bucket_cap_mb`: None - `ddp_broadcast_buffers`: False - `dataloader_pin_memory`: True - `dataloader_persistent_workers`: False - `skip_memory_metrics`: True - `use_legacy_prediction_loop`: False - `push_to_hub`: False - `resume_from_checkpoint`: None - `hub_model_id`: None - `hub_strategy`: every_save - `hub_private_repo`: False - `hub_always_push`: False - `gradient_checkpointing`: False - `gradient_checkpointing_kwargs`: None - `include_inputs_for_metrics`: False - `eval_do_concat_batches`: True - `fp16_backend`: auto - `push_to_hub_model_id`: None - `push_to_hub_organization`: None - `mp_parameters`: - `auto_find_batch_size`: False - `full_determinism`: False - `torchdynamo`: None - `ray_scope`: last - `ddp_timeout`: 1800 - `torch_compile`: False - `torch_compile_backend`: None - `torch_compile_mode`: None - `dispatch_batches`: None - `split_batches`: None - `include_tokens_per_second`: False - `include_num_input_tokens_seen`: False - `neftune_noise_alpha`: None - `optim_target_modules`: None - `batch_eval_metrics`: False - `eval_on_start`: False - `batch_sampler`: batch_sampler - `multi_dataset_batch_sampler`: proportional
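The MultipleNegativesRankingLoss used for training (scale 20.0, cosine similarity) treats, for each anchor in a batch, its paired positive as the target and every other in-batch positive as a negative: the scaled cosine-similarity matrix is scored with cross-entropy against its diagonal. A minimal NumPy sketch of the objective (illustrative only, not the library's implementation):

```python
import numpy as np

def mnrl_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities; row i's label is column i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)                    # (batch, batch) similarity matrix
    # log-softmax cross-entropy, matching positive on the diagonal
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Perfectly matched pairs drive the loss toward zero;
# mismatched pairs are heavily penalized.
batch = np.eye(4)                                 # 4 orthogonal "embeddings"
print(mnrl_loss(batch, batch))                    # near zero
print(mnrl_loss(batch, np.roll(batch, 1, axis=0)))  # large
```

This is why the loss benefits from larger batch sizes: each anchor sees `batch_size - 1` negatives for free, with no explicit negative mining.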
### Training Logs
| Epoch  | Step | Training Loss | cosine_map@100 |
|:------:|:----:|:-------------:|:--------------:|
| 0      | 0    | -             | 0.0018         |
| 1.5625 | 100  | -             | 0.0019         |
| 3.0    | 192  | -             | 0.0020         |
| 1.5625 | 100  | -             | 0.0021         |
| 3.125  | 200  | -             | 0.0020         |
| 4.6875 | 300  | -             | 0.0021         |
| 5.0    | 320  | -             | 0.0020         |
| 1.5625 | 100  | -             | 0.0020         |
| 3.125  | 200  | -             | 0.0021         |
| 4.6875 | 300  | -             | 0.0022         |
| 1.5625 | 100  | -             | 0.0021         |
| 3.125  | 200  | -             | 0.0019         |
| 4.6875 | 300  | -             | 0.0022         |
| 6.25   | 400  | -             | 0.0022         |
| 7.8125 | 500  | 0.0021        | 0.0022         |
| 9.375  | 600  | -             | 0.0023         |
| 10.0   | 640  | -             | 0.0022         |

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.42.3
- PyTorch: 2.3.0+cu121
- Accelerate: 0.32.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```