---
base_model: BAAI/bge-base-en-v1.5
library_name: setfit
metrics:
  - accuracy
pipeline_tag: text-classification
tags:
  - setfit
  - sentence-transformers
  - text-classification
  - generated_from_setfit_trainer
widget:
  - text: >-
      The answer provided directly relates to the question asked and is
      well-supported by the document, which explains the percentage in the
      response status column as the total amount of successful completion of
      response actions. The answer is concise and specific to the query.


      Final evaluation: Good
  - text: >-
      Evaluation:

      The answer states that the provided information does not cover the
      specific query, suggesting referring to additional sources or providing
      more context. However, the document does cover the process of enabling and
      configuring Endpoint controls and mentions specific features under
      Endpoint controls like Device Control, Personal Firewall Control, and Full
      Disk Encryption Visibility. The document does not explicitly state the
      "purpose" of Endpoint controls, but it is evident from the listed features
      that these controls are for managing device control, firewall settings,
      and disk encryption visibility. Therefore, the answer is not
      well-supported by the document and fails to address the specific question
      adequately.


      Final evaluation: Bad
  - text: >-
      Reasoning:

      1. **Context Grounding**: The answer is supported by the provided document
      where it is mentioned that the On-Site Collector Agent collects logs and
      forwards them to <ORGANIZATION> XDR.

      2. **Relevance**: The purpose of the <ORGANIZATION> XDR On-Site Collector
      Agent is indeed to collect and securely forward logs.

      3. **Conciseness**: The answer is concise and directly addresses the
      specific question asked without unnecessary information.

      4. **Specificity**: The answer is specific to the question regarding the
      purpose of the On-Site Collector Agent, without being too general.

      5. **Key/Value/Event Name**: Although the answer does not include keys or
      values from the document, it is not necessary for this specific question
      about the purpose of the agent.


      The answer meets all the criteria effectively.


      Final evaluation: Good
  - text: >-
      The provided answer does not align well with the document. Here's a
      detailed analysis of the evaluation criteria:


      1. **Context Grounding**: The answer does not seem to be backed up by the
      specifics provided in the document. The document describes settings around
      making sensors stale, archived, or deleted and associated email
      notifications, but it does not explicitly mention a checkbox for email
      notifications in the Users section.


      2. **Relevance**: The answer does not correctly address the specific query
      about the checkbox in the Users section as per the document content. 


      3. **Conciseness**: While the answer is concise, it is not directly
      supported by the content of the document, making it irrelevant.


      4. **Specificity**: The answer lacks specific details or a direct quote
      from the document that mentions the Users section checkbox.


      5. **Accuracy in Key/Value/Event Name**: The document does not provide
      details about a checkbox for email notifications in the Users section,
      thus the key/value/event name aspect is also not correctly covered.


      Based on these points, the answer provided fails to meet the necessary
      criteria.


      Final evaluation: **Bad**
  - text: >-
      **Reasoning**:


      1. **Context Grounding**: The answer does not match the context provided
      in the document. The document specifies different URLs for images related
      to DNS queries and connection queries.
         
      2. **Relevance**: The answer is not relevant to the specific question
      asked. The question asks for the URL of the image for the second query,
      which is clearly provided in the document but not correctly retrieved in
      the answer.


      3. **Conciseness**: The answer is concise but incorrect, making it not
      useful.


      4. **Specificity**: The answer lacks accuracy, which is critical for
      answering the specific question. It provides an incorrect URL.


      5. **Key, Value, Event Name**: Since the question is about a specific URL,
      correctness of the key/value is crucial, which the answer fails to
      provide.


      **Final evaluation**: Bad
inference: true
model-index:
  - name: SetFit with BAAI/bge-base-en-v1.5
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          name: Unknown
          type: unknown
          split: test
        metrics:
          - type: accuracy
            value: 0.5492957746478874
            name: Accuracy
---

SetFit with BAAI/bge-base-en-v1.5

This is a SetFit model that can be used for Text Classification. This SetFit model uses BAAI/bge-base-en-v1.5 as the Sentence Transformer embedding model. A LogisticRegression instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves two steps (see the sketch after this list):

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
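
The division of labor is easiest to see in a minimal sketch (illustrative only, not the library internals; in real SetFit training the Sentence Transformer body is contrastively fine-tuned before the head is fit, and the example texts here are hypothetical):

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Step 1 (simplified): embed texts with the Sentence Transformer body.
# In SetFit proper, this body is first fine-tuned with contrastive pairs.
encoder = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Hypothetical few-shot examples
texts = ["...\n\nFinal evaluation: Good", "...\n\nFinal evaluation: Bad"]
labels = [1, 0]

# Step 2: fit a LogisticRegression head on the embeddings.
embeddings = encoder.encode(texts)
head = LogisticRegression().fit(embeddings, labels)

print(head.predict(encoder.encode(["The answer is concise and well-supported."])))
```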

Model Details

Model Description

  • Model Type: SetFit
  • Sentence Transformer body: BAAI/bge-base-en-v1.5
  • Classification head: a LogisticRegression instance
  • Maximum Sequence Length: 512 tokens
  • Number of Classes: 2

Model Sources

  • Repository: SetFit on GitHub (https://github.com/huggingface/setfit)
  • Paper: Efficient Few-Shot Learning Without Prompts (https://arxiv.org/abs/2209.11055)

Model Labels

Label 0:
  • 'Evaluation:\nThe answer does not directly address the specific question asked. The document provides details on why considering all the answers together when determining if the behavior in a MalOp is malicious is important, such as assessing the significance of involved machines, behaviors, and users. However, the provided answer is too general and fails to capture these specifics.\n\nFinal evaluation: Bad'
  • 'Reasoning:\n\n1. Context Grounding: The answer does not reference the provided document at all and instead suggests seeking additional sources. The steps to exclude a MalOp during the remediation phase are clearly present in the document.\n2. Relevance: The answer does not address the specific question asked. The question seeks the process to exclude a MalOp, and the provided document contains specific steps to achieve this.\n3. Conciseness: The answer is brief but unhelpfully so, as it completely lacks pertinent information available in the document.\n4. Specifics: The document does contain specific instructions on how to exclude a MalOp, which the answer fails to acknowledge or explain.\n5. Key/Value/Event Name: There are relevant actions and links identified in the document, which are not mentioned in the answer.\n\nFinal Evaluation: Bad.'
  • 'Evaluation:\n\n1. Context Grounding: The provided answer is well-supported by the document, specifically stating that a quarantined file should be un-quarantined before submitting it.\n2. Relevance: The answer is directly related to the question, addressing what should be done with a quarantined file.\n3. Conciseness: The answer is concise and to the point, clearly stating the necessary action.\n4. Specificity: The answer gives a particular instruction relevant to the question.\n5. Key Identification: The answer correctly identifies the key action ("un-quarantine") directly from the document.\n\nFinal result: Good'

Label 1:
  • "Reasoning:\n1. Context Grounding: The answer is directly supported by the document, as it explains what happens after configuring the computer to generate a memory dump file.\n2. Relevance: The answer is relevant to the question asked and addresses it directly.\n3. Conciseness: The answer is concise and to the point.\n4. Specificity: The answer specifically mentions the generation of a dump file containing the entire contents of the sensor's RAM, which is directly pulled from the document.\n\nFinal result: Good"
  • 'Evaluation:\n1. Context Grounding: The answer is grounded in the document, which mentions that the platform uses an advanced engine to identify cyber security threats.\n2. Relevance: The answer directly addresses the question by stating the purpose of the threat detection abilities.\n3. Conciseness: The answer is clear and to the point.\n4. Specificity: The answer correctly identifies the purpose of the threat detection abilities as detailed in the document.\n5. Keys/Values/Events: Not applicable in this scenario.\n\nFinal evaluation: Good'
  • 'Reasoning:\nThe answer provided does not address the specific severity score for the fifth scenario in the document. Instead, it suggests that the document does not cover this query and refers to additional sources, which is incorrect. The document contains information about four scenarios, and there is no fifth scenario mentioned within it. The answer should accurately state that there is no fifth scenario provided in the document.\n\nFinal Result: Bad'

Evaluation

Metrics

| Label | Accuracy |
|:------|:---------|
| all   | 0.5493   |
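
A held-out accuracy such as the one above can be recomputed along these lines (a sketch; the two evaluation examples are hypothetical placeholders for a real test split):

```python
from setfit import SetFitModel
from sklearn.metrics import accuracy_score

model = SetFitModel.from_pretrained(
    "Netta1994/setfit_baai_cybereason_gpt-4o_cot-few_shot-instructions_only_reasoning_1726751890.998"
)

# Hypothetical held-out examples, using the card's 0/1 label ids
eval_texts = ["...\n\nFinal evaluation: Good", "...\n\nFinal evaluation: Bad"]
eval_labels = [1, 0]

preds = model.predict(eval_texts)
print("accuracy:", accuracy_score(eval_labels, preds))
```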

Uses

Direct Use for Inference

First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference.

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained(
    "Netta1994/setfit_baai_cybereason_gpt-4o_cot-few_shot-instructions_only_reasoning_1726751890.998"
)
# Run inference
preds = model("""The answer provided directly relates to the question asked and is well-supported by the document, which explains the percentage in the response status column as the total amount of successful completion of response actions. The answer is concise and specific to the query.

Final evaluation: Good""")
```

Training Details

Training Set Metrics

| Training set | Min | Median  | Max |
|:-------------|:----|:--------|:----|
| Word count   | 19  | 77.9420 | 193 |

| Label | Training Sample Count |
|:------|:----------------------|
| 0     | 34                    |
| 1     | 35                    |
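
For reference, statistics like these can be recomputed from the raw training texts in a few lines (a sketch; `train_texts` is a hypothetical stand-in for the real training split):

```python
import statistics

# Hypothetical stand-in for the real training texts
train_texts = ["Final evaluation: Good", "The answer is concise.\n\nFinal evaluation: Bad"]

word_counts = [len(t.split()) for t in train_texts]
print(min(word_counts), statistics.median(word_counts), max(word_counts))
```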

Training Hyperparameters

  • batch_size: (16, 16)
  • num_epochs: (5, 5)
  • max_steps: -1
  • sampling_strategy: oversampling
  • num_iterations: 20
  • body_learning_rate: (2e-05, 2e-05)
  • head_learning_rate: 2e-05
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • l2_weight: 0.01
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False
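
As a rough sketch (not the exact training script used for this model), these settings map onto SetFit's `TrainingArguments` as follows; the tiny `train_dataset` is a hypothetical stand-in for the real data:

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Hypothetical stand-in for the real training split (column names are SetFit's defaults)
train_dataset = Dataset.from_dict({
    "text": ["...\n\nFinal evaluation: Good", "...\n\nFinal evaluation: Bad"],
    "label": [1, 0],
})

model = SetFitModel.from_pretrained("BAAI/bge-base-en-v1.5")

args = TrainingArguments(
    batch_size=(16, 16),                # (embedding phase, classifier phase)
    num_epochs=(5, 5),
    sampling_strategy="oversampling",
    num_iterations=20,
    body_learning_rate=(2e-05, 2e-05),
    head_learning_rate=2e-05,
    warmup_proportion=0.1,
    l2_weight=0.01,
    seed=42,
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```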

Training Results

| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.0058 | 1    | 0.2388        | -               |
| 0.2890 | 50   | 0.2629        | -               |
| 0.5780 | 100  | 0.2313        | -               |
| 0.8671 | 150  | 0.0609        | -               |
| 1.1561 | 200  | 0.0033        | -               |
| 1.4451 | 250  | 0.0024        | -               |
| 1.7341 | 300  | 0.0022        | -               |
| 2.0231 | 350  | 0.0018        | -               |
| 2.3121 | 400  | 0.0018        | -               |
| 2.6012 | 450  | 0.0016        | -               |
| 2.8902 | 500  | 0.0015        | -               |
| 3.1792 | 550  | 0.0014        | -               |
| 3.4682 | 600  | 0.0013        | -               |
| 3.7572 | 650  | 0.0014        | -               |
| 4.0462 | 700  | 0.0014        | -               |
| 4.3353 | 750  | 0.0013        | -               |
| 4.6243 | 800  | 0.0012        | -               |
| 4.9133 | 850  | 0.0012        | -               |

Framework Versions

  • Python: 3.10.14
  • SetFit: 1.1.0
  • Sentence Transformers: 3.1.0
  • Transformers: 4.44.0
  • PyTorch: 2.4.1+cu121
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1

Citation

BibTeX

```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
```