sunitha-ravi committed on
Commit 5523f86
1 Parent(s): d2e0b01

Update README.md

Files changed (1)
  1. README.md +27 -17
README.md CHANGED
@@ -68,28 +68,23 @@ The model will output the score as 'PASS' if the answer is faithful to the docum
 To run inference, you can use HF pipeline:

 ```
- import transformers

- model_id = "PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct"
-
- pipeline = transformers.pipeline(
-     "text-generation",
-     model=model_id,
-     max_new_tokens=600,
-     device="cuda",
-     eturn_full_text=False
- )
+ model_name = 'PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct'
+ pipe = pipeline(
+     "text-generation",
+     model=model_name,
+     max_new_tokens=600,
+     device="cuda",
+     return_full_text=False
+ )

 messages = [
     {"role": "user", "content": prompt},
 ]

- outputs = pipeline(
-     messages,
-     temperature=0
- )
+ result = pipe(messages)
+ print(result[0]['generated_text'])

- print(outputs[0]["generated_text"])
 ```

 Since the model is trained in chat format, ensure that you pass the prompt as a user message.

@@ -100,7 +95,21 @@ For more information on training details, refer to our [ArXiv paper](https://arx

 The model was evaluated on [PatronusAI/HaluBench](https://huggingface.co/datasets/PatronusAI/HaluBench).

- It outperforms GPT-3.5-Turbo, GPT-4-Turbo, GPT-4o and Claude-3-Sonnet.
+
+ | Model | HaluEval | RAGTruth | FinanceBench | DROP | CovidQA | PubmedQA | Overall |
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+ | GPT-4o | 87.9% | 84.3% | **85.3%** | 84.3% | 95.0% | 82.1% | 86.5% |
+ | GPT-4-Turbo | 86.0% | **85.0%** | 82.2% | 84.8% | 90.6% | 83.5% | 85.0% |
+ | GPT-3.5-Turbo | 62.2% | 50.7% | 60.9% | 57.2% | 56.7% | 62.8% | 58.7% |
+ | Claude-3-Sonnet | 84.5% | 79.1% | 69.7% | 84.3% | 95.0% | 82.9% | 78.8% |
+ | Claude-3-Haiku | 68.9% | 78.9% | 58.4% | 84.3% | 95.0% | 82.9% | 69.0% |
+ | RAGAS Faithfulness | 70.6% | 75.8% | 59.5% | 59.6% | 75.0% | 67.7% | 66.9% |
+ | Mistral-Instruct-7B | 78.3% | 77.7% | 56.3% | 56.3% | 71.7% | 77.9% | 69.4% |
+ | Llama-3-Instruct-8B | 83.1% | 80.0% | 55.0% | 58.2% | 75.2% | 70.7% | 70.4% |
+ | Llama-3-Instruct-70B | 87.0% | 83.8% | 72.7% | 69.4% | 85.0% | 82.6% | 80.1% |
+ | LYNX (8B) | 85.7% | 80.0% | 72.5% | 77.8% | 96.3% | 85.2% | 82.9% |
+ | LYNX (70B) | **88.4%** | 80.2% | 81.4% | **86.4%** | **97.5%** | **90.4%** | **87.4%** |
+

 ## Citation
 If you are using the model, cite using

@@ -116,4 +125,5 @@ If you are using the model, cite using

 ## Model Card Contact
 [@sunitha-ravi](https://huggingface.co/sunitha-ravi)
- [@RebeccaQian1](https://huggingface.co/RebeccaQian1)
+ [@RebeccaQian1](https://huggingface.co/RebeccaQian1)
+ [@presidev](https://huggingface.co/presidev)
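
As a usage note on the updated snippet: the diff assumes a `prompt` variable that already contains the document, question, and answer to be judged. Below is a minimal, self-contained sketch of how that might look. The prompt wording is only an approximation of the model card's hallucination-evaluation template (which asks for a PASS/FAIL score), so treat the template text and example inputs as illustrative assumptions, not the official format.

```
# Minimal sketch (not part of the commit): shows one way to build the `prompt`
# the snippet expects. The template text below is an approximation of the model
# card's QUESTION/DOCUMENT/ANSWER format, not a verbatim copy.
from transformers import pipeline

model_name = "PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct"

pipe = pipeline(
    "text-generation",
    model=model_name,
    max_new_tokens=600,
    device="cuda",
    return_full_text=False,
)

# Hypothetical example inputs.
document = "Paris is the capital and largest city of France."
question = "What is the capital of France?"
answer = "The capital of France is Paris."

# Approximate prompt; see the model card for the exact template.
prompt = (
    "Given the following QUESTION, DOCUMENT and ANSWER, determine whether the "
    "ANSWER is faithful to the DOCUMENT and output the score as PASS or FAIL.\n\n"
    f"QUESTION:\n{question}\n\nDOCUMENT:\n{document}\n\nANSWER:\n{answer}"
)

# The model is trained in chat format, so the prompt goes in as a user message.
messages = [{"role": "user", "content": prompt}]

result = pipe(messages)
print(result[0]["generated_text"])
```

Because `return_full_text=False` is set, only the newly generated judgment is printed rather than the echoed prompt.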