sarath-shekkizhar committed e75ac79 (1 parent: b63df6b): Update README.md

README.md CHANGED
@@ -22,7 +22,7 @@ Our approach aims to mitigate forgetting in LLMs in a computationally efficient
 thereby enabling continual fine-tuning capabilities without altering the pre-trained output distribution.
 Llama-3-TenyxChat-70B was trained using eight A100s (80GB) for fifteen hours, with a training setup obtained from HuggingFaceH4 ([GitHub](https://github.com/huggingface/alignment-handbook)).

-*The MT-Bench evaluation we perform follows the latest eval upgrade as PR'd [here](https://github.com/lm-sys/FastChat/pull/3158). This PR upgrades the evaluation from `GPT-4-0613` to `GPT-4-preview-0125` (latest version) as well as corrects and improves the quality of the reference answers for a subset of questions. These changes are required to correct the erroneous rating during
+*The MT-Bench evaluation we perform follows the latest eval upgrade as PR'd [here](https://github.com/lm-sys/FastChat/pull/3158). This PR upgrades the evaluation from `GPT-4-0613` to `GPT-4-preview-0125` (latest version) as well as corrects and improves the quality of the reference answers for a subset of questions. These changes are required to correct the erroneous rating during previous evaluation.


 **Model Developers** [Tenyx Research](https://www.tenyx.com/research)
|
@@ -30,7 +30,7 @@ Llama-3-TenyxChat-70B was trained using eight A100s (80GB) for fifteen hours, wi

 # Model details

-- Model type: Fine-tuned
+- Model type: Fine-tuned 70B Instruct model for chat.
 - License: Meta Llama 3 Community License
 - Base model: [Llama3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)
 - Demo: Coming Soon!
@@ -63,9 +63,9 @@ At the time of release (April 2024), Llama3-TenyxChat-70B is the highest-ranked

 ## MT-Bench

-MT-Bench is a benchmark made up of 80 high-quality multi-turn questions. These questions fall into eight categories: Writing, Roleplay, Reasoning, Math, Coding, Extraction, STEM, and Humanities. The chat models are rated using GPT-4 on a scale of 1 to 10, with higher values corresponding to better responses.
+MT-Bench is a benchmark made up of 80 high-quality multi-turn questions. These questions fall into eight categories: Writing, Roleplay, Reasoning, Math, Coding, Extraction, STEM, and Humanities. The chat models are rated using `GPT-4-preview-0125` on a scale of 1 to 10, with higher values corresponding to better responses.

-| Model-name | GPT4-0125
+| Model-name | GPT4-preview-0125 MT Bench | Chat Arena Elo |
 |--------------------------------|----------------------------|----------------|
 | GPT-4-1106 | 8.79 | 1251 |
 | Claude 3 Opus (20240229) | 8.57 | 1247 |
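As background on the MT-Bench scoring referenced in the updated lines: each of the 80 questions has two turns, the judge model rates every turn answer on a 1-to-10 scale, and the reported score is the mean over all rated turns (optionally broken down per category). Below is a minimal Python sketch of that aggregation, assuming an illustrative `ratings` mapping rather than FastChat's actual judgment output format.

```python
# Sketch of MT-Bench-style score aggregation: per-turn judge ratings on a
# 1-to-10 scale averaged into per-category and overall scores.
# The `ratings` structure is assumed for illustration; it is not FastChat's format.
from collections import defaultdict
from statistics import mean

# (category, question_id, turn) -> rating assigned by the judge model
ratings = {
    ("writing", 81, 1): 9.0,
    ("writing", 81, 2): 8.0,
    ("math", 111, 1): 6.0,
    ("math", 111, 2): 5.0,
}

by_category = defaultdict(list)
for (category, _qid, _turn), score in ratings.items():
    by_category[category].append(score)

category_scores = {cat: mean(scores) for cat, scores in by_category.items()}
overall = mean(ratings.values())  # overall score: mean over every rated turn

print(category_scores)       # e.g. {'writing': 8.5, 'math': 5.5}
print(round(overall, 2))     # e.g. 7.0
```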
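The updated model details describe a fine-tuned 70B Instruct model for chat. A minimal usage sketch with the Hugging Face `transformers` chat pipeline follows; the repository id `tenyx/Llama3-TenyxChat-70B`, the system prompt, and the generation settings are assumptions for illustration, so check the model page for the actual id and recommended usage.

```python
# Minimal usage sketch. The repo id below is an assumption; substitute the
# actual Hugging Face repository for Llama-3-TenyxChat-70B.
import torch
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="tenyx/Llama3-TenyxChat-70B",  # assumed repo id, verify on the Hub
    torch_dtype=torch.bfloat16,
    device_map="auto",  # a 70B model needs multiple GPUs or CPU offloading
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what MT-Bench measures in two sentences."},
]

# Chat-formatted messages are run through the model's chat template before generation.
out = chat(messages, max_new_tokens=256, do_sample=False)
print(out[0]["generated_text"][-1]["content"])
```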