sarath-shekkizhar committed
Commit
e75ac79
1 Parent(s): b63df6b

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -22,7 +22,7 @@ Our approach aims to mitigate forgetting in LLMs in a computationally efficient
 thereby enabling continual fine-tuning capabilities without altering the pre-trained output distribution.
 Llama-3-TenyxChat-70B was trained using eight A100s (80GB) for fifteen hours, with a training setup obtained from HuggingFaceH4 ([GitHub](https://github.com/huggingface/alignment-handbook)).
 
-*The MT-Bench evaluation we perform follows the latest eval upgrade as PR'd [here](https://github.com/lm-sys/FastChat/pull/3158). This PR upgrades the evaluation from `GPT-4-0613` to `GPT-4-preview-0125` (latest version) as well as corrects and improves the quality of the reference answers for a subset of questions. These changes are required to correct the erroneous rating during the evaluation.
+*The MT-Bench evaluation we perform follows the latest eval upgrade as PR'd [here](https://github.com/lm-sys/FastChat/pull/3158). This PR upgrades the evaluation from `GPT-4-0613` to `GPT-4-preview-0125` (latest version) as well as corrects and improves the quality of the reference answers for a subset of questions. These changes are required to correct the erroneous ratings produced by previous evaluations.
 
 
 **Model Developers** [Tenyx Research](https://www.tenyx.com/research)
@@ -30,7 +30,7 @@ Llama-3-TenyxChat-70B was trained using eight A100s (80GB) for fifteen hours, wi
 
 # Model details
 
-- Model type: Fine-tuned Mixture Of Expert 8x7B model for chat.
+- Model type: Fine-tuned 70B Instruct model for chat.
 - License: Meta Llama 3 Community License
 - Base model: [Llama3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)
 - Demo: Coming Soon!
@@ -63,9 +63,9 @@ At the time of release (April 2024), Llama3-TenyxChat-70B is the highest-ranked
 
 ## MT-Bench
 
-MT-Bench is a benchmark made up of 80 high-quality multi-turn questions. These questions fall into eight categories: Writing, Roleplay, Reasoning, Math, Coding, Extraction, STEM, and Humanities. The chat models are rated using GPT-4 on a scale of 1 to 10, with higher values corresponding to better responses.
+MT-Bench is a benchmark made up of 80 high-quality multi-turn questions. These questions fall into eight categories: Writing, Roleplay, Reasoning, Math, Coding, Extraction, STEM, and Humanities. The chat models are rated using `GPT-4-preview-0125` on a scale of 1 to 10, with higher values corresponding to better responses.
 
-| Model-name                     | GPT4-0125-preview MT Bench | Chat Arena Elo |
+| Model-name                     | GPT4-preview-0125 MT Bench | Chat Arena Elo |
 |--------------------------------|----------------------------|----------------|
 | GPT-4-1106                     | 8.79                       | 1251           |
 | Claude 3 Opus (20240229)       | 8.57                       | 1247           |
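The MT-Bench column above reports a single score per model, which is the mean of the GPT-4 judge's ratings across questions in the eight categories. As a minimal sketch of that aggregation (the per-category ratings below are hypothetical, not Tenyx's actual results):

```python
# Hypothetical per-category judge ratings (1-10), one mean rating per
# MT-Bench category. Values are illustrative only.
ratings = {
    "Writing": 9.2, "Roleplay": 8.9, "Reasoning": 7.1, "Math": 6.8,
    "Coding": 7.5, "Extraction": 9.0, "STEM": 9.3, "Humanities": 9.6,
}

def mt_bench_score(per_category: dict[str, float]) -> float:
    """Overall MT-Bench score: mean of the per-category judge ratings."""
    return sum(per_category.values()) / len(per_category)

print(f"MT-Bench score: {mt_bench_score(ratings):.2f}")
```

The actual answer generation and judging are run with the FastChat `llm_judge` tooling referenced in the PR above; this snippet only illustrates how per-category ratings reduce to the single reported number.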