---
license: llama2
tags:
- economics
- chatgpt
- llama
- vicuna
- tutorbot
- its
---

# Nash Model Card

## Github details
Training of the Nash (Economics) model is based on the code used to train the equivalent Spock (Biology) model.

Please check out the repo: https://github.com/luffycodes/Tutorbot-Spock-Bio.

## Model details

**Model type:**
Nash is an open-source educational tutoring chatbot trained by fine-tuning the LLaMA/Vicuna model on synthetic student-tutorbot conversations generated using a specialized prompt.

**Model date:**
Nash was trained between July 2023 and August 2023.

**Organizations developing the model:**
The Nash team with members from Rice University and OpenStax.
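
**Loading the model (illustrative):**
A minimal inference sketch with the `transformers` library is shown below. The repository id (`luffycodes/nash-vicuna-13b-v1dot5-ep2-w-rag-w-simple`) and the Vicuna-v1.5-style prompt template are assumptions, not part of this card; adjust both to the checkpoint and template you actually use.

```python
# Minimal inference sketch (assumed repo id and prompt template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "luffycodes/nash-vicuna-13b-v1dot5-ep2-w-rag-w-simple"  # assumption

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # 13B weights in fp16 need roughly 26 GB of GPU memory
    device_map="auto",
)

# Vicuna-v1.5-style single-turn prompt (assumed template).
prompt = (
    "A chat between a curious student and an economics tutorbot. "
    "USER: What happens to the equilibrium price when demand increases? ASSISTANT:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
# Print only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```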

## Training dataset
700 conversations generated with GPT-4 using a [specialized prompt](https://github.com/luffycodes/Tutorbot-Spock-Bio/blob/main/prompts/conversation_gen/v3.txt), based on the OpenStax Economics, Microeconomics, and Macroeconomics textbooks.
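
As a rough illustration (not the authors' exact pipeline), the generation step can be sketched as a single GPT-4 call that pairs the specialized prompt with one textbook passage; the function and variable names below are hypothetical.

```python
# Hedged sketch of synthetic-conversation generation with GPT-4.
# The specialized prompt is prompts/conversation_gen/v3.txt in the Tutorbot-Spock-Bio repo.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def generate_conversation(specialized_prompt: str, textbook_section: str) -> str:
    """Ask GPT-4 to role-play a full student-tutorbot conversation for one section."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": specialized_prompt},
            {"role": "user", "content": textbook_section},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content


# Example: one OpenStax Economics passage in, one synthetic conversation out.
# conversation = generate_conversation(open("v3.txt").read(), passage_text)
```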

**Paper or resources for more information:**
https://arxiv.org/abs/2305.13272

**Code or resources for more information:**
Training of Nash is based on:
https://github.com/luffycodes/Tutorbot-Spock-Bio

## Use Policy

Since the model is a derivative of the Llama model, please abide by the Llama use policy [here](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf/blob/main/USE_POLICY.md)
and the [Llama Responsible Use Guide](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf/blob/main/Responsible-Use-Guide.pdf).

**Ethical Considerations, License and Limitations:**
Similarly, since the model is a derivative of the Llama model, the same ethical considerations, license, and limitations as Llama apply.

**Out-of-scope Uses:** 
Use in any manner that violates applicable laws or regulations (including trade compliance laws);
use in languages other than English; and use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2.

"Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios. For these reasons, as with all LLMs, Llama 2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their specific applications of the model."

## LLM Performance based on the [Hugging Face Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

| Model | Size | Average | ARC | HellaSwag | MMLU | TruthfulQA |
|---|---|---|---|---|---|---|
| this model (fine-tuned on vicuna-13b-v1.5) | 13B | 61.8 | 59.13 | 80.64 | 56.12 | 51.29 |
| lmsys/vicuna-13b-v1.5 | 13B | 61.63 | 57.08 | 81.24 | 56.67 | 51.51 |
| meta-llama/Llama-2-13b-chat-hf | 13B | 59.93 | 59.04 | 81.94 | 54.64 | 44.12 |

If you use this work, please cite:
CLASS Meet SPOCK: An Education Tutoring Chatbot based on Learning Science Principles
https://arxiv.org/abs/2305.13272
```
@misc{sonkar2023class,
      title={CLASS Meet SPOCK: An Education Tutoring Chatbot based on Learning Science Principles}, 
      author={Shashank Sonkar and Lucy Liu and Debshila Basu Mallick and Richard G. Baraniuk},
      year={2023},
      eprint={2305.13272},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_luffycodes__nash-vicuna-13b-v1dot5-ep2-w-rag-w-simple).

| Metric                | Value                     |
|-----------------------|---------------------------|
| Avg.                  | 51.33   |
| ARC (25-shot)         | 59.13          |
| HellaSwag (10-shot)   | 80.64    |
| MMLU (5-shot)         | 56.12         |
| TruthfulQA (0-shot)   | 51.29   |
| Winogrande (5-shot)   | 74.66   |
| GSM8K (5-shot)        | 10.54        |
| DROP (3-shot)         | 26.89         |