blair-johnson committed on
Commit
ac17d8d
1 Parent(s): bb6b85f

Update README.md

  1. README.md +62 -2
README.md CHANGED
@@ -30,6 +30,66 @@ The Alpaca dataset is a set of 52k instruct-response pairs designed to enhance th
30
  The GALACTICA model card specifies that the primary intended users of the GALACTICA models are researchers studying language models applied to the scientific domain, and it cautions against production use of GALACTICA without safeguards due to the potential for the model to produce inaccurate information.
31
  The original GALACTICA models are available under a non-commercial CC BY-NC 4.0 license, and the GALPACA model is additionally subject to the [OpenAI Terms of Service](https://openai.com/policies/terms-of-use).
32
 
33
  ## Training Data
34
 
35
  The GALPACA models are trained by fine-tuning pre-trained GALACTICA models on the Alpaca dataset. GALACTICA models were trained on 106 billion tokens of open-access scientific text and data, including papers, textbooks, scientific websites, encyclopedias, and more.
@@ -65,11 +125,11 @@ print(tokenizer.batch_decode(out_tokens, skip_special_tokens=False, clean_up_tok
65
 
66
  ## Training Resources
67
 
68
- GALPACA 30B was fine-tuned in about 6 hours using 16 A100 80GB GPUs, 16-bit mixed precision, an effective batch size of 1024, and a maximum context window of 384 tokens. This model was trained using DeepSpeed Stage 3 optimizations.
69
 
70
  ## Performance and Limitations
71
 
72
- More information about the performance and limitations of this family of models can be found on the original GALACTICA model card.
73
 
74
  ## Works Cited
75
 
 
30
  The GALACTICA model card specifies that the primary intended users of the GALACTICA models are researchers studying language models applied to the scientific domain, and it cautions against production use of GALACTICA without safeguards due to the potential for the model to produce inaccurate information.
31
  The original GALACTICA models are available under a non-commercial CC BY-NC 4.0 license, and the GALPACA model is additionally subject to the [OpenAI Terms of Service](https://openai.com/policies/terms-of-use).
32
 
33
+ <details>
34
+ <summary> Example Outputs </summary>
35
+
36
+ *__Full Prompt:__*
37
+ ```
38
+ Below is an instruction that describes a task. As a scientist, write a response that appropriately completes the request, citing specific papers whenever possible.
39
+
40
+ ### Instruction:
41
+ Write out Maxwell's equations and explain each one.
42
+
43
+ ### Markdown and LaTeX Response:
44
+ ```
45
+
46
+ *__Response:__*
47
+
48
+ Maxwell's equations are a set of four equations that describe the behavior of electromagnetic fields. These equations are:
49
+ $$\nabla \times \textbf{E} = -\frac{\partial \textbf{B}}{\partial t}$$
50
+ $$\nabla \times \textbf{B} = \mu_0 \epsilon_0 \frac{\partial \textbf{E}}{\partial t} + \mu_0 \textbf{J}$$
51
+ $$\nabla \bullet \textbf{E} = \frac{\rho}{\epsilon_0}$$
52
+ $$\nabla \bullet \textbf{B} = 0$$
53
+ The first two equations describe how the electric and magnetic fields, E and B, are related to each other. The third and fourth equations are Gauss's law for electric and magnetic fields, respectively. These equations describe the behavior of the electric and magnetic fields when there are charges and currents in the space. Each equation has a specific purpose and together they form a complete description of the behavior of electromagnetic fields.
54
+
55
+ *__Full Prompt:__*
56
+ ```
57
+ Below is an instruction that describes a task. Write a response that appropriately completes the request.
58
+
59
+ ### Instruction:
60
+ Write a python function implements the Sherman-Morrison matrix inversion lemma using numpy.
61
+
62
+ ### Response:
63
+ ```python
64
+ ```
65
+
66
+ *__Response:__*
67
+ ```python
68
+ def sm_inv_lemma(A, u, v):
69
+     """ Function implementing the Sherman-Morrison matrix inversion lemma using numpy.
70
+
71
+     Parameters
72
+     ----------
73
+     A : numpy array
74
+         The original matrix.
75
+
76
+     u : numpy array
77
+         The first vector in the Sherman-Morrison lemma.
78
+
79
+     v : numpy array
80
+         The second vector in the Sherman-Morrison lemma.
81
+
82
+     Returns
83
+     -------
84
+     The inverse matrix of A - uv^T.
85
+     """
86
+     A_inv = np.linalg.inv(A)
87
+     alpha = 1 / (1 + v.T @ A_inv @ u)
88
+     return A_inv - (alpha * A_inv @ u @ v.T @ A_inv)
89
+ ```
90
+
91
+ </details>
92
+
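For readers who want to run the Sherman-Morrison example, note that the `sm_inv_lemma` response above is unedited model output: it omits `import numpy as np`, its docstring says `A - uv^T` although the formula it uses is the one for `A + uv^T`, and the chained `@` products only work when `u` and `v` are passed as column vectors of shape `(n, 1)`. The following is a minimal sketch of a checked variant, not part of the model output. The function name `sherman_morrison_inverse` and the choice to accept a precomputed `A_inv` are assumptions made for illustration.

```python
import numpy as np


def sherman_morrison_inverse(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1}, using the Sherman-Morrison formula.

    (A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u)
    """
    # Accept 1-D vectors or (n, 1) columns by flattening both to shape (n,).
    u = np.asarray(u, dtype=float).reshape(-1)
    v = np.asarray(v, dtype=float).reshape(-1)

    Au = A_inv @ u          # A^{-1} u, shape (n,)
    vA = v @ A_inv          # v^T A^{-1}, shape (n,)
    denom = 1.0 + v @ Au    # scalar 1 + v^T A^{-1} u

    # The rank-one update is invertible only when the denominator is nonzero.
    if np.isclose(denom, 0.0):
        raise ValueError("A + u v^T is singular; the update is not invertible")

    return A_inv - np.outer(Au, vA) / denom
```

Passing the existing inverse rather than `A` keeps the cost of the update at O(n^2), which is the usual reason for applying the lemma instead of inverting `A + u v^T` directly.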
93
  ## Training Data
94
 
95
  The GALPACA models are trained by fine-tuning pre-trained GALACTICA models on the Alpaca dataset. GALACTICA models were trained on 106 billion tokens of open-access scientific text and data, including papers, textbooks, scientific websites, encyclopedias, and more.
 
125
 
126
  ## Training Resources
127
 
128
+ GALPACA 30B was fine-tuned in about 6 hours using 16 A100 80GB GPUs, 16-bit mixed precision, an effective batch size of 1024, and a maximum context window of 384 tokens. This model was trained using DeepSpeed ZeRO Stage 3 optimizations.
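In data-parallel training, the effective batch size is the product of the GPU count, the per-GPU micro-batch size, and the number of gradient-accumulation steps. The README states only the product (1024), the GPU count (16), and the context window (384). The sketch below works through that arithmetic. The split into a micro-batch of 8 and 8 accumulation steps is a hypothetical illustration and not the configuration that was actually used.

```python
def effective_batch_size(num_gpus, micro_batch_per_gpu, grad_accum_steps):
    """Samples contributing to one optimizer step in data-parallel training."""
    return num_gpus * micro_batch_per_gpu * grad_accum_steps


# Values stated in the README.
NUM_GPUS = 16
EFFECTIVE_BATCH = 1024
MAX_CONTEXT_TOKENS = 384

# Samples each GPU must process per optimizer step.
samples_per_gpu = EFFECTIVE_BATCH // NUM_GPUS

# Hypothetical split (assumption, not from the README): micro-batch of 8 per GPU.
MICRO_BATCH_PER_GPU = 8
GRAD_ACCUM_STEPS = samples_per_gpu // MICRO_BATCH_PER_GPU

# Upper bound on tokens per optimizer step if every sequence fills the window.
max_tokens_per_step = EFFECTIVE_BATCH * MAX_CONTEXT_TOKENS

print(samples_per_gpu, GRAD_ACCUM_STEPS, max_tokens_per_step)
```

Any other pair of micro-batch size and accumulation steps whose product is 64 would give the same effective batch size on 16 GPUs.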
129
 
130
  ## Performance and Limitations
131
 
132
+ Qualitative evaluation suggests that GALPACA frequently outperforms LLaMA-based Alpaca models on tasks related to technical knowledge and programming, while it underperforms on natural language tasks such as generating prose. More information about the performance and limitations of the GALACTICA family of models can be found on the original GALACTICA model card.
133
 
134
  ## Works Cited
135