---
datasets:
- allenai/qasper
license: apache-2.0
---

# Model Card for TinyLlama-abs2qa

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

<!-- Provide a quick summary of what the model is/does. -->

This model was an experiment to see if I could get a model to generate useful questions from a scientific paper's abstract. The answer was yes!

## Model Details

The base model is TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T; thanks to the TinyLlama devs for training and releasing it!

As such, it has a context size of 4096 tokens.

Training data was a modified form of the QASPER train split, which contains 1169 examples of abstracts and suitable questions for NLP papers.

### Model Description

I modified the QASPER dataset a little for this training. The original pairs each abstract with a set of questions and their answers. For this test I only wanted to see if I could generate questions from abstracts, so I extracted just those parts and formulated them as an Alpaca-style instruction:

    {"instruction":"Here is the the abstract for a scientific paper:
    It has been shown that word embeddings derived from large corpora
    tend to incorporate biases present in their training data. Various
    methods for mitigating these biases have been proposed, but recent
    work has demonstrated that these methods hide but fail to truly
    remove the biases, which can still be observed in word
    nearest-neighbor statistics. In this work we propose a probabilistic
    view of word embedding bias. We leverage this framework to present a
    novel method for mitigating bias which relies on probabilistic
    observations to yield a more robust bias mitigation algorithm.
    We demonstrate that this method effectively reduces bias according
    to three separate measures of bias while maintaining embedding quality
    across various popular benchmark semantic tasks
    What would be some questions that the paper could answer?",
    "output":"How is embedding quality assessed?
    What are the three measures of bias which are reduced in experiments?
    What are the probabilistic observations which contribute to the more robust algorithm?"}
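
If you want to reproduce the preprocessing, a minimal sketch along these lines should work. This is not my exact script: the field names follow the published `allenai/qasper` schema, and the output filename is just a placeholder.

```python
# Hedged sketch of the QASPER -> instruction-pair conversion (not the exact
# script used). Field names follow the allenai/qasper schema on the Hub.
import json

from datasets import load_dataset

qasper = load_dataset("allenai/qasper", split="train")

with open("abs2qa_train.jsonl", "w") as f:  # placeholder filename
    for paper in qasper:
        # "qas" is a sequence feature, so per example it is a dict of lists;
        # "question" holds the expert-written questions for this paper.
        questions = paper["qas"]["question"]
        if not questions:
            continue
        record = {
            # Instruction phrasing copied verbatim from the example above
            # (including the doubled "the").
            "instruction": (
                "Here is the the abstract for a scientific paper:\n"
                + paper["abstract"]
                + "\nWhat would be some questions that the paper could answer?"
            ),
            "output": "\n".join(questions),
        }
        f.write(json.dumps(record) + "\n")
```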

I'm not sure how critical the instruction phrasing is, but with the instructions as in the training data, this tiny model actually does a pretty good job on totally unseen abstracts in NLP.

Training this model with [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) took only 3 minutes on an A100. Wrangling the environment to get axolotl working took a lot longer; if you can, I highly recommend using their Docker image.

- **Developed by:** Andrew Green
- **Model type:** Llama 2 architecture, 1.1B parameters
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
I intend to use this model, or a derivative of it, to screen papers for inclusion in literature summarisation tools in the future.

Another thing I want to try is using this model to augment QASPER for other fields.

Since it is so fast to train, I think it will also be a useful testbed for trying out other techniques I want to learn, such as DPO and SPIN.

### Direct Use

Directly using this model should be possible, though some testing of the impact of slightly different prompting styles would be needed, and I think it will generate ad infinitum because I didn't use a chat template; adding one is on my to-do list and should be quick enough.

From a few quick tests, the generated questions look at least plausible, though they may have questionable utility in the real world.
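
For reference, here is an untested sketch of what direct use might look like with `transformers`. The Alpaca wrapper is an assumption based on the Alpaca-style training format described above, the repo id is a placeholder, and capping `max_new_tokens` is a stopgap for the endless-generation issue just mentioned.

```python
# Untested inference sketch. The Alpaca template below is an assumption
# based on the "Alpaca-style" training format; adjust if it underperforms.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama-abs2qa"  # placeholder: replace with this repo's id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

abstract = "..."  # paste an abstract here

instruction = (
    "Here is the the abstract for a scientific paper:\n"  # verbatim training phrasing
    + abstract
    + "\nWhat would be some questions that the paper could answer?"
)
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{instruction}\n\n### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# No chat template was used in training, so generation may not stop on its
# own; max_new_tokens acts as a hard cap.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```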

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
The model was finetuned on scientific articles about NLP and on questions about those articles written by NLP experts. As such, it is quite likely the model will not work well on other fields, although in my limited testing it does seem to generalise OK.

The same risks for misuse and malicious use apply as they would for any LLM, but in particular this model can generate questions from an abstract, which could lead to misuse in academia (e.g. to partially automate peer review). I think that would violate most publishers' terms.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->
This model is based on TinyLlama, a foundation model, so all the same risks of out-of-scope use apply here as well.

The model is biased towards NLP abstracts, because those are what the QASPER dataset it was trained on contains.

This is a very small model, so it is likely to be quite limited in its reasoning capabilities, which may lead to nonsense or irrelevant questions being generated.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.