---
datasets:
- allenai/qasper
license: apache-2.0
---

# Model Card for TinyLlama-abs2qa

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<!-- Provide a quick summary of what the model is/does. -->

This model was an experiment to see if I could get a model to generate useful questions from a scientific paper's abstract. The answer was yes!

## Model Details

The base model is TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T; thanks to the TinyLlama devs for training and releasing it!

As such, it has a context size of 4096 tokens.

The training data was a modified form of the QASPER train split, which contains 1169 examples of abstracts and suitable questions for NLP papers.

### Model Description

I modified the QASPER dataset slightly for this training. The original pairs each abstract with a set of questions and their answers.
For this test I only wanted to see if I could generate questions from abstracts, so I extracted just those parts and formatted them as an alpaca-style instruction:

```json
{
  "instruction": "Here is the abstract for a scientific paper:\nIt has been shown that word embeddings derived from large corpora tend to incorporate biases present in their training data. Various methods for mitigating these biases have been proposed, but recent work has demonstrated that these methods hide but fail to truly remove the biases, which can still be observed in word nearest-neighbor statistics. In this work we propose a probabilistic view of word embedding bias. We leverage this framework to present a novel method for mitigating bias which relies on probabilistic observations to yield a more robust bias mitigation algorithm. We demonstrate that this method effectively reduces bias according to three separate measures of bias while maintaining embedding quality across various popular benchmark semantic tasks\nWhat would be some questions that the paper could answer?",
  "output": "How is embedding quality assessed?\nWhat are the three measures of bias which are reduced in experiments?\nWhat are the probabilistic observations which contribute to the more robust algorithm?"
}
```
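
For reference, the transformation is simple enough to sketch. The snippet below is a hypothetical reconstruction rather than the exact script I used: the `qas["question"]` field access and the output filename are assumptions based on the QASPER schema and the example above.

```python
# Hypothetical reconstruction of the dataset prep - not the exact script used.
# Assumes the Hugging Face schema for allenai/qasper, where each record has an
# "abstract" string and a "qas" field whose "question" entry is a list of strings.
import json

from datasets import load_dataset

PROMPT = (
    "Here is the abstract for a scientific paper:\n{abstract}\n"
    "What would be some questions that the paper could answer?"
)

qasper = load_dataset("allenai/qasper", split="train")

with open("qasper_abs2qa.jsonl", "w") as f:
    for paper in qasper:
        questions = paper["qas"]["question"]  # assumption: questions live here
        if not questions:
            continue
        record = {
            "instruction": PROMPT.format(abstract=paper["abstract"]),
            "output": "\n".join(questions),
        }
        f.write(json.dumps(record) + "\n")
```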

I'm not sure how critical the instruction phrasing is, but with the instructions phrased as in the training data,
this tiny model actually does a pretty good job on totally unseen NLP abstracts.

Training this model with [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) took only 3 minutes on an A100.
Wrangling the environment to get axolotl working took a lot longer; if you can, I highly recommend using their Docker image.
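
For anyone trying to reproduce something similar, a minimal axolotl config might look roughly like this. It is a sketch, not the config actually used: the dataset path, hyperparameters, and output directory are all illustrative, and only the base model comes from this card.

```yaml
# Illustrative axolotl config sketch - hyperparameters and paths are guesses,
# not the settings actually used to train this model.
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer

datasets:
  - path: qasper_abs2qa.jsonl   # alpaca-style instruction/output pairs as above
    type: alpaca
val_set_size: 0.05

sequence_len: 4096
micro_batch_size: 4
gradient_accumulation_steps: 4
num_epochs: 1
learning_rate: 0.0002
optimizer: adamw_torch
lr_scheduler: cosine
bf16: true

output_dir: ./tinyllama-abs2qa
```

A config like this would be launched with axolotl's usual entry point, e.g. `accelerate launch -m axolotl.cli.train config.yml`.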

- **Developed by:** Andrew Green
- **Model type:** Llama 2 architecture, 1.1B parameters
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
I intend to use this model, or a derivative of it, to screen papers for inclusion in literature summarisation tools in the future.

Another thing I want to try is using this model to augment QASPER for other fields.

Since it is so fast to train, I think it will also be a useful testbed for trying out other techniques I want to learn, such as DPO and SPIN.

### Direct Use

Directly using this model should be possible, though the impact of slightly different prompting styles would need some testing, and I think it
will generate ad infinitum because I didn't use a chat template - fixing that is on my to-do list and should be quick enough.

From a few quick tests, the generated questions look at least plausible, though they may have questionable utility in the real world.
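
To make that concrete, generation along these lines should work. The repo id, prompt template, and generation settings below are assumptions, not tested instructions: in particular, depending on how axolotl rendered the alpaca format during training, you may need the full alpaca template (`### Instruction:` / `### Response:`) rather than the bare instruction text.

```python
# Rough inference sketch - repo id and prompt wording are assumptions based on
# this card. max_new_tokens caps the open-ended generation mentioned above,
# since no chat template or stop token was trained in.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "afg1/TinyLlama-abs2qa"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

abstract = "..."  # paste the abstract you want questions for
prompt = (
    f"Here is the abstract for a scientific paper:\n{abstract}\n"
    "What would be some questions that the paper could answer?"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```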

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
The model was finetuned on scientific articles from NLP, paired with questions about those articles written by NLP experts. As such, it is quite likely the model
will not work well on other fields. In my limited testing, however, it does seem to generalise OK.

The same risks of misuse and malicious use apply as for any LLM, but in particular this model can generate questions from
an abstract, which could lead to it being misused in academia (e.g. to partially automate peer review). I think this would violate most publishers' terms.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->
This model is based on TinyLlama, which is a foundation model, so all of the same out-of-scope-use risks apply here too.

The model is biased towards NLP abstracts, because those are what the QASPER dataset it was trained on contains.

This is a very small model, so it is likely to be quite limited in its reasoning capabilities, which may lead to nonsensical or irrelevant questions being generated.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.