---
datasets:
- allenai/qasper
license: apache-2.0
---

# Model Card for TinyLlama-abs2qa

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

<!-- Provide a quick summary of what the model is/does. -->

This model was an experiment to see if I could get a model to generate useful questions from a scientific paper's abstract. The answer was yes!

## Model Details

The base model is TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T; thanks to the TinyLlama devs for training and releasing it!

As such, it has a context size of 4096 tokens.

Training data was a modified form of the QASPER train split, which contains 1169 examples of abstracts and suitable questions for NLP papers.

### Model Description

I modified the QASPER dataset a little for this training. The original pairs each abstract with a set of questions and their answers. For this test I only wanted to see if I could generate questions from abstracts, so I extracted just those parts and formulated them as an Alpaca-style instruction:

    {"instruction":"Here is the the abstract for a scientific paper:
    It has been shown that word embeddings derived from large corpora
    tend to incorporate biases present in their training data. Various
    methods for mitigating these biases have been proposed, but recent
    work has demonstrated that these methods hide but fail to truly
    remove the biases, which can still be observed in word
    nearest-neighbor statistics. In this work we propose a probabilistic
    view of word embedding bias. We leverage this framework to present a
    novel method for mitigating bias which relies on probabilistic
    observations to yield a more robust bias mitigation algorithm.
    We demonstrate that this method effectively reduces bias according
    to three separate measures of bias while maintaining embedding quality
    across various popular benchmark semantic tasks
    What would be some questions that the paper could answer?",
    "output":"How is embedding quality assessed?
    What are the three measures of bias which are reduced in experiments?
    What are the probabilistic observations which contribute to the more robust algorithm?"}
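
If you want to reproduce the preprocessing, a minimal sketch along these lines should work. This is not my exact script: the field names follow the published `allenai/qasper` schema, and the output filename is just a placeholder.

```python
# Hedged sketch of the QASPER -> instruction-pair conversion (not the exact
# script used). Field names follow the allenai/qasper schema on the Hub.
import json

from datasets import load_dataset

qasper = load_dataset("allenai/qasper", split="train")

with open("abs2qa_train.jsonl", "w") as f:  # placeholder filename
    for paper in qasper:
        # "qas" is a sequence feature, so per example it is a dict of lists;
        # "question" holds the expert-written questions for this paper.
        questions = paper["qas"]["question"]
        if not questions:
            continue
        record = {
            # Instruction phrasing copied verbatim from the example above
            # (including the doubled "the").
            "instruction": (
                "Here is the the abstract for a scientific paper:\n"
                + paper["abstract"]
                + "\nWhat would be some questions that the paper could answer?"
            ),
            "output": "\n".join(questions),
        }
        f.write(json.dumps(record) + "\n")
```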

I'm not sure how critical the instruction phrasing is, but with the instructions as in the training data, this tiny model actually does a pretty good job on totally unseen abstracts in NLP.

Training this model with [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) took only 3 minutes on an A100. Wrangling the environment to get axolotl working took a lot longer; if you can, I highly recommend using their Docker image.

- **Developed by:** Andrew Green
- **Model type:** Llama 2 architecture, 1.1B parameters
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
I intend to use this model, or a derivative of it, to screen papers for inclusion in literature summarisation tools in the future.

Another thing I want to try is using this model to augment QASPER for other fields.

Since it is so fast to train, I think it will also be a useful testbed for trying out other techniques I want to learn, such as DPO and SPIN.

### Direct Use

Directly using this model should be possible, though some testing of the impact of slightly different prompting styles would be needed, and I think it will generate ad infinitum because I didn't use a chat template; adding one is on my to-do list and should be quick enough.

From a few quick tests, the generated questions look at least plausible, though they may have questionable utility in the real world.
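
For reference, here is an untested sketch of what direct use might look like with `transformers`. The Alpaca wrapper is an assumption based on the Alpaca-style training format described above, the repo id is a placeholder, and capping `max_new_tokens` is a stopgap for the endless-generation issue just mentioned.

```python
# Untested inference sketch. The Alpaca template below is an assumption
# based on the "Alpaca-style" training format; adjust if it underperforms.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama-abs2qa"  # placeholder: replace with this repo's id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

abstract = "..."  # paste an abstract here

instruction = (
    "Here is the the abstract for a scientific paper:\n"  # verbatim training phrasing
    + abstract
    + "\nWhat would be some questions that the paper could answer?"
)
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{instruction}\n\n### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# No chat template was used in training, so generation may not stop on its
# own; max_new_tokens acts as a hard cap.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```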

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
The model was finetuned on scientific articles about NLP and on questions about those articles written by NLP experts. As such, it is quite likely the model will not work well on other fields, although in my limited testing it does seem to generalise OK.

The same risks for misuse and malicious use apply as they would for any LLM, but in particular this model can generate questions from an abstract, which could lead to misuse in academia (e.g. to partially automate peer review). I think that would violate most publishers' terms.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->
This model is based on TinyLlama, a foundation model, so all the same risks of out-of-scope use apply here as well.

The model is biased towards NLP abstracts, because those are what the QASPER dataset it was trained on contains.

This is a very small model, so it is likely to be quite limited in its reasoning capabilities, which may lead to nonsense or irrelevant questions being generated.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.