Work on a paper
This insight is just great: the best kind of optimization (novel, but intuitively understandable in retrospect). I've done some additional work on it and was wondering if you'd want to collaborate on publishing a short paper. You can DM me @theemozilla on Twitter (that's also my Discord username). I wouldn't want to publish anything without you as an author. Lemme know!
@emozilla Hello, I do not use Twitter lol. You can email me at [email protected]
I've been looking at the work here and the associated blog post, as well as the scaled model at https://huggingface.co/emozilla/open_llama_7b-scaled.
The idea makes sense to me, but when testing open_llama_7b-scaled, I get poor results once I increase the context window.
Do the model and method require further fine-tuning? I did not fine-tune the OpenLLaMA model any further.