---
tags:
- sft
license: other
language:
- en
pipeline_tag: text-generation
---

This is [OpenAssistant's llama2-13b-orca-8k-3319](https://huggingface.co/OpenAssistant/llama2-13b-orca-8k-3319) in a couple of GGML formats. I had to apply this [workaround](https://huggingface.co/OpenAssistant/oasst-sft-6-llama-30b-xor/discussions/2) to pad the vocab and quantize the models; this may or may not affect performance.

I have no idea what I'm doing, so if something doesn't work as it should, or at all, that's likely on me, not the models themselves.

Below is the suggested prompt format from the original repo:

For the initial response use the following (for the system message, e.g. the [llama2 default system prompt](https://github.com/facebookresearch/llama/blob/6c7fe276574e78057f917549435a2554000a876d/llama/generation.py#L46) works well):

```
<|system|>system message<|prompter|>user prompt<|assistant|>
```

For multi-turn conversations use:

```
<|system|>system message<|prompter|>Q1<|assistant|>A1<|prompter|>Q2<|assistant|>
```
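
If you are assembling prompts by hand, the two templates above can be sketched as one small Python helper. The function name `build_prompt` and its parameters are made up for illustration and are not part of the original repo; the sketch only concatenates the special tokens exactly as shown above and does not add anything else (such as end-of-sequence tokens) that a particular runtime might expect.

```python
def build_prompt(system_message, turns, next_user_prompt):
    """Build a prompt string following the templates shown above.

    system_message:   text placed after <|system|>
    turns:            list of (user_prompt, assistant_reply) pairs that
                      were already exchanged (empty for the first request)
    next_user_prompt: the user message the model should answer next

    Hypothetical helper for illustration only.
    """
    parts = [f"<|system|>{system_message}"]
    # Replay earlier turns: each user prompt followed by the model's reply.
    for user_prompt, assistant_reply in turns:
        parts.append(f"<|prompter|>{user_prompt}<|assistant|>{assistant_reply}")
    # End with an open <|assistant|> tag so the model continues from there.
    parts.append(f"<|prompter|>{next_user_prompt}<|assistant|>")
    return "".join(parts)


if __name__ == "__main__":
    # Initial response
    print(build_prompt("system message", [], "user prompt"))
    # Multi-turn conversation
    print(build_prompt("system message", [("Q1", "A1")], "Q2"))
```

With an empty `turns` list this yields the initial-response template, and with one earlier pair it yields the multi-turn template.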