
metadata

tags:
  - sft
license: other
language:
  - en
pipeline_tag: text-generation

This is llama2-13b-orca-8k-3319 in a couple of GGML formats.

I had to apply this workaround to pad the vocab and quantize the models; this may or may not affect performance.
I have no idea what I'm doing, so if something doesn't work as it should, or at all, that's likely on me, not the models themselves.

Below is the suggested prompt format from the original repo:

For the initial response use the following (the llama2 default system prompt works well as the system message):

<|system|>system message</s><|prompter|>user prompt</s><|assistant|>

For multi-turn conversations use:

<|system|>system message</s><|prompter|>Q1</s><|assistant|>A1</s><|prompter|>Q2</s><|assistant|>
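
If you are assembling prompts in code, the two formats above can be produced by one small helper. This is only a minimal sketch: `build_prompt`, its parameter names, and the `history` list of (question, answer) pairs are made up for illustration and are not part of the original repo or any library.

```python
def build_prompt(system_message, user_prompt, history=()):
    """Assemble a prompt string in the format suggested above.

    history is a hypothetical list of (question, answer) pairs from
    earlier turns; leave it empty for the initial response.
    """
    # The system message always comes first and is closed with </s>.
    parts = [f"<|system|>{system_message}</s>"]

    # Each finished turn is a prompter block followed by an assistant block.
    for question, answer in history:
        parts.append(f"<|prompter|>{question}</s><|assistant|>{answer}</s>")

    # The newest user prompt ends with an open assistant tag so the model
    # continues from there.
    parts.append(f"<|prompter|>{user_prompt}</s><|assistant|>")
    return "".join(parts)


if __name__ == "__main__":
    # Initial response
    print(build_prompt("system message", "user prompt"))
    # Multi-turn conversation
    print(build_prompt("system message", "Q2", history=[("Q1", "A1")]))
```

The returned string is what you would pass as the prompt to whatever GGML runner you use; how special tokens like `</s>` get tokenized depends on that runner, so check its output if responses look off.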