
RWKV-4 World GGML

This repository contains quantized conversions of the current RWKV-4 World checkpoints.

These models are intended for use with frontends that support GGML-quantized RWKV models, such as rwkv.cpp and KoboldCpp.

Last updated on 2023-09-28.

Description:

RAM USAGE

| Model | Starting RAM usage (KoboldCpp) |
|:--|--:|
| RWKV-4-World-0.1B.q4_0.bin | 289.3 MiB |
| RWKV-4-World-0.1B.q4_1.bin | 294.7 MiB |
| RWKV-4-World-0.1B.q5_0.bin | 300.2 MiB |
| RWKV-4-World-0.1B.q5_1.bin | 305.7 MiB |
| RWKV-4-World-0.1B.q8_0.bin | 333.1 MiB |
| RWKV-4-World-0.1B.f16.bin | 415.3 MiB |
| RWKV-4-World-0.4B.q4_0.bin | 484.1 MiB |
| RWKV-4-World-0.4B.q4_1.bin | 503.7 MiB |
| RWKV-4-World-0.4B.q5_0.bin | 523.1 MiB |
| RWKV-4-World-0.4B.q5_1.bin | 542.7 MiB |
| RWKV-4-World-0.4B.q8_0.bin | 640.2 MiB |
| RWKV-4-World-0.4B.f16.bin | 932.7 MiB |
| RWKV-4-World-1.5B.q4_0.bin | 1.2 GiB |
| RWKV-4-World-1.5B.q4_1.bin | 1.3 GiB |
| RWKV-4-World-1.5B.q5_0.bin | 1.4 GiB |
| RWKV-4-World-1.5B.q5_1.bin | 1.5 GiB |
| RWKV-4-World-1.5B.q8_0.bin | 1.9 GiB |
| RWKV-4-World-1.5B.f16.bin | 3.0 GiB |

Notes:

  • rwkv.cpp [0df970a] was used for conversion and quantization: the models were first converted to f16 GGML files, then quantized.
  • KoboldCpp [bc841ec] was used to test the models.

The original models can be found here, and the original model card can be found below.


RWKV-4 World

Model Description

RWKV-4 trained on 100+ world languages (70% English, 15% multilang, 15% code).

World = Some_Pile + Some_RedPajama + Some_OSCAR + All_Wikipedia + All_ChatGPT_Data_I_can_find

XXXtuned = finetune of World on MC4, OSCAR, wiki, etc.

How to use:

The differences between World & Raven:

  • set pipeline = PIPELINE(model, "rwkv_vocab_v20230424") instead of 20B_tokenizer.json (EXACTLY AS WRITTEN HERE. "rwkv_vocab_v20230424" is included in rwkv 0.7.4+)
  • use Question/Answer or User/AI or Human/Bot for chat. DO NOT USE Bob/Alice or Q/A.

For the 0.1/0.4/1.5B models, use fp32 for the first layer (it will overflow in fp16 at the moment; fixable in the future), or bf16 if you have a 30xx/40xx GPU. Example strategy: cuda fp32 *1 -> cuda fp16 (see the sketch below).
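
For reference, a minimal Python loading sketch for the original (non-GGML) .pth checkpoints with the `rwkv` pip package; the checkpoint path is a placeholder:

```python
# Minimal loading sketch, assuming the `rwkv` pip package (0.7.4+) and a
# local .pth checkpoint; the path below is a placeholder.
import os
os.environ["RWKV_JIT_ON"] = "1"  # optional; must be set before importing rwkv.model

from rwkv.model import RWKV
from rwkv.utils import PIPELINE

model = RWKV(
    model="/path/to/RWKV-4-World-1.5B",    # placeholder path (omit the .pth extension)
    strategy="cuda fp32 *1 -> cuda fp16",  # fp32 first layer, fp16 for the rest
)
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")  # NOT 20B_tokenizer.json
```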

NOTE: the new greedy tokenizer (https://github.com/BlinkDL/ChatRWKV/blob/main/tokenizer/rwkv_tokenizer.py) tokenizes '\n\n' as one single token instead of ['\n','\n'].
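
A quick way to confirm this behaviour, assuming the `pipeline` object from the sketch above:

```python
# Sanity check: '\n\n' should encode to a single token with the World vocab.
ids = pipeline.encode("\n\n")
print(len(ids))                    # expected: 1
print(repr(pipeline.decode(ids)))  # expected: '\n\n'
```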

QA prompt (replace any \n\n inside xxx with \n):

```
Question: xxx

Answer:
```

and

```
Instruction: xxx

Input: xxx

Response:
```

A good chat prompt (replace any \n\n inside xxx with \n):

```
User: hi

Assistant: Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.

User: xxx

Assistant:
```
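
A hedged generation sketch using the chat prompt above, assuming the `model` and `pipeline` objects from the earlier sketch; the user question is illustrative:

```python
# Generation sketch; the sampling values are reasonable defaults, not the
# repository's own settings.
from rwkv.utils import PIPELINE_ARGS

prompt = (
    "User: hi\n\n"
    "Assistant: Hi. I am your assistant and I will provide expert full "
    "response in full details. Please feel free to ask any question and "
    "I will always answer it.\n\n"
    "User: What is RWKV?\n\n"  # illustrative stand-in for the 'xxx' above
    "Assistant:"
)

args = PIPELINE_ARGS(temperature=1.0, top_p=0.7)
print(pipeline.generate(prompt, token_count=200, args=args))
```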