Unique Info?
#1
by
liquidsnakeblue
- opened
Why don't you put the memory requirements in the model card? You offer several different BPWs; you should list the expected setup for each.
Mainly because I haven't measured the VRAM requirements specifically, and I don't have automated scripts like TheBloke does for every part of the quant/upload process :) These models are interim solutions until TheBloke starts generating them.
Roughly:
- 2.4bpw gets you under 24 GB so you can load it on a single 3090/4090.
- 3.0, 4.0, and 4.65bpw require 2x 3090/4090s to run, with varying context lengths.
- 6.0bpw should be nearly indistinguishable from fp16 in terms of perplexity at least, but needs > 48 GB VRAM to run. So 3x 3090s/4090s.
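The rough figures above follow from simple arithmetic: weight memory is roughly parameter count times bits-per-weight divided by 8, plus some overhead for the KV cache and runtime. Here is a minimal sketch of that estimate; the `estimate_vram_gb` helper, the 70B parameter count, and the 2 GB overhead figure are all illustrative assumptions, not measurements:

```python
def estimate_vram_gb(params_b: float, bpw: float, overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate (GiB) for a quantized model.

    params_b    -- model size in billions of parameters (assumed, e.g. 70 for a 70B model)
    bpw         -- bits per weight of the quant
    overhead_gb -- guessed allowance for KV cache and runtime buffers
    """
    weights_gb = params_b * 1e9 * bpw / 8 / 1024**3  # bytes -> GiB
    return weights_gb + overhead_gb

# Assuming a 70B model: 2.4bpw lands just under 24 GB,
# while 6.0bpw needs more than 48 GB, matching the list above.
print(round(estimate_vram_gb(70, 2.4), 1))
print(round(estimate_vram_gb(70, 6.0), 1))
```

In practice the usable context length shrinks as the weights eat more of the card, which is why the multi-GPU tiers above note "varying context lengths."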
Thank you, this is helpful. And thanks for making these.