Unique Info?

Opened by liquidsnakeblue

Why don't you put the memory requirements in the model card? You have several different BPWs, so you should list the expected setup for each.

The main reason is that I haven't measured the VRAM requirements specifically, and I don't have automated scripts like TheBloke does for every part of the quant/upload process :) These models are interim solutions until TheBloke starts generating them.

Roughly:

  • 2.4bpw gets you under 24 GB, so you can load it on a single 3090/4090.
  • 3.0, 4.0, and 4.65bpw require 2x 3090/4090s to run, with varying context lengths.
  • 6.0bpw should be nearly indistinguishable from fp16 in terms of perplexity at least, but needs > 48 GB VRAM to run, so 3x 3090s/4090s. (A rough back-of-the-envelope estimate of these numbers is sketched below.)
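
If you want to sanity-check or adjust those numbers yourself, here's a minimal back-of-the-envelope sketch. It assumes a ~70B-parameter model and a flat ~3 GB of overhead for the KV cache, activations, and CUDA context; both figures are rough guesses of mine and not something stated in this thread.

```python
import math

def estimate_vram_gb(n_params_billion: float, bpw: float, overhead_gb: float = 3.0) -> float:
    """Weights take roughly n_params * bpw / 8 bytes; add a rough overhead for cache/activations."""
    weight_bytes = n_params_billion * 1e9 * bpw / 8
    return weight_bytes / 1024**3 + overhead_gb

if __name__ == "__main__":
    # Assumed 70B parameters; each 3090/4090 has 24 GB of VRAM.
    for bpw in (2.4, 3.0, 4.0, 4.65, 6.0):
        total = estimate_vram_gb(70, bpw)
        cards = math.ceil(total / 24)
        print(f"{bpw:>4} bpw ~ {total:5.1f} GB -> {cards}x 24 GB GPU(s)")
```

The overhead term is the big unknown: longer contexts push the KV cache well past a few GB, which is why the 2-GPU quants above come with "varying context lengths."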

Thank you, this is helpful. And thanks for making these.
