Running it on an Apple MBP M3 - non-quantized
#14 opened by christianweyer
We are really loving the results we get with the online demo (https://llava.hliu.cc). Kudos!
However, when we try to run it, e.g. as an fp16 conversion, on our M3 Max with llama.cpp or Ollama, we get very poor results (apparently due to some still-pending PRs in llama.cpp).
How are people running it on a Mac without quantization to get the same results as the original demo?
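For context, this is the direction we would expect to work: a minimal sketch using the transformers MPS backend with unquantized fp16 weights. Note that the model id (`llava-hf/llava-1.5-7b-hf`), the prompt template, and the image path are our assumptions, not something confirmed to match the demo's setup:

```python
# Hypothetical sketch (our assumption, not a confirmed recipe): running
# unquantized LLaVA 1.5 weights on Apple Silicon via the transformers MPS backend.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed HF-converted checkpoint

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 weights, no quantization
).to("mps")                     # Apple Silicon GPU

image = Image.open("example.jpg")  # placeholder image path
# Prompt template used by the llava-hf conversions
prompt = "USER: <image>\nWhat is shown in this picture? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt")
inputs = {k: v.to("mps") for k, v in inputs.items()}
inputs["pixel_values"] = inputs["pixel_values"].to(torch.float16)  # match model dtype

output = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output[0], skip_special_tokens=True))
```

Is something along these lines what people are doing, or is there a better path?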
Thanks!