Running on HF Inference Endpoint.

#2 opened by dev12br

I managed to run the original model on an Inference Endpoint, but it uses a lot of VRAM, ending up requiring a very expensive instance, so I was trying to run the quantized version instead, with no luck. Do you know how I could do that?

Sorry, I don't have a clue. I've never used HF's cloud compute and do everything on my own hardware. Not sure what quant formats they can handle.

They don't have support for exl2 out of the box, apparently. From what I understand, I'm going to have to create my own handler.
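For anyone landing here later: a custom handler on Inference Endpoints is a `handler.py` at the root of the model repo exposing an `EndpointHandler` class. Below is a minimal sketch of what that could look like for an exl2 quant, assuming the exllamav2 library's loading API; the sampler values and the response shape are illustrative, not something I've verified on an actual endpoint:

```python
# handler.py -- minimal sketch of a custom Inference Endpoints handler
# that loads an exl2 quant with exllamav2. Untested on a real endpoint.
from typing import Any, Dict

from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Cache,
    ExLlamaV2Config,
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` is the checkout of the model repo containing the exl2 weights
        config = ExLlamaV2Config()
        config.model_dir = path
        config.prepare()

        self.model = ExLlamaV2(config)
        # Lazy cache + autosplit spreads layers across the available GPUs
        self.cache = ExLlamaV2Cache(self.model, lazy=True)
        self.model.load_autosplit(self.cache)

        self.tokenizer = ExLlamaV2Tokenizer(config)
        self.generator = ExLlamaV2BaseGenerator(
            self.model, self.cache, self.tokenizer
        )

        # Illustrative sampling defaults -- tune for your model
        self.settings = ExLlamaV2Sampler.Settings()
        self.settings.temperature = 0.7
        self.settings.top_p = 0.9

    def __call__(self, data: Dict[str, Any]) -> Dict[str, str]:
        # Inference Endpoints passes the parsed request body as `data`
        prompt = data["inputs"]
        max_new_tokens = data.get("parameters", {}).get("max_new_tokens", 256)

        output = self.generator.generate_simple(
            prompt, self.settings, max_new_tokens
        )
        return {"generated_text": output}
```

You'd also need a `requirements.txt` next to `handler.py` listing `exllamav2`, and a GPU instance type, so the endpoint container can install and run it.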
