Text Generation
Transformers
PyTorch
llama
text-generation-inference
Inference Endpoints

mat1 and mat2 shapes cannot be multiplied (38x5120 and 1x2560)

#13
opened by LaferriereJC
    RuntimeError: mat1 and mat2 shapes cannot be multiplied (38x5120 and 1x2560)
    Output generated in 0.36 seconds (0.00 tokens/s, 0 tokens, context 38, seed 1942857980)



In the file config.json, change the value of "pretraining_tp" (currently on line 18) from 2 to 1. This should fix it. With "pretraining_tp": 2, the Llama forward pass in Transformers splits its linear projections into two tensor-parallel slices (which appears to be where the 2560 = 5120 / 2 dimension in the error comes from), and that code path does not play well with quantized weights.
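If you'd rather not edit config.json by hand, a minimal sketch of the same fix is to override the setting at load time. The repo id below is a placeholder for whichever checkpoint this discussion belongs to:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "your-org/your-llama-checkpoint"  # placeholder, substitute the actual repo or local path

# Load the config and force pretraining_tp to 1 so the Llama forward pass
# does not try to split its projection weights into tensor-parallel slices.
config = AutoConfig.from_pretrained(model_id)
config.pretraining_tp = 1

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```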
