SmolLM Performance

#6
by lingzhiai - opened

I’ve been working with SmolLM recently, and the performance has been far below expectations—it's practically unusable. Here are a few examples to illustrate the issues:

image.png

image.png

Could it be that I'm loading the model incorrectly, or is this a known issue with SmolLM? Any advice on what might be going wrong would be greatly appreciated.

Hugging Face TB Research org
edited Aug 18

Hi, we just updated the Instruct Models and the outputs should be better. You can also try the larger 360M model for better performance in these demos:
https://huggingface.co/spaces/HuggingFaceTB/instant-smollm
https://huggingface.co/spaces/HuggingFaceTB/SmolLM-360M-Instruct-WebGPU

Thanks for the update! Could you please share what changes were made that led to the performance improvement? Was the model retrained with the original data, or were there other adjustments? Any details you can provide would be greatly appreciated. Thanks again for your help!

Hugging Face TB Research org
edited Aug 22

We changed the SFT mix (see changelog):

  • it seems that using WebInstruct data for SFT sometimes confused the models, since it contained advanced science content beyond the model's capacity (hence why the models sometimes bring up math equations that are out of topic), so we switched to Magpie dataset
  • with Magpie the model would answer knwoledge prompts but still failed at answering greetings and "who are you" questions so we built this dataset of 2k simple everyday conversations to fix this behavior https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k

Thanks for your quick reply!

loubnabnl changed discussion status to closed

Sign up or log in to comment