open-llm-leaderboard/open_llm_leaderboard · Model deleted from Pending

Jul 18

Hi,

I have a new kind of model that's quite large, called dnhkng/Large.

As it's beyond the 100B parameter limit for BFloat16, I uploaded a bitsandbytes 4-bit version (dnhkng/Large-bnb-4bit) for testing on the Leaderboard. In my personal tests, this model does very well, and fits in 50GB on an H100.

However, I see it was just deleted from the Pending list! Is there a reason for this? Should I just resubmit?

The model was generated with a new technique, and I think it should be tested, even if it's handicapped to 4-bit. The BFloat16 version performs better, but I understand if you don't want to test it, as it needs 3x H100s for inference.

But the 4-bit model runs on the standard transformers code and fits in 50GB of VRAM, so it should be fine for Leaderboard submission.
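
As a rough sanity check on the figures above, weight memory is roughly parameters times bits per parameter, divided by 8. The sketch below assumes a round 100B parameters (the actual parameter count is not stated in this thread) and 80 GB per H100, and it ignores activations, KV cache, and quantization overhead, so real usage will be somewhat higher.

```python
import math

# Assumption for illustration only: the thread does not give the real count.
ASSUMED_PARAMS = 100e9
# Assumption: an H100 with 80 GB of memory.
GPU_MEMORY_GB = 80.0


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def gpus_needed(memory_gb: float, gpu_memory_gb: float = GPU_MEMORY_GB) -> int:
    """Smallest number of GPUs whose combined memory holds the weights."""
    return math.ceil(memory_gb / gpu_memory_gb)


four_bit_gb = weight_memory_gb(ASSUMED_PARAMS, 4)
bf16_gb = weight_memory_gb(ASSUMED_PARAMS, 16)

print(f"4-bit: {four_bit_gb:.0f} GB on {gpus_needed(four_bit_gb)} GPU(s)")
print(f"BF16:  {bf16_gb:.0f} GB on {gpus_needed(bf16_gb)} GPU(s)")
```

Under these assumptions the 4-bit weights fit on a single card while the BFloat16 weights need three, which is consistent with the numbers quoted in this post.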

I would also be happy to sponsor the full run using the BFloat16 model.

dnhkng

Jul 18

OK, I have resubmitted dnhkng/Large-bnb-4bit

If there are any issues with this model, please let me know so I can correct them for a proper submission.

dnhkng

Jul 19

And it's gone, again!

Could someone at least explain what the issue is?

dnhkng

Jul 19

@clefourrier Is there a problem with the number of parameters? I would have thought if it fits on an 80GB card, the absolute number of params is not relevant.

alozowski

Open LLM Leaderboard org Jul 19

Hi @dnhkng ,

Could you please provide the request file? That way, I'll be able to investigate the state of your model submission.

clefourrier

Open LLM Leaderboard org Jul 19

Hi @dnhkng ,

Please start by reading our FAQ (in our documentation, linked both at the top and in the submit tab).
Notably,

- when you report a problem with a model, we need you to point to the request file so we can investigate
- you should avoid re-submitting models when they don't seem to work, as it's adding useless strain on our system (when evals go through, your model will be evaluated two times, which is a waste of compute).
- please be patient - you opened this issue less than 24h ago and already sent 4 messages in it + tagged a maintainer. We are looking at all issues daily but are not necessarily in the same time zone as you are.

dnhkng

Jul 19

edited Jul 19

Sorry for the spam! I was making updated posts to track what was happening as I tried things, but I went overboard 😅.
I didn't mean to be annoying; you are doing the community a great service, thanks for the effort!

The submission request is here:
https://huggingface.co/datasets/open-llm-leaderboard/requests/resolve/main/dnhkng/Large-bnb-4bit_eval_request_False_4bit_Original.json
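
For anyone reporting a similar problem, the request-file URL itself carries the details maintainers ask for. The sketch below splits such a URL into its parts; the helper name `parse_request_file` is hypothetical, and the field names (`private`, `precision`, `weight_type`) are an assumption inferred from the file names in this thread, not a documented schema.

```python
from urllib.parse import urlparse


def parse_request_file(url: str) -> dict:
    """Split a leaderboard request-file URL into its apparent components.

    Field names are assumptions inferred from file names such as
    '<org>/<model>_eval_request_False_4bit_Original.json'.
    """
    path = urlparse(url).path
    # Keep everything after the branch name, e.g. 'dnhkng/Large-bnb-4bit_eval_...'.
    relative = path.split("/main/", 1)[1]
    if relative.endswith(".json"):
        relative = relative[: -len(".json")]
    model_id, _, flags = relative.partition("_eval_request_")
    private, precision, weight_type = flags.split("_")
    return {
        "model_id": model_id,
        "private": private,
        "precision": precision,
        "weight_type": weight_type,
    }


url = (
    "https://huggingface.co/datasets/open-llm-leaderboard/requests/"
    "resolve/main/dnhkng/Large-bnb-4bit_eval_request_False_4bit_Original.json"
)
info = parse_request_file(url)
print(info)
```

This only reads the file name; the submission status itself lives inside the JSON file and is not inferred here.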

UPDATE:
I've renamed the model just now, so the new technique for creating it is in the name:
dnhkng/RYS-Huge-bnb-4bit

Once the results are here, I will include them in the paper 😃

dnhkng

Jul 22

Update: The model failed to run. When someone has time, please let me know if it's something I can fix!

https://huggingface.co/datasets/open-llm-leaderboard/requests/resolve/main/dnhkng/Large-bnb-4bit_eval_request_False_4bit_Original.json

alozowski

Open LLM Leaderboard org Jul 22

Hi @dnhkng ,

Thanks for providing the requests file! We are investigating the error and I'll relaunch your model as soon as possible

alozowski

Open LLM Leaderboard org Jul 22

It appeared to be an issue on our side, but everything is fixed and I've relaunched your model – hope it will be fine :)

I'm closing this discussion. Please ping me here if you encounter any other problems with this exact model, or start a new discussion.

alozowski changed discussion status to closed Jul 22

dnhkng

Jul 23

@alozowski Hi again!

The job failed :(
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/dnhkng/Large-bnb-4bit_eval_request_False_4bit_Original.json

alozowski changed discussion status to open Jul 23

alozowski

Open LLM Leaderboard org Jul 23

Hi @dnhkng ,

According to the log, the issue is on our side – let me investigate it and I'll relaunch the model as soon as I can

clefourrier

Open LLM Leaderboard org Jul 29

edited Jul 29

Hi @dnhkng ,
I tried to re-run your model but it's no longer available.
It looks like you renamed it to dnhkng/RYS-Huge-bnb-4bit, so I relaunched that one. Next time, please leave your models public and don't rename them while we're investigating evaluations.

dnhkng

Jul 30

I'm not having much luck. My other model, RYS-XLarge, just failed too:

https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/dnhkng/RYS-XLarge_eval_request_False_bfloat16_Original.json

clefourrier

Open LLM Leaderboard org Jul 31

Hi! It was a network error; I've set it back to pending.

dnhkng

Jul 31

BTW, dnhkng/RYS-Huge-bnb-4bit also still shows up as failed.

https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/dnhkng/RYS-Huge-bnb-4bit_eval_request_False_4bit_Original.json

clefourrier

Open LLM Leaderboard org Jul 31

Hi!
Yes, I had relaunched it manually.

dnhkng

Aug 6

I have deleted my model dnhkng/RYS-Huge-bnb-4bit (I made some errors when creating it, which is why it scored badly), but it's still on the leaderboard. Could it be removed, please?

I'm not sure how to do it myself.

clefourrier

Open LLM Leaderboard org Aug 6

Hi!
You're not supposed to do it yourself, you're supposed to ask here ^^ - is the above request file the correct one?

dnhkng

Aug 6

Yes, dnhkng/RYS-Huge-bnb-4bit_eval_request_False_4bit_Original.json is the one that should be deleted :)

I see my new model worked pretty well. I selected it based on a new dataset I made of under 100 samples. I heard you on the Latent Spaces podcast so I thought you might be interested in new datasets.

clefourrier

Open LLM Leaderboard org Aug 6

Thanks, I've changed its status to DELETED; it should be removed from display at the next leaderboard restart.
Closing the issue, but cc @alozowski : we should add the info on how to delete a model to the FAQ.

Thanks for having listened to the Latent Spaces podcast! I'm indeed working on new evaluation datasets atm, you can send me an email at [email protected]

clefourrier changed discussion status to closed Aug 6