Llama 3.1 models

#164
by joaquinito2073 - opened

Unfortunately, they are gated and I cannot access them (last I tried, they required a Facebook account). If somebody makes an accessible copy, I am, of course, game.

@nicoboss surely you have a Facebook account and are willing to clone/back them up? :)

@Guilherme34 was already granted access and is currently downloading them all into an LXC container running on the same machine as your LXC container. He will start with 8B, then 70B, and finally 405B. As soon as the first models are downloaded, I will mount the folder containing them read-only into your LXC container.

Last time I checked, you don't actually need a Facebook account; you "just" have to fill out a form with your personal info. It's even integrated into HF.

@mradermacher The 8B models and the 70B base model are already downloaded and mounted to /Guilherme34/root/.cache/huggingface/hub inside your LXC container. For models as important as these, please ignore any daytime restrictions and quantize them as soon as possible.

The following models are already downloaded:

  • Meta-Llama-3.1-8B
  • Meta-Llama-3-8B-Instruct
  • Meta-Llama-3.1-70B

Others are still in progress.

The download of Meta-Llama-3.1-70B-Instruct is now completed as well.
@mradermacher Sorry, we initially forgot to download the tokenizer, but we have now added it to all the 8B and 70B models.

In case the Hugging Face cache structure is confusing, the normal safetensors models are located here:

  • Meta-Llama-3.1-8B: /Guilherme34/root/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3.1-8B/snapshots/13f04ed6f85ef2aa2fd11b960a275c3e31a8069e
  • Meta-Llama-3-8B-Instruct: /Guilherme34/root/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3.1-405B-Instruct/snapshots/c7c9648767719216fd9a80097da3a57b72748028
  • Meta-Llama-3.1-70B: /Guilherme34/root/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3.1-70B/snapshots/6113f060e6c497c714cc6463c8bcbd78aefac089
  • Meta-Llama-3.1-70B-Instruct: /Guilherme34/root/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3.1-70B-Instruct/snapshots/25acb1b514688b222a02a89c6976a8d7ad0e017
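
The paths above follow the Hugging Face hub cache layout: a repo "org/name" is stored under a folder named models--org--name, and each downloaded revision gets its own snapshots/<commit> subfolder. A minimal Python sketch that rebuilds such a path, assuming the cache root used in this thread; the helper name snapshot_path is hypothetical:

```python
from pathlib import PurePosixPath

# Cache root as mounted in this thread (an assumption for illustration).
CACHE_ROOT = PurePosixPath("/Guilherme34/root/.cache/huggingface/hub")


def snapshot_path(repo_id: str, commit: str, root: PurePosixPath = CACHE_ROOT) -> PurePosixPath:
    """Return <root>/models--<org>--<name>/snapshots/<commit> for a model repo."""
    org, name = repo_id.split("/", 1)
    # The hub cache replaces the "/" in a repo id with "--" and prefixes the repo type.
    folder = f"models--{org}--{name}"
    return root / folder / "snapshots" / commit


print(snapshot_path("meta-llama/Meta-Llama-3.1-8B", "13f04ed6f85ef2aa2fd11b960a275c3e31a8069e"))
```

Rather than hard-coding commits, huggingface_hub.scan_cache_dir() can list the cached repos and their revisions.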

Ah... a local download is not actually that helpful (I can't get it out at reasonable speeds); I'd need a repo clone. Maybe it's possible to check in only the LFS files (though if it is, that sounds like a security bug in Hugging Face).

@Green-Sky last time I filled out the form, I then needed a Facebook account, i.e. they lied to me. My trust is eroded.

@Guilherme34 will give you a token to access them; I will email it to you shortly. In the meantime, copying the 8B models over a 100 Mbit/s link should not take that long.
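
A back-of-the-envelope estimate of that copy time, as a Python sketch. It assumes the 8B weights are stored in bf16 (2 bytes per parameter, roughly 8.0e9 parameters) and that the 100 Mbit/s link is fully saturated with no protocol overhead; both are assumptions, so treat the result as a lower bound:

```python
def transfer_seconds(num_params: float, bytes_per_param: float, link_mbit_per_s: float) -> float:
    """Estimate the time needed to copy model weights over a network link."""
    size_bytes = num_params * bytes_per_param
    # 1 Mbit = 10^6 bits, and 8 bits make a byte.
    bytes_per_second = link_mbit_per_s * 1e6 / 8
    return size_bytes / bytes_per_second


# Assumed: ~8.0e9 parameters in bf16 (2 bytes each) over 100 Mbit/s.
seconds = transfer_seconds(8.0e9, 2, 100)
print(f"{seconds / 60:.0f} minutes per 8B model")
```

For both 8B repos that roughly doubles to about 43 minutes, whereas downloading directly from Hugging Face with the token avoids the copy entirely.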

The 8B will not take very long, but since it is completely unnecessary, wouldn't the time be better invested in getting a clone of all repos, which is pretty much instant? Just asking, no criticism :)

@mradermacher I sent you a mail with the Llama 3.1 access token and a code example showing how to use the access token to download them.

Anyway, thanks to @Guilherme34, I can download the models manually then.

I fully agree. Just use the access token I sent you to download all of them; it should be super fast. Also, sorry: in model = AutoModelForCausalLM.from_pretrained(base_model_id, token=access_token) you obviously need to insert the access token as well. And sorry that my email formatting got slightly messed up again.
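
A minimal sketch of downloading whole repos with the access token, assuming the huggingface_hub package is installed and the token is exported in an environment variable (HF_TOKEN here; the variable name and both helper functions are illustrative choices, not taken from the email). snapshot_download fetches every file in a repo, including the tokenizer files that were initially missed; the repo ids are the ones named in this thread:

```python
import os

# Repo ids as named in this thread; adjust to the repos your token can access.
REPO_IDS = [
    "meta-llama/Meta-Llama-3.1-8B",
    "meta-llama/Meta-Llama-3.1-70B",
    "meta-llama/Meta-Llama-3.1-70B-Instruct",
]


def read_token(env_var: str = "HF_TOKEN") -> str:
    """Read the access token from the environment instead of hard-coding it."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"set {env_var} to your Hugging Face access token")
    return token


def download_all(repo_ids, token: str) -> dict:
    """Download a full snapshot (weights, config, tokenizer) of each gated repo."""
    # Imported here so read_token() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return {repo_id: snapshot_download(repo_id=repo_id, token=token) for repo_id in repo_ids}
```

Calling download_all(REPO_IDS, read_token()) returns a mapping from repo id to local snapshot folder. When loading through transformers instead, the same token goes into AutoModelForCausalLM.from_pretrained(repo_id, token=token).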

I tried to figure out git access to clone the repos but failed, so there will be no public clones; the models should slowly be coming in now, though. Thanks again to everybody involved :)

mradermacher changed discussion status to closed

@Guilherme34 would it be possible to get access to https://huggingface.co/meta-llama/Llama-Guard-3-8B and https://huggingface.co/meta-llama/Prompt-Guard-86M too, just for completeness, assuming they are under the same conditions?

@mradermacher Your access token should now also be able to download the Llama-Guard-3-8B and Prompt-Guard-86M models, since @Guilherme34 requested and was granted access to them.

Won-der-ful! I've been waiting for the guard models for quite a while :)

Thanks, Meta, for checking in two pickle versions of your 405B models, too (for a teensy-tiny 5 TB download, ugh).
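
If only the safetensors weights are needed, snapshot_download accepts ignore_patterns (shell-style globs) to skip files. The Python sketch below mimics that filtering with fnmatch on a hypothetical file listing; it assumes the pickle checkpoints sit under an original/ subfolder, as in the smaller Llama 3.1 repos, so check the actual repo tree before relying on the pattern:

```python
from fnmatch import fnmatch


def kept_files(files, ignore_patterns):
    """Return the files that would survive the given shell-style ignore patterns."""
    return [f for f in files if not any(fnmatch(f, p) for p in ignore_patterns)]


# Hypothetical listing of a large Llama 3.1 repo (file names are illustrative).
files = [
    "config.json",
    "model-00001-of-00002.safetensors",
    "tokenizer.json",
    "original/mp16/consolidated.00.pth",
    "original/mp8/consolidated.00.pth",
]
print(kept_files(files, ["original/*"]))
```

With the real API, the corresponding call would be snapshot_download(repo_id, token=token, ignore_patterns=["original/*"]).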

Anyways, thanks again to everybody here - it was a pleasure to see a bunch of people work together this quickly :) I'm so eager to find out whether llama.cpp can handle the 405B model or not.

@mradermacher the models need to be remade after recent code changes in the transformers library: https://github.com/ggerganov/llama.cpp/issues/8650#issuecomment-2247595544

From what people are saying, the RoPE scaling (beyond 8k tokens) still seems to be broken, too.

@Green-Sky thanks - you don't happen to know which transformers release (if any) fixes this (the pretokenizer)? As for rope scaling, it's probably prudent to wait for problems to be sorted out. No need to be the first :)

mradermacher changed discussion status to open

https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/commit/339ce92d052f002cdbac4a4bd551d1c61dd8345e - was this a change to the llama-3.1 model repos?

You can use the access token from yesterday to git clone the repository and see the change locally. Run either git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct or GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct (which skips downloading the large LFS files), and enter Guilherme34 as the username and the token as the password.
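
The two clone variants can be scripted; this minimal Python sketch only wraps the git commands quoted in the post (the helper names are hypothetical). With GIT_LFS_SKIP_SMUDGE=1, git-lfs leaves small pointer files in place of the weights, which is enough to inspect a commit such as the one linked earlier. git prompts for the username and the token interactively, so no credentials are stored in the script:

```python
import os
import subprocess

HF_BASE_URL = "https://huggingface.co"


def clone_command(repo_id: str, skip_lfs: bool = True):
    """Build the git command and environment used to clone a Hugging Face repo."""
    cmd = ["git", "clone", f"{HF_BASE_URL}/{repo_id}"]
    env = dict(os.environ)
    if skip_lfs:
        # git-lfs then checks out small pointer files instead of the weights.
        env["GIT_LFS_SKIP_SMUDGE"] = "1"
    return cmd, env


def clone(repo_id: str, skip_lfs: bool = True) -> None:
    """Run the clone; git prompts for the username and the token (as password)."""
    cmd, env = clone_command(repo_id, skip_lfs)
    subprocess.run(cmd, env=env, check=True)
```

For example, clone("meta-llama/Meta-Llama-3-8B-Instruct") followed by git show 339ce92d052f002cdbac4a4bd551d1c61dd8345e inside the new folder displays that commit's diff.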

@nicoboss can you answer my message on Discord?

I did try this originally when cloning the repos and got "permission denied", but maybe I mistyped something.

Wait, that commit was for Llama 3, not Llama 3.1. Is the same fix needed for Llama 3, too?

The new model has a bunch of issues, so I'll wait for the next iteration and hold off on taking action for 15 days. :~~~

The fix "llama : add support for llama 3.1 rope scaling factors (#8676)" was released in b3472 three minutes ago.
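
The Llama 3.1 RoPE scaling rescales each rotary frequency according to its wavelength: high-frequency components (wavelength below original_max_position_embeddings / high_freq_factor) are kept, low-frequency components (wavelength above original_max_position_embeddings / low_freq_factor) are divided by factor, and the band in between is interpolated smoothly. A minimal Python sketch, assuming the rope_scaling values from the Llama 3.1 config.json (factor 8, low_freq_factor 1, high_freq_factor 4, original_max_position_embeddings 8192), rope_theta 500000, and a head dimension of 128; it follows the rule in Meta's reference code, not llama.cpp's C++ implementation:

```python
import math


def base_frequencies(head_dim: int = 128, theta: float = 500000.0) -> list:
    """Standard RoPE inverse frequencies theta^(-2i/d) for i = 0 .. d/2 - 1."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def llama3_scale(freqs, factor=8.0, low_freq_factor=1.0, high_freq_factor=4.0,
                 original_context=8192):
    """Apply Llama 3.1 style frequency scaling to a list of RoPE frequencies."""
    low_freq_wavelen = original_context / low_freq_factor    # 8192 with the defaults
    high_freq_wavelen = original_context / high_freq_factor  # 2048 with the defaults
    scaled = []
    for freq in freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # Short wavelengths (fine positional detail) are left untouched.
            scaled.append(freq)
        elif wavelen > low_freq_wavelen:
            # Long wavelengths are stretched by the full factor.
            scaled.append(freq / factor)
        else:
            # In between, blend linearly from freq / factor up to freq.
            smooth = (original_context / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor)
            scaled.append((1 - smooth) * freq / factor + smooth * freq)
    return scaled


base = base_frequencies()
scaled = llama3_scale(base)
```

The ratio base[i] / scaled[i] gives a per-dimension divisor between 1 and 8, which is presumably what the "rope scaling factors" in the fix title refer to.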

Finally (re-)queued everything. I converted a few other models first to see how they turned out, and it seems to work. Expect a busy night. Everything quantized in the last 10 hours or so should have the fixes.

mradermacher changed discussion status to closed