Arthur Zucker

ArthurZ

Articles

Fixing Gradient Accumulation

Improving Hugging Face Training Efficiency Through Packing with Flash Attention

Fine-Tuning Gemma Models in Hugging Face

Code Llama: Llama 2 learns to code

ArthurZ's activity

New activity in mistral-community/pixtral-12b 19 days ago

Update model weight

#13 opened 23 days ago by

New activity in mistral-community/pixtral-12b 22 days ago

Update hidden_act to silu

#14 opened 22 days ago by

New activity in rhymes-ai/Aria about 1 month ago

llama.cpp support

#1 opened about 1 month ago by

New activity in google/gemma-2-2b-jpn-it about 1 month ago

tokenizer_config.json is different from gemma-2-2b-it

#8 opened about 1 month ago by

New activity in mistral-community/pixtral-12b about 1 month ago

How can i use the full 24GB model instead of this separated safetensors files?

#8 opened about 2 months ago by

New activity in meta-llama/Llama-3.2-11B-Vision-Instruct about 2 months ago

hidden_activation vs hidden_act in config.json

#10 opened about 2 months ago by

New activity in mistral-community/pixtral-12b-240910 about 2 months ago

How to use safetensors?

#13 opened about 2 months ago by

New activity in mistral-community/pixtral-12b about 2 months ago

lamma cpp ht to gguf not working

#2 opened about 2 months ago by

New activity in meta-llama/Llama-3.1-405B-Instruct-FP8 3 months ago

8-kv-heads

#14 opened 3 months ago by

New activity in meta-llama/Llama-3.1-405B-FP8 3 months ago

Update config.json

#17 opened 3 months ago by

Config KV Heads should be 8 now?

#16 opened 3 months ago by

New activity in meta-llama/Llama-3.1-405B-Instruct-FP8 3 months ago

8 kv heads

#13 opened 3 months ago by

New activity in meta-llama/Llama-3.1-405B-FP8 3 months ago

8-kv-heads

#15 opened 3 months ago by

New activity in meta-llama/Llama-3.1-405B 3 months ago

8-kv-heads

#21 opened 3 months ago by

New activity in meta-llama/Llama-3.1-405B-Instruct 3 months ago

8-kv-heads

#17 opened 3 months ago by

New activity in meta-llama/Llama-3.1-405B-FP8 3 months ago

Updated eos_token to include multiple IDs

#14 opened 3 months ago by

New activity in meta-llama/Llama-3.1-405B-FP8 4 months ago

Update tokenizer to prepend special token

#12 opened 4 months ago by

New activity in meta-llama/Llama-3.1-70B 4 months ago

Update tokenizer to prepend special token

#11 opened 4 months ago by

New activity in meta-llama/Llama-3.1-8B-Instruct 4 months ago

Upload tokenizer

#29 opened 4 months ago by

Upload tokenizer

#28 opened 4 months ago by