LLaVA-Video Collection Models focus on video understanding (previously known as LLaVA-NeXT-Video). • 6 items • Updated Oct 5 • 53
Robust Speech Recognition via Large-Scale Weak Supervision Paper • 2212.04356 • Published Dec 6, 2022 • 23
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated 17 days ago • 453
Sapiens Collection Foundation models for human tasks. Code: https://github.com/facebookresearch/sapiens • 72 items • Updated Sep 18 • 44
LLaVa-Interleave Collection LLaVa models that extends the model capabilities to Multi-image, Multi-frame (videos), Multi-patch (single-image) scenarios. • 3 items • Updated Jul 10 • 14