Pablo Montalvo PRO

Molbap

molbap

AI & ML interests

None yet

Articles

Introducing TextImage Augmentation for Document Images

Aug 6 • 30

Molbap's activity

upvoted 2 articles 3 months ago

Article

Introducing TextImage Augmentation for Document Images

Aug 6 • 30

Article

MobileNet Baselines

Jul 26 • 23

upvoted 2 articles 4 months ago

Article

LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning?

Jul 25 • 18

Article

Mixture of Experts Explained

Dec 11, 2023 • 183

upvoted a paper 4 months ago

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10 • 66

upvoted 2 collections 5 months ago

Searching for Better ViT Baselines

Collection

Exploring ViT hparams and model shapes for the GPU poor (between tiny and base). • 25 items • Updated Aug 21 • 12

MobileNetV4 pretrained weights

Collection

Weights for MobileNet-V4 pretrained in timm • 17 items • Updated Sep 22 • 17

upvoted 3 papers 5 months ago

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

Paper • 2406.11271 • Published Jun 17 • 18

What If We Recaption Billions of Web Images with LLaMA-3?

Paper • 2406.08478 • Published Jun 12 • 39

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

Paper • 2405.18392 • Published May 28 • 12

upvoted a paper 6 months ago

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

Paper • 2405.15738 • Published May 24 • 43

upvoted 4 articles 6 months ago

Article

AI has a problem with objectifying women

May 24 • 55

Article

MobileNet-V4 (now in timm)

Jun 17 • 39

Article

Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task

May 16 • 17

Article

License to Call: Introducing Transformers Agents 2.0

May 13 • 116

upvoted a collection 6 months ago

PaliGemma Release

Collection

Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Jul 31 • 137

upvoted 2 articles 6 months ago

Article

2024-04-22 - Hub Incident Post Mortem

May 17 • 17

Article

PaliGemma – Google's Cutting-Edge Open Vision Language Model

May 14 • 206

upvoted a paper 7 months ago

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Paper • 2404.06512 • Published Apr 9 • 29

upvoted a paper 9 months ago

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 602