Hui Sun

CocoSun · coco2sun

AI & ML interests

None yet

CocoSun's activity

upvoted an article 20 days ago

OCR Processing and Text in Image Analysis with Florence-2-base and Qwen2-VL-2B

Article • 22 days ago • 12

upvoted a paper about 1 month ago

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published Sep 27 • 89

upvoted a collection about 1 month ago

Molmo

Artifacts for open multimodal language models. • 5 items • Updated Sep 26 • 269

upvoted a collection about 2 months ago

Llama 3.2

This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated 17 days ago • 453

upvoted an article about 2 months ago

Document Similarity Search with ColPali

Article • Sep 21 • 46

upvoted a collection 2 months ago

AI Paper of the Day

A collection of papers that I think are interesting, one added each day • 213 items • Updated about 9 hours ago • 27

upvoted a paper 2 months ago

Foundation Models for Music: A Survey

Paper • 2408.14340 • Published Aug 26 • 39

upvoted 2 papers 3 months ago

Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning

Paper • 2408.07931 • Published Aug 15 • 18

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16 • 97

upvoted a collection 3 months ago

💻 Local SmolLMs

SmolLM models in MLC, ONNX and GGUF format for local applications + in-browser demos • 14 items • Updated Aug 20 • 44

upvoted 3 papers 3 months ago

Imagen 3

Paper • 2408.07009 • Published Aug 13 • 61

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

Paper • 2408.03361 • Published Aug 6 • 85

MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

Paper • 2408.02900 • Published Aug 6 • 25

upvoted a collection 3 months ago

Biomedical Vision-Language Models (VLMs)

Some of my favorite biomedical vision-language models • 15 items • Updated May 7 • 8

upvoted a paper 4 months ago

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Paper • 2407.08083 • Published Jul 10 • 27

upvoted a collection 4 months ago

Chronos Models & Datasets

Chronos: Pretrained (language) models for time series forecasting based on the T5 architecture. • 8 items • Updated Jun 27 • 29

upvoted 2 articles 4 months ago

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

Article • Jul 23 • 213

PaliGemma – Google's Cutting-Edge Open Vision Language Model

Article • May 14 • 206

upvoted 2 papers 4 months ago

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10 • 66

Vision language models are blind

Paper • 2407.06581 • Published Jul 9 • 82