
Maxi PRO

maxiw

maxi-w

AI & ML interests

GUI Agents | VLMs

Organizations

Posts 2

Post


The new Qwen2-VL models seem to perform quite well at object detection. You can prompt them to respond with bounding boxes in a reference frame of 1000 x 1000 pixels and then scale those boxes to the original image size.

You can try it out with my Space: maxiw/Qwen2-VL-Detection
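The scaling step mentioned above can be sketched roughly as follows. This is only an illustration, not the code used in the Space: the function name `scale_box`, the `(x1, y1, x2, y2)` corner ordering, and the clamping to the image bounds are all assumptions made for the example.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def scale_box(
    box: Box,
    image_width: int,
    image_height: int,
    frame_size: int = 1000,
) -> Tuple[int, int, int, int]:
    """Map a box from a frame_size x frame_size reference frame to pixel coordinates.

    Assumes (hypothetically) that the model returns corners as (x1, y1, x2, y2).
    """
    x1, y1, x2, y2 = box

    def to_pixels(value: float, image_extent: int) -> int:
        # Rescale from the reference frame to the real image extent,
        # then clamp so slightly out-of-range predictions stay inside the image.
        pixel = round(value * image_extent / frame_size)
        return min(max(pixel, 0), image_extent)

    return (
        to_pixels(x1, image_width),
        to_pixels(y1, image_height),
        to_pixels(x2, image_width),
        to_pixels(y2, image_height),
    )


if __name__ == "__main__":
    # A box predicted in the 1000 x 1000 frame, mapped onto a 1920 x 1080 image.
    print(scale_box((100, 200, 500, 800), image_width=1920, image_height=1080))
```

Each axis is scaled independently, so the mapping also works for non-square images.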

Post


I just added the newly released xGen-MM v1.5 foundational Large Multimodal Models (LMMs), developed by Salesforce AI Research, to my xGen-MM HF Space: maxiw/XGen-MM

Collections 2

XGen MM

models 1

maxiw/Florence-2-ScreenQA-base

Image-Text-to-Text • Updated 29 days ago • 47 • 2

datasets

None public yet

Collections 2

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

CogAgent: A Visual Language Model for GUI Agents

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

shuaishuaicdp/GUI-World

agentsea/wave-ui-25k

rootsautomation/ScreenSpot

rootsautomation/RICO-ScreenAnnotation

spaces 5

HTML To Markdown

Qwen2 VL Localization

Florence 2 ScreenQA

Phi 3.5 Vision

XGen MM
