Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Abstract
Using vision-language models (VLMs) in web development is a promising strategy for increasing efficiency and unblocking no-code solutions: given a screenshot or a sketch of a UI, a VLM could generate the code to reproduce it, for instance in a language like HTML. Despite the advancements of VLMs on various tasks, the specific challenge of converting a screenshot into the corresponding HTML code has been minimally explored. We posit that this is mainly due to the absence of a suitable, high-quality dataset. This work introduces WebSight, a synthetic dataset consisting of 2 million pairs of HTML code and their corresponding screenshots. We fine-tune a foundational VLM on our dataset and show that it becomes proficient at converting webpage screenshots to functional HTML code. To accelerate research in this area, we open-source WebSight.
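As a rough illustration of how such a dataset of (HTML, screenshot) pairs might be assembled, the sketch below generates toy synthetic pages and pairs each with a placeholder screenshot path. This is a minimal sketch, not the authors' actual pipeline: `make_synthetic_page` and `build_pairs` are hypothetical helpers, and in practice the HTML would come from a language model and the screenshot from rendering the page in a headless browser.

```python
import random

def make_synthetic_page(seed: int) -> str:
    """Generate a small random HTML page (a toy stand-in for
    model-generated HTML in a WebSight-style pipeline)."""
    rng = random.Random(seed)
    title = rng.choice(["Portfolio", "Blog", "Shop"])
    color = rng.choice(["#1a73e8", "#e94235", "#34a853"])
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><style>h1 {{ color: {color}; }}</style></head>\n"
        f"<body><h1>{title}</h1><p>Welcome to {title.lower()}.</p></body>\n"
        "</html>"
    )

def build_pairs(n: int) -> list[dict]:
    # In a real pipeline, each page would be rendered in a headless
    # browser to produce the screenshot half of the pair; here we only
    # record a placeholder file path next to the HTML source.
    return [
        {"html": make_synthetic_page(i), "screenshot": f"shots/{i}.png"}
        for i in range(n)
    ]

pairs = build_pairs(3)
```

Pairing the two modalities this way is what lets a VLM be fine-tuned with the screenshot as input and the HTML source as the target sequence.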
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Design2Code: How Far Are We From Automating Front-End Engineering? (2024)
- DeepSeek-Coder: When the Large Language Model Meets Programming - The Rise of Code Intelligence (2024)
- Code Needs Comments: Enhancing Code LLMs with Comment Augmentation (2024)
- OMPGPT: A Generative Pre-trained Transformer Model for OpenMP (2024)
- Enhancing Vision-Language Pre-training with Rich Supervisions (2024)
Congrats on the great work! Our arXiv paper https://arxiv.org/abs/2305.14637 is one of the earliest works addressing the same problem, published a year ago. Looking forward to more work on the topic!
Thanks @zhoutianyi for the reference; we indeed missed your paper.
We’ll add it to the related work section if we update this technical report in the next iteration!