Welcome to the Community Computer Vision Course
Dear learner,
Welcome to the community-driven course on computer vision. Computer vision is revolutionizing our world in many ways, from unlocking phones with facial recognition to analyzing medical images for disease detection, monitoring wildlife, and creating new images. Together, we’ll dive into the fascinating world of computer vision!
Throughout this course, we’ll cover everything from the basics to the latest advancements in computer vision. It’s structured to include various foundational topics, making it friendly and accessible for everyone. We’re delighted to have you join us for this exciting journey!
On this page, you will find out how to join the learner community, submit your assignments, and get a certificate, along with more details about the course!
Assignment 📄
To obtain your certification for completing the course, complete the following assignments:
- Training/fine-tuning a Model
- Building an application and hosting it on Hugging Face Spaces
Training/fine-tuning a Model
There are notebooks under the Notebooks/Vision Transformers section. As of now, we have notebooks for object detection, image segmentation, and image classification. You can either train a model on a dataset that exists on 🤗 Hub or upload a dataset to a dataset repository and train a model on that.
The model repository needs to have the following:
- A properly filled-out model card; you can check out here for more information
- If you trained a model with transformers and pushed it to the Hub, a model card is generated automatically. In that case, edit the card and fill in more details.
- Add the dataset’s ID to the model card to link the model repository to the dataset repository.
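As a rough illustration of the last point, the link between a model and a dataset lives in the YAML front matter at the top of the model repository's README.md, under the `datasets` key. The sketch below only builds that text with the standard library; the helper `build_model_card`, the model name, and the dataset ID `username/my-dataset` are hypothetical placeholders, not part of the course material.

```python
def build_model_card(model_name: str, dataset_id: str, description: str) -> str:
    """Return README.md text whose YAML front matter links a dataset repository."""
    front_matter = "\n".join(
        [
            "---",
            "datasets:",
            f"- {dataset_id}",  # this entry links the model to the dataset on the Hub
            "pipeline_tag: image-classification",
            "---",
        ]
    )
    body = f"# {model_name}\n\n{description}\n"
    return f"{front_matter}\n\n{body}"


# Hypothetical names, used only for illustration.
card = build_model_card(
    model_name="my-vit-finetuned",
    dataset_id="username/my-dataset",
    description="A vision transformer fine-tuned for image classification.",
)
print(card)
```

You would still fill in the remaining sections of the card (intended use, training details, evaluation results) by hand.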
Creating a Space
In this assignment section, you’ll be building a Gradio-based application for your computer vision model and sharing it on 🤗 Spaces. Learn more about these tasks using the following resources:
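For orientation, a minimal sketch of such an application is shown here, assuming an image classification model and the `gradio` and `transformers` packages. The helper names `to_label_scores` and `build_demo` are hypothetical, and the heavy imports are deferred into `build_demo` so the conversion helper can be tried on its own.

```python
def to_label_scores(predictions):
    """Convert [{"label": ..., "score": ...}, ...] into a {label: score} dict."""
    return {p["label"]: float(p["score"]) for p in predictions}


def build_demo(model_id):
    """Build a Gradio interface around an image classification pipeline."""
    # Imported lazily so the helper above works without these packages installed.
    import gradio as gr
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)

    def classify(image):
        # The pipeline returns a list of {"label", "score"} dicts for a PIL image.
        return to_label_scores(classifier(image))

    return gr.Interface(
        fn=classify,
        inputs=gr.Image(type="pil"),
        outputs=gr.Label(num_top_classes=3),
        title="Image classifier",
    )


# Quick check of the conversion helper with made-up predictions.
example = to_label_scores(
    [{"label": "cat", "score": 0.9}, {"label": "dog", "score": 0.1}]
)
print(example)
```

In a Space, the `app.py` file would typically finish by calling `build_demo("username/my-model").launch()`, where the model ID is a placeholder for your own repository.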
Certification 🥇
Once you’ve finished both assignments (Training/fine-tuning a Model and Creating a Space), please complete the form with your name, email, and links to your model and Space repositories to receive your certificate.
Join the community!
We invite you to be a part of our active and supportive Discord community, where engaging conversations and shared interests flourish every day and where this course started. You will find peers with whom you can exchange ideas and resources. It is the place to collaborate, get feedback, and ask questions!
It is also a good way to stay motivated and engaged as you follow the course. Who knows what we will build together next?
As AI continues to advance, so does the quality of our discussions and the diversity of perspectives within our community. Upon becoming a member, you’ll have an opportunity to connect with fellow course participants, exchange ideas, and collaborate with others. Moreover, the contributors to this course are active on Discord and might help you when needed. Join us now!
Computer Vision Channels
There are many channels focused on various topics on our Discord server. You will find people discussing papers, organizing events, sharing their projects and ideas, brainstorming, and so much more.
As a computer vision course learner, you may find the following set of channels particularly relevant:
- #computer-vision: a catch-all channel for everything related to computer vision.
- #cv-study-group: a place to exchange ideas, ask questions about specific posts and start discussions.
- #3d: a channel to discuss aspects of computer vision specific to 3D computer vision.
If you are interested in generative AI, we also invite you to join all channels related to the Diffusion Models: #core-announcements, #discussions, #dev-discussions, and #diff-i-made-this.
What you will learn
The course is composed of theory, practical tutorials, and engaging challenges.
- Theory Part : This section covers the theoretical principles of computer vision, explained in detail with practical examples.
- Hands-on Tutorials : You will learn how to train and apply key computer vision models using Google Colab notebooks.
Throughout this course, we will cover everything from the basics to the latest advancements in computer vision. It is structured to include various foundational topics, giving you a comprehensive understanding of what makes computer vision so impactful today.
Pre-requisites
Before beginning this course, make sure that you have some experience with Python programming and are familiar with transformers, machine learning, and neural networks. If these are new to you, consider reviewing the first unit of the Hugging Face NLP course. While strong knowledge of pre-processing techniques and mathematical operations like convolutions is beneficial, it is not a prerequisite.
Course Structure
The course is organized into multiple units, covering the fundamentals and delving into an in-depth exploration of state-of-the-art models.
- Unit 1 - Fundamentals of Computer Vision : this unit covers the essential concepts to get started with computer vision: the need for computer vision, the field’s basics, and its applications. Explore image fundamentals, formation, and preprocessing, along with key aspects of feature extraction.
- Unit 2 - Convolutional Neural Networks (CNNs) : delve into the world of CNNs, understanding their general architecture, key concepts, and common pre-trained models. Learn how to apply transfer learning and fine-tuning to adapt CNNs for various tasks.
- Unit 3 - Vision Transformers : explore the transformer architecture in the context of computer vision and learn how it compares to CNNs. Understand common vision transformers such as Swin, DETR, and CvT, along with techniques for transfer learning and fine-tuning.
- Unit 4 - Multimodal Models : understand the fusion of text and vision by exploring multimodal tasks like image-to-text and text-to-image. Study models such as CLIP and its relatives (GroupViT, BLIP, OWL-ViT), and master transfer learning techniques for multimodal tasks.
- Unit 5 - Generative Models : explore generative models, including GANs, VAEs, and diffusion models. Learn about their differences and applications in tasks such as text-to-image, image-to-image, and inpainting.
- Unit 6 - Basic Computer Vision Tasks : cover fundamental tasks like image classification, object detection, and segmentation and the models used in them (YOLO, SAM). Gain insights into metrics and practical applications for these tasks.
- Unit 7 - Video and Video Processing : examine the characteristics of videos, the role of video processing, and the challenges compared to image processing. Explore temporal continuity, motion estimation, and practical applications in video processing.
- Unit 8 - 3D Vision, Scene Rendering, and Reconstruction : delve into the complexities of three-dimensional vision, exploring concepts like NeRF and GQN for scene rendering and reconstruction. Understand the challenges and applications of 3D vision in computer vision, and how it provides an even more comprehensive view of spatial information.
- Unit 9 - Model Optimization : explore the critical aspects of model optimization. Cover techniques such as model compression, deployment considerations, and the usage of tools and frameworks. Topics include distillation, pruning, and TinyML for efficient model deployment.
- Unit 10 - Synthetic Data Creation : discover the importance of synthetic data creation using deep generative models. Explore methods like point clouds and diffusion models and investigate major synthetic datasets and their applications in computer vision.
- Unit 11 - Zero Shot Computer Vision : delve into the realm of zero-shot learning in computer vision, covering aspects of generalization, transfer learning, and its applications in tasks such as zero-shot recognition and image segmentation. Explore the relationship between zero-shot learning and transfer learning across various computer vision domains.
- Unit 12 - Ethics and Biases in Computer Vision : understand the ethical considerations specific to computer vision. Explore why ethics matter, how biases can infiltrate AI models, and the types of biases prevalent in these domains. Learn how to evaluate bias and apply mitigation strategies, emphasizing responsible development and deployment of AI technologies.
- Unit 13 - Outlook and Emerging Trends : explore current trends and emerging architectures. Delve into innovative approaches like Retentive Network, Hiera, Hyena, I-JEPA, and Retention Vision Models.
Meet our team
This course is made by the Hugging Face Community with love 💜! Join us by adding your contribution on GitHub. Our goal was to create a computer vision course that is beginner-friendly and that could act as a resource for others. More than 60 people from all over the world joined forces to make this project happen. Here we give them credit:
Unit 1 - Fundamentals of Computer Vision
- Reviewers: Ratan Prasad, Ameed Taylor
- Writers: Seshu Pavan Mutyala, Isabella Bicalho-Frazeto, Aman Kapoor, Tiago Comassetto Fróes, Aditya Mishra, Kerem Delikoyun, Ker Lee Yap, Kathy Fahnline, Ameed Taylor
Unit 2 - Convolutional Neural Networks (CNNs)
- Reviewers: Ratan Prasad, Mohammed Hamdy, Sezan, Joshua Adrian Cahyono, Murtaza Nazir, Albert Kao, Sitam Meur, Antonis Stellas
- Writers: Emre Albayrak, Caroline Shamiso Chitongo, Sezan, Joshua Adrian Cahyono, Murtaza Nazir, Albert Kao, Isabella Bicalho-Frazeto, Aman Kapoor, Sitam Meur
Unit 3 - Vision Transformers
- Reviewers: Ratan Prasad, Mohammed Hamdy, Ameed Taylor, Sezan
- Writers: Surya Guthikonda, Ker Lee Yap, Anindyadeep Sannigrahi, Celina Hanouti, Malcolm Krolick, Alvin Li, Shreyas Daniel Gaddam, Anthony Susevski, Alan Ahmet
Unit 4 - Multimodal Models
- Reviewers: Ratan Prasad, Snehil Sanyal, Mohammed Hamdy, Charchit Sharma, Ameed Taylor, Isabella Bicalho-Frazeto
- Writers: Snehil Sanyal, Surya Guthikonda, Mateusz Dziemian, Charchit Sharma, Evstifeev Stepan, Jeremy Kespite, Isabella Bicalho-Frazeto, Pedro Gabriel Gengo Lourenco
Unit 5 - Generative Models
- Reviewers: Ratan Prasad, William Bonvini, Mohammed Hamdy, Ameed Taylor
- Writers: Jeronim Matijević, Mateusz Dziemian, Charchit Sharma, Muhammad Waseem
Unit 6 - Basic Computer Vision Tasks
- Reviewers: Adhi Setiawan
- Writers: Adhi Setiawan, Bastien Pouëssel
Unit 7 - Video and Video Processing
- Reviewers: Ameed Taylor
- Writers: Diwakar Basnet
Unit 8 - 3D Vision, Scene Rendering, and Reconstruction
- Reviewers: Ratan Prasad, William Bonvini, Mohammed Hamdy, Adhi Setiawan, Ameed Taylor
- Writers: John Fozard, Vasu Gupta, Psetinek
Unit 9 - Model Optimization
- Reviewers: Ratan Prasad, Mohammed Hamdy, Adhi Setiawan, Ameed Taylor
- Writer: Adhi Setiawan
Unit 10 - Synthetic Data Creation
- Reviewers: Mohammed Hamdy, Ameed Taylor, Bhavesh Misra
- Writers: William Bonvini, Alper Balbay, Madhav Kumar, Bhavesh Misra, Kathy Fahnline
Unit 11 - Zero Shot Computer Vision
- Reviewers: Ratan Prasad, Mohammed Hamdy, Albert Kao, Isabella Bicalho-Frazeto
- Writers: Mohammed Hamdy, Albert Kao
Unit 12 - Ethics and Biases in Computer Vision
- Reviewers: Ratan Prasad, Mohammed Hamdy, Charchit Sharma, Adhi Setiawan, Ameed Taylor, Bhavesh Misra
- Writers: Snehil Sanyal, Bhavesh Misra
Unit 13 - Outlook and Emerging Trends
- Reviewers: Ratan Prasad, Ameed Taylor, Mohammed Hamdy
- Writers: Farros Alferro, Mohammed Hamdy, Louis Ulmer, Dario Wisznewer, gonzachiar
Organisation Team: Merve Noyan, Adam Molnar, Johannes Kolbe
We are happy to have you here. Let’s get started!