Indian Food Classification with Vision Transformer (ViT)
Overview
This model is a Vision Transformer (ViT) fine-tuned to classify images of Indian dishes. It was trained on the Indian Foods Dataset from Hugging Face Datasets.
Dataset
The Indian Foods Dataset contains 4,770 images across 15 different classes of popular Indian dishes. The dataset is split into:
- Training: 3,047 images
- Validation: 762 images
- Testing: 961 images
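For reference, a minimal sketch of loading the splits with the datasets library; the dataset identifier below is a placeholder, since the exact Hub name of the Indian Foods Dataset is not stated in this card:

```python
from datasets import load_dataset

# Placeholder dataset id -- replace with the actual Hub name of the Indian Foods Dataset.
dataset = load_dataset("your-username/indian-foods-dataset")

print(dataset)                    # shows the train/validation/test splits
print(dataset["train"].features)  # includes the 15-class label feature
```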
Model
The base model is google/vit-base-patch16-224-in21k, a Vision Transformer pre-trained on ImageNet-21k. It was fine-tuned on the Indian Foods Dataset for 10 epochs using the AdamW optimizer with a learning rate of 2e-4.
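A minimal fine-tuning sketch using the transformers Trainer (which optimizes with AdamW by default). The dataset identifier, column names, batch size, and output directory are assumptions not stated in this card:

```python
import torch
from datasets import load_dataset
from transformers import (
    ViTForImageClassification,
    ViTImageProcessor,
    TrainingArguments,
    Trainer,
)

# Placeholder dataset id; the "image" and "label" column names are assumptions.
dataset = load_dataset("your-username/indian-foods-dataset")
labels = dataset["train"].features["label"].names

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")

def preprocess(batch):
    # Convert PIL images into the pixel tensors expected by ViT.
    inputs = processor([img.convert("RGB") for img in batch["image"]], return_tensors="pt")
    inputs["labels"] = batch["label"]
    return inputs

dataset = dataset.with_transform(preprocess)

def collate_fn(examples):
    return {
        "pixel_values": torch.stack([ex["pixel_values"] for ex in examples]),
        "labels": torch.tensor([ex["labels"] for ex in examples]),
    }

model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=len(labels),
    id2label={i: name for i, name in enumerate(labels)},
    label2id={name: i for i, name in enumerate(labels)},
)

# Learning rate and epoch count match the values described above;
# the batch size and output directory are assumptions.
args = TrainingArguments(
    output_dir="vit-indian-foods",
    learning_rate=2e-4,
    num_train_epochs=10,
    per_device_train_batch_size=16,
    remove_unused_columns=False,  # keep the raw image column for the on-the-fly transform
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    data_collator=collate_fn,
)
trainer.train()
```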
Evaluation
The model was evaluated on the test set and achieved the following metrics:
- Accuracy: 0.9667
- Precision: 0.9670
- Recall: 0.9667
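A sketch of how such metrics could be computed on the test split, reusing the trainer from the fine-tuning sketch above; the weighted averaging mode is an assumption, since the card does not state how precision and recall were aggregated:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Collect predictions on the held-out test split.
predictions = trainer.predict(dataset["test"])
y_pred = np.argmax(predictions.predictions, axis=-1)
y_true = predictions.label_ids

print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="weighted"))
print("recall:   ", recall_score(y_true, y_pred, average="weighted"))
```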
Usage
You can use this fine-tuned model directly from the Hugging Face Hub with the transformers library, for example:
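A minimal inference sketch with the image-classification pipeline; the repository id and image filename below are placeholders:

```python
from transformers import pipeline
from PIL import Image

# "your-username/vit-indian-foods" is a placeholder -- replace it with this model's Hub id.
classifier = pipeline("image-classification", model="your-username/vit-indian-foods")

image = Image.open("dosa.jpg")  # any local photo of an Indian dish
print(classifier(image))        # list of {"label": ..., "score": ...} predictions
```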