FiftyOne
FiftyOne is an open-source toolkit for curating, visualizing, and managing unstructured visual data. The library streamlines data-centric workflows, from finding low-confidence predictions to identifying poor-quality samples and uncovering hidden patterns in your data. The library supports all sorts of visual data, from images and videos to PDFs, point clouds, and meshes.
FiftyOne accommodates object detections, keypoints, polylines, and custom schemas.
FiftyOne is integrated with the Hugging Face Hub so that you can load and share FiftyOne datasets directly from the Hub.
🚀 Try the FiftyOne 🤝 Hugging Face Integration in Colab!
Prerequisites
First, log in with your Hugging Face account:
huggingface-cli login
Make sure you have fiftyone>=0.24.0 installed:
pip install -U fiftyone
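If you would rather confirm the installed version from Python, the following is a minimal sketch. The helper names parse_version and fiftyone_is_recent_enough are hypothetical and written for this illustration; only importlib.metadata from the standard library is used to read the installed version.

```python
import re
from importlib import metadata

# Minimum version named in the prerequisites above
MIN_VERSION = (0, 24, 0)


def parse_version(text):
    """Turn '0.24.1' into (0, 24, 1), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '1.0' so tuple comparison behaves as expected
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def fiftyone_is_recent_enough():
    """Return True if an installed fiftyone package meets MIN_VERSION."""
    try:
        installed = metadata.version("fiftyone")
    except metadata.PackageNotFoundError:
        return False
    return parse_version(installed) >= MIN_VERSION
```

Calling fiftyone_is_recent_enough() returns False when the package is missing or older than 0.24.0, in which case rerunning the pip command above should resolve it.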
Loading Visual Datasets from the Hub
With load_from_hub() from FiftyOne’s Hugging Face utils, you can load:
- Any FiftyOne dataset uploaded to the hub
- Most image-based datasets stored in Parquet files (which is the standard for datasets uploaded to the hub via the datasets library)
Loading FiftyOne datasets from the Hub
Any dataset pushed to the hub in one of FiftyOne’s supported common formats should have all of the necessary configuration info in its dataset repo on the hub, so you can load the dataset by specifying its repo_id. As an example, to load the VisDrone detection dataset:
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub
## load from the hub
dataset = load_from_hub("Voxel51/VisDrone2019-DET")
## visualize in app
session = fo.launch_app(dataset)
You can customize the download process, including the number of samples to download, the name of the created dataset object, or whether or not it is persisted to disk.
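A minimal sketch of those options follows. It assumes that load_from_hub() accepts max_samples, name, and persistent keyword arguments; max_samples and name appear in the Parquet example in this guide, while persistent is an assumption that should be checked against the FiftyOne documentation. The helpers hub_load_options and load_subset are hypothetical names introduced only for this illustration.

```python
def hub_load_options(max_samples=None, name=None, persistent=False):
    """Collect optional load_from_hub() keyword arguments, skipping unset ones."""
    # `persistent` is assumed to control whether the dataset survives the session
    options = {"persistent": persistent}
    if max_samples is not None:
        options["max_samples"] = max_samples
    if name is not None:
        options["name"] = name
    return options


def load_subset(repo_id, **options):
    """Load a Hub dataset with the given options (requires fiftyone)."""
    # Imported lazily so hub_load_options() works without FiftyOne installed
    from fiftyone.utils.huggingface import load_from_hub

    return load_from_hub(repo_id, **options)


options = hub_load_options(max_samples=100, name="visdrone-subset", persistent=True)
# dataset = load_subset("Voxel51/VisDrone2019-DET", **options)
```

Keeping the options in a dictionary makes it easy to reuse the same settings across several repos.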
You can list all the available FiftyOne datasets on the Hub using:
from huggingface_hub import HfApi
api = HfApi()
api.list_datasets(tags="fiftyone")
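Note that list_datasets() returns its results lazily, so the call on its own prints nothing useful. As a sketch, the hypothetical helper below collects the repo IDs into a list; it assumes each returned item exposes an id attribute and that list_datasets() accepts a limit argument, both of which hold in recent huggingface_hub releases.

```python
def fiftyone_dataset_ids(api, limit=10):
    """Return the repo IDs of up to `limit` Hub datasets tagged "fiftyone"."""
    # Iterating the result triggers the actual requests to the Hub
    return [info.id for info in api.list_datasets(tags="fiftyone", limit=limit)]


# Usage (requires huggingface_hub and network access):
# from huggingface_hub import HfApi
# for repo_id in fiftyone_dataset_ids(HfApi()):
#     print(repo_id)
```

Each printed ID can be passed directly to load_from_hub() as the repo_id.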
Loading Parquet Datasets from the Hub with FiftyOne
You can also use the load_from_hub() function to load datasets from Parquet files. Type conversions are handled for you, and images are downloaded from URLs if necessary.
With this functionality, you can load any of the following:
- FiftyOne-Compatible Image Classification Datasets like Food101 and ImageNet-Sketch
- FiftyOne-Compatible Object Detection Datasets like CPPE-5 and WIDER FACE
- FiftyOne-Compatible Segmentation Datasets like SceneParse150 and Sidewalk Semantic
- FiftyOne-Compatible Image Captioning Datasets like COYO-700M and New Yorker Caption Contest
- FiftyOne-Compatible Visual Question-Answering Datasets like TextVQA and ScienceQA
As an example, we can load the first 1,000 samples from the WikiArt dataset into FiftyOne with:
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub
dataset = load_from_hub(
    "huggan/wikiart",  # repo_id
    format="parquet",  # for Parquet format
    classification_fields=["artist", "style", "genre"],  # columns to treat as classification labels
    max_samples=1000,  # number of samples to load
    name="wikiart",  # name of the dataset in FiftyOne
)
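Once the dataset is loaded, it can help to check that the label columns were converted as expected. The sketch below assumes each column listed in classification_fields becomes a classification field of the same name whose class string lives at "<field>.label"; label_counts is a hypothetical helper built on the dataset's count_values() method.

```python
def label_counts(dataset, fields):
    """Map each classification field to a {label: count} dictionary."""
    # "<field>.label" is the path to the class string of a classification label
    return {field: dataset.count_values(f"{field}.label") for field in fields}


# Usage, assuming `dataset` is the WikiArt dataset loaded above:
# counts = label_counts(dataset, ["artist", "style", "genre"])
# print(counts["style"])
```

An empty dictionary for a field would suggest that the column name or field type does not match the assumption above.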
Pushing FiftyOne Datasets to the Hub
You can push a dataset to the hub with:
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone.utils.huggingface import push_to_hub
## load example dataset
dataset = foz.load_zoo_dataset("quickstart")
## push to hub
push_to_hub(dataset, "my-hf-dataset")
When you call push_to_hub(), the dataset will be uploaded to the repo with the specified repo name under your username, and the repo will be created if necessary. A Dataset Card will automatically be generated and populated with instructions for loading the dataset from the hub. You can upload a thumbnail image/gif to appear on the Dataset Card with the preview_path argument.
Here’s an example using several of the optional arguments, which uploads the first three samples of FiftyOne’s Quickstart Video dataset to the private repo username/my-quickstart-video-dataset with tags, an MIT license, a description, and a preview image:
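import fiftyone.zoo as foz
from fiftyone.utils.huggingface import push_to_hub

# load the first three samples of the Quickstart Video dataset
dataset = foz.load_zoo_dataset("quickstart-video", max_samples=3)

push_to_hub(
    dataset,
    "my-quickstart-video-dataset",
    tags=["video", "tracking"],
    license="mit",
    description="A dataset of video samples for tracking tasks",
    private=True,
    preview_path="<path/to/preview.png>",
)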
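To confirm the upload, you can load the dataset back from the Hub. The sketch below is an illustration under assumptions: full_repo_id and load_pushed_dataset are hypothetical helpers, the username is read with HfApi().whoami() from the token saved at login, and it is assumed that the saved token is enough to read your own private repo.

```python
def full_repo_id(username, repo_name):
    """Build the "<username>/<repo_name>" identifier used by the Hub."""
    return f"{username}/{repo_name}"


def load_pushed_dataset(repo_name, **load_kwargs):
    """Load a dataset previously pushed under the logged-in account."""
    # Lazy imports: both packages are only needed when this function runs
    from huggingface_hub import HfApi
    from fiftyone.utils.huggingface import load_from_hub

    # whoami() reports the account tied to the token saved at login
    username = HfApi().whoami()["name"]
    return load_from_hub(full_repo_id(username, repo_name), **load_kwargs)


# dataset = load_pushed_dataset("my-quickstart-video-dataset")
```

If the dataset loads with three samples, the push and the generated configuration both worked.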
📚 Resources
- 🚀 Code-Along Colab Notebook
- 🗺️ User Guide for FiftyOne Datasets
- 🤗 FiftyOne 🤝 Hub Integration Docs
- 🤗 FiftyOne 🤝 Transformers Integration Docs
- 🧩 FiftyOne Hugging Face Hub Plugin