Model name: mobilenet_v3_small_100_224
Description adapted from TFHub
Overview
MobileNet V3 is a family of neural network architectures for efficient on-device image classification and related tasks, originally published by
- Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam: "Searching for MobileNetV3", 2019.
Similar to other MobileNets, MobileNet V3 uses a multiplier for the depth (number of features) in the convolutional layers to tune the accuracy vs. latency tradeoff. In addition, MobileNet V3 comes in two different sizes, small and large, to adapt the network to low- or high-resource use cases. Although V3 networks can be built with custom input resolutions, just like other MobileNets, all pre-trained checkpoints were published with the same 224x224 input resolution.
For a quick comparison between these variants, please refer to the following table:
Size | Depth multiplier | Top-1 accuracy (%) | Pixel 1 latency (ms) | Pixel 2 latency (ms) | Pixel 3 latency (ms) |
---|---|---|---|---|---|
Large | 1.0 | 75.2 | 51.2 | 61 | 44 |
Large | 0.75 | 73.3 | 39.8 | 48 | 32 |
Small | 1.0 | 67.5 | 15.8 | 19.4 | 14.4 |
Small | 0.75 | 65.4 | 12.8 | 15.9 | 11.6 |
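
To make the depth multiplier more concrete, the sketch below scales a few channel counts and rounds each result to a multiple of 8, in the style of the rounding helper commonly found in MobileNet implementations. The base channel counts and the divisor of 8 are illustrative assumptions, not values read from this model's checkpoint.

```python
def make_divisible(value, divisor=8, min_value=None):
    """Round `value` to a nearby multiple of `divisor`, never below `min_value`."""
    if min_value is None:
        min_value = divisor
    rounded = max(min_value, int(value + divisor / 2) // divisor * divisor)
    # Avoid shrinking a layer by more than 10% through rounding down.
    if rounded < 0.9 * value:
        rounded += divisor
    return rounded


def scale_channels(base_channels, depth_multiplier):
    """Apply a depth multiplier to a list of per-layer channel counts."""
    return [make_divisible(c * depth_multiplier) for c in base_channels]


# Hypothetical per-layer channel counts, chosen only for illustration.
base = [16, 24, 40, 96]
print(scale_channels(base, 1.0))
print(scale_channels(base, 0.75))
```

A smaller multiplier yields narrower layers, which is where the latency savings in the table come from, at some cost in accuracy.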
This model uses the TF-Slim implementation of mobilenet_v3 as a small network with a depth multiplier of 1.0.
The model contains a trained instance of the network, packaged to do the image classification that the network was trained on. If you merely want to transform images into feature vectors, use google/imagenet/mobilenet_v3_small_100_224/feature_vector/5 instead, and save the space occupied by the classification layer.
Training
The checkpoint exported into this model was v3-small_224_1.0_float/ema/model-388500, downloaded from MobileNet V3 pre-trained models. Its weights were originally obtained by training on the ILSVRC-2012-CLS dataset for image classification ("Imagenet").
Usage
This model can be used with hub.KerasLayer as follows. It cannot be used with the hub.Module API for TensorFlow 1.
Using TF Hub and HF Hub
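import numpy as np
from huggingface_hub import snapshot_download
from tensorflow_hub import KerasLayer

# Download the SavedModel from the Hugging Face Hub and wrap it as a Keras layer.
model_path = snapshot_download(repo_id="Dimitre/mobilenet_v3_small")
model = KerasLayer(handle=model_path)
img = np.random.rand(1, 224, 224, 3).astype(np.float32) # (batch_size, height, width, num_channels)
model(img) # output shape (1, 1001)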
Using TF Hub fork
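import numpy as np

# pull_from_hub is provided by the TF Hub fork; import it as that package documents.
model = pull_from_hub(repo_id="Dimitre/mobilenet_v3_small")
img = np.random.rand(1, 224, 224, 3).astype(np.float32) # (batch_size, height, width, num_channels)
model(img) # output shape (1, 1001)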
The output is a batch of logits vectors. The indices into the logits are the num_classes = 1001 classes of the classification from the original training (see above). The mapping from indices to class labels can be found in the file at download.tensorflow.org/data/ImageNetLabels.txt (with class 0 for "background", followed by 1000 actual ImageNet classes).
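
As a rough illustration of how the logits could be turned into readable predictions, the sketch below applies a softmax and looks up the highest-scoring indices in a label list. The helper name `top_k_labels` is hypothetical, and the labels and logits are placeholders; in practice the labels would be the 1001 lines of ImageNetLabels.txt and the logits would come from `model(img)[0]`.

```python
import numpy as np


def top_k_labels(logits, labels, k=5):
    """Return the k most likely (label, probability) pairs for one logits vector."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax: subtract the max before exponentiating.
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    top = np.argsort(probs)[::-1][:k]
    return [(labels[i], float(probs[i])) for i in top]


# Placeholder labels standing in for the 1001 lines of ImageNetLabels.txt;
# index 0 is the "background" class.
labels = ["background"] + [f"class_{i}" for i in range(1, 1001)]

# Fake logits for a single image, with one clearly dominant class.
logits = np.zeros(1001)
logits[42] = 5.0
print(top_k_labels(logits, labels, k=1))
```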
The input images are expected to have color values in the range [0,1], following the common image input conventions. For this model, the size of the input images is fixed to height x width = 224 x 224 pixels.
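
The input conventions above can be sketched as a small helper that scales 8-bit pixel values to [0, 1] and adds a batch dimension. The function name `to_model_input` is hypothetical, and the sketch assumes the image has already been decoded and resized to 224 x 224 RGB by some other means.

```python
import numpy as np

INPUT_SIZE = 224  # fixed height and width expected by this model


def to_model_input(image_uint8):
    """Turn one (224, 224, 3) uint8 image into a (1, 224, 224, 3) float32 batch in [0, 1]."""
    image_uint8 = np.asarray(image_uint8)
    if image_uint8.shape != (INPUT_SIZE, INPUT_SIZE, 3):
        raise ValueError(f"expected shape (224, 224, 3), got {image_uint8.shape}")
    scaled = image_uint8.astype(np.float32) / 255.0  # map [0, 255] to [0, 1]
    return scaled[np.newaxis, ...]  # add the batch dimension


# Random pixels standing in for a decoded, already-resized RGB image.
image = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
batch = to_model_input(image)
print(batch.shape, batch.dtype)
```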
Fine-tuning
In principle, consumers of this model can fine-tune it by passing trainable=True to hub.KerasLayer.
However, fine-tuning through a large classification layer might be prone to overfitting.
The momentum (a.k.a. decay coefficient) of batch norm's exponential moving averages defaults to 0.99 for this model, in order to accelerate training on small datasets (or with huge batch sizes).
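
To illustrate why a lower momentum helps when there are few training steps, the sketch below applies the standard moving-average update, moving = momentum * moving + (1 - momentum) * batch_value, for a fixed number of steps. The comparison value of 0.999, the constant batch statistic, and the step count are arbitrary choices for illustration.

```python
def ema_trace(momentum, batch_value, steps, initial=0.0):
    """Apply the moving-average update `steps` times and return the final value."""
    moving = initial
    for _ in range(steps):
        moving = momentum * moving + (1.0 - momentum) * batch_value
    return moving


# With a constant batch statistic of 1.0, compare how far each
# moving average has moved from its initial value after 100 steps.
for momentum in (0.99, 0.999):
    print(momentum, round(ema_trace(momentum, 1.0, steps=100), 3))
```

With momentum 0.99 the moving average covers most of the distance to the batch statistic within 100 steps, while with 0.999 it has barely moved, so the lower value lets the statistics adapt within a short fine-tuning run.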
Using TF Hub and HF Hub
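import numpy as np
from huggingface_hub import snapshot_download
from tensorflow_hub import KerasLayer

# trainable=True makes the model's weights trainable for fine-tuning.
model_path = snapshot_download(repo_id="Dimitre/mobilenet_v3_small")
model = KerasLayer(handle=model_path, trainable=True)
img = np.random.rand(1, 224, 224, 3).astype(np.float32) # (batch_size, height, width, num_channels)
model(img) # output shape (1, 1001)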
Using TF Hub fork
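import numpy as np

# pull_from_hub is provided by the TF Hub fork; import it as that package documents.
model = pull_from_hub(repo_id="Dimitre/mobilenet_v3_small", trainable=True)
img = np.random.rand(1, 224, 224, 3).astype(np.float32) # (batch_size, height, width, num_channels)
model(img) # output shape (1, 1001)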