|
--- |
|
library_name: keras-hub |
|
--- |
|
## Model Overview |
|
The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 billion masks, and has strong zero-shot performance on a variety of segmentation tasks. This model is supported in both KerasCV and KerasHub. KerasCV will no longer be actively developed, so please try to use KerasHub. |
|
|
|
|
|
|
|
## Links |
|
* [Segment Anything Quickstart Notebook: coming soon]() |
|
* [Segment Anything API Documentation](https://keras.io/api/keras_hub/models/sam/) |
|
* [Segment Anything Model Card](https://github.com/facebookresearch/segment-anything) |
|
* [Segment Anything paper](https://arxiv.org/abs/2304.02643) |
|
|
|
## Installation |
|
|
|
Keras and KerasHub can be installed with: |
|
|
|
``` |
|
pip install -U -q keras-Hub |
|
pip install -U -q keras>=3 |
|
``` |
|
|
|
Jax, TensorFlow, and Torch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment see the [Keras Getting Started](https://keras.io/getting_started/) page. |
|
|
|
## Presets |
|
|
|
The following model checkpoints are provided by the Keras team. Weights have been ported from https://dl.fbaipublicfiles.com/segment_anything/. Full code examples for each are available below. |
|
| Preset name | Parameters | Description | |
|
|----------------|------------|--------------------------------------------------| |
|
| sam_base_sa1b | 93.74M | The base SAM model trained on the SA1B dataset. | |
|
| sam_large_sa1b | 312.34M | The large SAM model trained on the SA1B dataset. | |
|
| sam_huge_sa1b | 641.09M | The huge SAM model trained on the SA1B dataset. | |
|
|
|
## Example Usage |
|
Load pretrained model using `from_preset`. |
|
|
|
```python |
|
image_size=1024 |
|
batch_size=2 |
|
input_data = { |
|
"images": np.ones( |
|
(batch_size, image_size, image_size, 3), |
|
dtype="float32", |
|
), |
|
"points": np.ones((batch_size, 1, 2), dtype="float32"), |
|
"labels": np.ones((batch_size, 1), dtype="float32"), |
|
"boxes": np.ones((batch_size, 1, 2, 2), dtype="float32"), |
|
"masks": np.zeros( |
|
(batch_size, 0, image_size, image_size, 1) |
|
), |
|
} |
|
sam = keras_hub.models.SAMImageSegmenter.from_preset('sam_base_sa1b') |
|
outputs = sam.predict(input_data) |
|
masks, iou_pred = outputs["masks"], outputs["iou_pred"] |
|
``` |
|
|
|
Load segment anything image segmenter with custom backbone |
|
|
|
```python |
|
image_size = 128 |
|
batch_size = 2 |
|
images = np.ones( |
|
(batch_size, image_size, image_size, 3), |
|
dtype="float32", |
|
) |
|
image_encoder = keras_hub.models.ViTDetBackbone( |
|
hidden_size=16, |
|
num_layers=16, |
|
intermediate_dim=16 * 4, |
|
num_heads=16, |
|
global_attention_layer_indices=[2, 5, 8, 11], |
|
patch_size=16, |
|
num_output_channels=8, |
|
window_size=2, |
|
image_shape=(image_size, image_size, 3), |
|
) |
|
prompt_encoder = keras_hub.layers.SAMPromptEncoder( |
|
hidden_size=8, |
|
image_embedding_size=(8, 8), |
|
input_image_size=( |
|
image_size, |
|
image_size, |
|
), |
|
mask_in_channels=16, |
|
) |
|
mask_decoder = keras_hub.layers.SAMMaskDecoder( |
|
num_layers=2, |
|
hidden_size=8, |
|
intermediate_dim=32, |
|
num_heads=8, |
|
embedding_dim=8, |
|
num_multimask_outputs=3, |
|
iou_head_depth=3, |
|
iou_head_hidden_dim=8, |
|
) |
|
backbone = keras_hub.models.SAMBackbone( |
|
image_encoder=image_encoder, |
|
prompt_encoder=prompt_encoder, |
|
mask_decoder=mask_decoder, |
|
) |
|
sam = keras_hub.models.SAMImageSegmenter( |
|
backbone=backbone |
|
) |
|
``` |
|
|
|
## Example Usage with Hugging Face URI |
|
|
|
Load pretrained model using `from_preset`. |
|
|
|
```python |
|
image_size=1024 |
|
batch_size=2 |
|
input_data = { |
|
"images": np.ones( |
|
(batch_size, image_size, image_size, 3), |
|
dtype="float32", |
|
), |
|
"points": np.ones((batch_size, 1, 2), dtype="float32"), |
|
"labels": np.ones((batch_size, 1), dtype="float32"), |
|
"boxes": np.ones((batch_size, 1, 2, 2), dtype="float32"), |
|
"masks": np.zeros( |
|
(batch_size, 0, image_size, image_size, 1) |
|
), |
|
} |
|
sam = keras_hub.models.SAMImageSegmenter.from_preset('sam_base_sa1b') |
|
outputs = sam.predict(input_data) |
|
masks, iou_pred = outputs["masks"], outputs["iou_pred"] |
|
``` |
|
|
|
Load segment anything image segmenter with custom backbone |
|
|
|
```python |
|
image_size = 128 |
|
batch_size = 2 |
|
images = np.ones( |
|
(batch_size, image_size, image_size, 3), |
|
dtype="float32", |
|
) |
|
image_encoder = keras_hub.models.ViTDetBackbone( |
|
hidden_size=16, |
|
num_layers=16, |
|
intermediate_dim=16 * 4, |
|
num_heads=16, |
|
global_attention_layer_indices=[2, 5, 8, 11], |
|
patch_size=16, |
|
num_output_channels=8, |
|
window_size=2, |
|
image_shape=(image_size, image_size, 3), |
|
) |
|
prompt_encoder = keras_hub.layers.SAMPromptEncoder( |
|
hidden_size=8, |
|
image_embedding_size=(8, 8), |
|
input_image_size=( |
|
image_size, |
|
image_size, |
|
), |
|
mask_in_channels=16, |
|
) |
|
mask_decoder = keras_hub.layers.SAMMaskDecoder( |
|
num_layers=2, |
|
hidden_size=8, |
|
intermediate_dim=32, |
|
num_heads=8, |
|
embedding_dim=8, |
|
num_multimask_outputs=3, |
|
iou_head_depth=3, |
|
iou_head_hidden_dim=8, |
|
) |
|
backbone = keras_hub.models.SAMBackbone( |
|
image_encoder=image_encoder, |
|
prompt_encoder=prompt_encoder, |
|
mask_decoder=mask_decoder, |
|
) |
|
sam = keras_hub.models.SAMImageSegmenter( |
|
backbone=backbone |
|
) |
|
``` |
|
|