# Model card for resnet50-truncated.tv_in1k
A truncated ResNet-50 feature extraction model, as used in CLAM.
This model features:
- ReLU activations
- single layer 7x7 convolution with pooling
- 1x1 convolution shortcut downsample
Trained on ImageNet-1k. The weights are the original torchvision ResNet-50 weights distributed via PyTorch, truncated to exclude layer 4 and the fully connected layer.
This model card was adapted from https://huggingface.co/timm/resnet50.tv_in1k.
## Model Details
- Model Type: Feature backbone
- Model Stats:
  - Params (M): 8.5 (see the check after this list)
  - Image size: 224 x 224
- Papers:
  - Deep Residual Learning for Image Recognition: https://arxiv.org/abs/1512.03385
  - Data-efficient and weakly supervised computational pathology on whole-slide images: https://www.nature.com/articles/s41551-020-00682-w
- Original: https://github.com/pytorch/vision
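
The 8.5 M parameter figure follows from the truncation: the full torchvision ResNet-50 has roughly 25.6 M parameters, of which layer 4 accounts for about 15.0 M and the fully connected layer for about 2.0 M. A minimal sketch of that check (it only assumes torchvision is available):

```python
# Rough check of the quoted parameter count: a full ResNet-50 minus layer4
# and the fully connected layer leaves roughly 8.5 M parameters.
from torchvision.models import resnet50

full = resnet50(weights=None)
total = sum(p.numel() for p in full.parameters())
layer4 = sum(p.numel() for p in full.layer4.parameters())
fc = sum(p.numel() for p in full.fc.parameters())
print(f"{(total - layer4 - fc) / 1e6:.1f} M")  # ~8.5 M
```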
## Model Creation
```python
import types

import torch
from torchvision.models import resnet50


def _forward_impl(self, x: torch.Tensor) -> torch.Tensor:
    # Same as the stock ResNet forward pass, but without layer4 and the
    # fully connected layer.
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.maxpool(x)

    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)

    x = self.avgpool(x)
    x = x.view(x.size(0), -1)
    return x


model = resnet50(weights=None)
del model.layer4, model.fc
model._forward_impl = types.MethodType(_forward_impl, model)

state_dict = torch.hub.load_state_dict_from_url(
    "https://download.pytorch.org/models/resnet50-19c8e357.pth"
)
# Remove the keys belonging to the truncated layers.
state_dict = {
    k: v
    for k, v in state_dict.items()
    if not k.startswith("layer4.") and not k.startswith("fc.")
}
model.load_state_dict(state_dict, strict=True)
model.eval()
```
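
As a quick sanity check, the truncated backbone built above should report roughly 8.5 M parameters and return 1024-dimensional embeddings for a 224 x 224 input, matching the Model Stats section. A minimal sketch, assuming `model` from the snippet above:

```python
# Verify the truncated backbone: ~8.5 M parameters and a 1024-dim embedding.
import torch

num_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {num_params / 1e6:.1f} M")  # ~8.5 M

with torch.no_grad():
    features = model(torch.zeros(1, 3, 224, 224))
print(features.shape)  # torch.Size([1, 1024])
```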
## Model Usage

### Image Embeddings
```python
from urllib.request import urlopen

import torch
from PIL import Image
from torchvision import transforms

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

# See above for how to load the model. Or load a TorchScript version of the model,
# which can be loaded automatically with
# model = torch.jit.load("torchscript_model.bin")
model = model.eval()

transform = transforms.Compose([
    # Depending on the pipeline, this may be 256x256 or a different value.
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=(0.485, 0.456, 0.406),
        std=(0.229, 0.224, 0.225)),
])

with torch.no_grad():
    output = model(transform(img).unsqueeze(0))  # unsqueeze single image into batch of 1
output.shape  # 1x1024
```
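
The TorchScript file mentioned in the comment above is not created by this snippet. One way to produce it yourself, sketched here under the assumption that `model` is the truncated backbone from the Model Creation section (the filename `torchscript_model.bin` simply mirrors the comment), is to trace the model with `torch.jit.trace`:

```python
# Sketch: export the truncated backbone to TorchScript by tracing it with a
# dummy 224x224 input, so it can later be loaded with torch.jit.load(...).
import torch

example_input = torch.zeros(1, 3, 224, 224)
traced = torch.jit.trace(model.eval(), example_input)
traced.save("torchscript_model.bin")  # filename taken from the comment above
```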
## Citation
```bibtex
@article{He2015,
  author  = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
  title   = {Deep Residual Learning for Image Recognition},
  journal = {arXiv preprint arXiv:1512.03385},
  year    = {2015}
}
```

```bibtex
@article{lu2021data,
  title     = {Data-efficient and weakly supervised computational pathology on whole-slide images},
  author    = {Lu, Ming Y and Williamson, Drew FK and Chen, Tiffany Y and Chen, Richard J and Barbieri, Matteo and Mahmood, Faisal},
  journal   = {Nature Biomedical Engineering},
  volume    = {5},
  number    = {6},
  pages     = {555--570},
  year      = {2021},
  publisher = {Nature Publishing Group}
}
```