cermakvo committed on
Commit
33b3c6f
2 Parent(s): 63cbbff 651becc

Merge branch 'main' of https://huggingface.co/BVRA/wildlife-mega-L-384

Files changed (1)
  1. README.md +63 -0
README.md ADDED
@@ -0,0 +1,63 @@
+ ---
+ tags:
+ - image-classification
+ - ecology
+ - animals
+ - re-identification
+ library_name: wildlife-datasets
+ license: cc-by-nc-4.0
+ ---
+ # Model card for MegaDescriptor-L-384
+
+ A Swin-L image feature model, pre-trained with supervision on animal re-identification datasets.
+
+ ## Model Details
+ - **Model Type:** Animal re-identification / feature backbone
+ - **Model Stats:**
+   - Params (M): 228.8
+   - Image size: 384 x 384
+   - Architecture: swin_large_patch4_window12_384
+ - **Paper:** [WildlifeDatasets: An Open-Source Toolkit for Animal Re-Identification](https://openaccess.thecvf.com/content/WACV2024/html/Cermak_WildlifeDatasets_An_Open-Source_Toolkit_for_Animal_Re-Identification_WACV_2024_paper.html)
+ - **Related Papers:**
+   - [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030)
+   - [DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/pdf/2304.07193.pdf)
+ - **Pretrain Dataset:** All available re-identification datasets; see https://github.com/WildlifeDatasets/wildlife-datasets
+
+ ## Model Usage
+ ### Image Embeddings
+ ```python
+
+ import timm
+ import torch
+ import torchvision.transforms as T
+
+ from PIL import Image
+ from urllib.request import urlopen
+
+ # Load the pre-trained backbone from the Hugging Face Hub and switch to inference mode.
+ model = timm.create_model("hf-hub:BVRA/MegaDescriptor-L-384", pretrained=True)
+ model = model.eval()
+
+ # Resize to the model's input size and normalize each channel to [-1, 1].
+ transforms = T.Compose([T.Resize(size=(384, 384)),
+                         T.ToTensor(),
+                         T.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])
+
+ # Convert to RGB so that images with an alpha channel match the 3-channel normalization.
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ )).convert("RGB")
+
+ with torch.no_grad():  # gradients are not needed for feature extraction
+     output = model(transforms(img).unsqueeze(0))
+ # output is a (1, num_features) shaped tensor
+ ```
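Embeddings from a feature backbone like this one are usually compared rather than read directly. The sketch below shows one common way to do that: cosine similarity followed by a nearest-neighbour lookup over a small gallery of known individuals. It is an illustration under stated assumptions, not the card's prescribed pipeline. The helper names `cosine_similarity` and `best_match`, the labels, and the toy 3-dimensional vectors are all hypothetical; real embeddings would be obtained from the model output (for example via `output[0].tolist()`).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query, gallery):
    """Return (label, similarity) for the gallery embedding closest to the query."""
    return max(
        ((label, cosine_similarity(query, emb)) for label, emb in gallery.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical toy vectors standing in for real model embeddings.
gallery = {
    "turtle_A": [0.9, 0.1, 0.0],
    "turtle_B": [0.0, 1.0, 0.2],
}
query = [0.8, 0.2, 0.1]

label, score = best_match(query, gallery)
print(label, round(score, 3))
```

For galleries of realistic size, the same comparison would normally be done as a single matrix product over L2-normalized embedding tensors rather than with Python loops.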
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{vcermak2024wildlifedatasets,
+   title={WildlifeDatasets: An open-source toolkit for animal re-identification},
+   author={{\v{C}}erm{\'a}k, Vojt{\v{e}}ch and Picek, Lukas and Adam, Luk{\'a}{\v{s}} and Papafitsoros, Kostas},
+   booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
+   pages={5953--5963},
+   year={2024}
+ }
+ ```