Clement committed on
Commit b4a77d0 • 1 Parent(s): 7be3023

replace depth anything v1 with v2

README.md CHANGED
@@ -1,98 +1,65 @@
  ---
- license: apache-2.0
- tags:
- - vision
  pipeline_tag: depth-estimation
- widget:
- - inference: false
  ---

- # Depth Anything (large-sized model, Transformers version)
-
- Depth Anything model. It was introduced in the paper [Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data](https://arxiv.org/abs/2401.10891) by Lihe Yang et al. and first released in [this repository](https://github.com/LiheYoung/Depth-Anything).
-
- [Online demo](https://huggingface.co/spaces/LiheYoung/Depth-Anything) is also provided.
-
- Disclaimer: The team releasing Depth Anything did not write a model card for this model so this model card has been written by the Hugging Face team.
-
- ## Model description
-
- Depth Anything leverages the [DPT](https://huggingface.co/docs/transformers/model_doc/dpt) architecture with a [DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2) backbone.
-
- The model is trained on ~62 million images, obtaining state-of-the-art results for both relative and absolute depth estimation.
-
- <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/depth_anything_overview.jpg"
-      alt="drawing" width="600"/>
-
- <small> Depth Anything overview. Taken from the <a href="https://arxiv.org/abs/2401.10891">original paper</a>.</small>

- ## Intended uses & limitations

- You can use the raw model for tasks like zero-shot depth estimation. See the [model hub](https://huggingface.co/models?search=depth-anything) to look for
- other versions on a task that interests you.

- ### How to use
-
- Here is how to use this model to perform zero-shot depth estimation:
-
- ```python
- from transformers import pipeline
- from PIL import Image
- import requests
-
- # load pipe
- pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-large-hf")
-
- # load image
- url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
- image = Image.open(requests.get(url, stream=True).raw)
-
- # inference
- depth = pipe(image)["depth"]
  ```

- Alternatively, one can use the classes themselves:

  ```python
- from transformers import AutoImageProcessor, AutoModelForDepthEstimation
  import torch
- import numpy as np
- from PIL import Image
- import requests
-
- url = "http://images.cocodataset.org/val2017/000000039769.jpg"
- image = Image.open(requests.get(url, stream=True).raw)
-
- image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-large-hf")
- model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-large-hf")

- # prepare image for the model
- inputs = image_processor(images=image, return_tensors="pt")

- with torch.no_grad():
-     outputs = model(**inputs)
-     predicted_depth = outputs.predicted_depth

- # interpolate to original size
- prediction = torch.nn.functional.interpolate(
-     predicted_depth.unsqueeze(1),
-     size=image.size[::-1],
-     mode="bicubic",
-     align_corners=False,
- )
  ```

- For more code examples, we refer to the [documentation](https://huggingface.co/transformers/main/model_doc/depth_anything.html#).

- ### BibTeX entry and citation info

  ```bibtex
- @misc{yang2024depth,
-       title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
-       author={Lihe Yang and Bingyi Kang and Zilong Huang and Xiaogang Xu and Jiashi Feng and Hengshuang Zhao},
-       year={2024},
-       eprint={2401.10891},
-       archivePrefix={arXiv},
-       primaryClass={cs.CV}
  }
- ```
  ---
+ license: cc-by-nc-4.0
+
+ language:
+ - en
  pipeline_tag: depth-estimation
+ library_name: depth-anything-v2
+ tags:
+ - depth
+ - relative depth
  ---

+ # Depth-Anything-V2-Large

+ ## Introduction
+ Depth Anything V2 is trained from 595K synthetic labeled images and 62M+ real unlabeled images, providing the most capable monocular depth estimation (MDE) model with the following features:
+ - more fine-grained details than Depth Anything V1
+ - more robust than Depth Anything V1 and SD-based models (e.g., Marigold, Geowizard)
+ - more efficient (10x faster) and more lightweight than SD-based models
+ - impressive fine-tuned performance with our pre-trained models

+ ## Installation

+ ```bash
+ git clone https://huggingface.co/spaces/depth-anything/Depth-Anything-V2
+ cd Depth-Anything-V2
+ pip install -r requirements.txt
  ```

+ ## Usage
+
+ Download the [model](https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth?download=true) first and put it under the `checkpoints` directory.

  ```python
+ import cv2
  import torch

+ from depth_anything_v2.dpt import DepthAnythingV2

+ model = DepthAnythingV2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024])
+ model.load_state_dict(torch.load('checkpoints/depth_anything_v2_vitl.pth', map_location='cpu'))
+ model.eval()

+ raw_img = cv2.imread('your/image/path')
+ depth = model.infer_image(raw_img)  # HxW raw depth map
  ```
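`infer_image` returns relative depth as an HxW float NumPy array, which usually needs rescaling before it can be viewed. A minimal follow-up sketch, assuming the `depth` variable from the snippet above (the normalization and output filenames are illustrative choices, not part of this repository):

```python
import cv2
import numpy as np

# `depth` is assumed to be the HxW float array returned by model.infer_image above.
# Rescale relative depth to 0-255 so it can be written out as an 8-bit image.
depth_vis = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8) * 255.0
depth_vis = depth_vis.astype(np.uint8)

cv2.imwrite('depth_gray.png', depth_vis)  # grayscale depth
cv2.imwrite('depth_color.png', cv2.applyColorMap(depth_vis, cv2.COLORMAP_INFERNO))  # pseudo-color
```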

+ ## Citation

+ If you find this project useful, please consider citing:

  ```bibtex
+ @article{depth_anything_v2,
+   title={Depth Anything V2},
+   author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
+   journal={arXiv:2406.09414},
+   year={2024}
  }
+
+ @inproceedings{depth_anything_v1,
+   title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
+   author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
+   booktitle={CVPR},
+   year={2024}
+ }
depth_anything_v2_vitl.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a7ea19fa0ed99244e67b624c72b8580b7e9553043245905be58796a608eb9345
+ size 1341395338
v1/README.md ADDED
@@ -0,0 +1,98 @@
+ ---
+ license: apache-2.0
+ tags:
+ - vision
+ pipeline_tag: depth-estimation
+ widget:
+ - inference: false
+ ---
+
+ # Depth Anything (large-sized model, Transformers version)
+
+ Depth Anything model. It was introduced in the paper [Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data](https://arxiv.org/abs/2401.10891) by Lihe Yang et al. and first released in [this repository](https://github.com/LiheYoung/Depth-Anything).
+
+ [Online demo](https://huggingface.co/spaces/LiheYoung/Depth-Anything) is also provided.
+
+ Disclaimer: The team releasing Depth Anything did not write a model card for this model so this model card has been written by the Hugging Face team.
+
+ ## Model description
+
+ Depth Anything leverages the [DPT](https://huggingface.co/docs/transformers/model_doc/dpt) architecture with a [DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2) backbone.
+
+ The model is trained on ~62 million images, obtaining state-of-the-art results for both relative and absolute depth estimation.
+
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/depth_anything_overview.jpg"
+      alt="drawing" width="600"/>
+
+ <small> Depth Anything overview. Taken from the <a href="https://arxiv.org/abs/2401.10891">original paper</a>.</small>
+
+ ## Intended uses & limitations
+
+ You can use the raw model for tasks like zero-shot depth estimation. See the [model hub](https://huggingface.co/models?search=depth-anything) to look for
+ other versions on a task that interests you.
+
+ ### How to use
+
+ Here is how to use this model to perform zero-shot depth estimation:
+
+ ```python
+ from transformers import pipeline
+ from PIL import Image
+ import requests
+
+ # load pipe
+ pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-large-hf")
+
+ # load image
+ url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ # inference
+ depth = pipe(image)["depth"]
+ ```
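The pipeline returns the rescaled depth map as a PIL image under the `"depth"` key (the raw tensor is available under `"predicted_depth"`), so it can be saved or displayed directly; a small follow-up sketch, with an illustrative filename:

```python
# `depth` above is a PIL.Image.Image, already rescaled for viewing.
depth.save("depth.png")
```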
+
+ Alternatively, one can use the classes themselves:
+
+ ```python
+ from transformers import AutoImageProcessor, AutoModelForDepthEstimation
+ import torch
+ import numpy as np
+ from PIL import Image
+ import requests
+
+ url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-large-hf")
+ model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-large-hf")
+
+ # prepare image for the model
+ inputs = image_processor(images=image, return_tensors="pt")
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predicted_depth = outputs.predicted_depth
+
+ # interpolate to original size
+ prediction = torch.nn.functional.interpolate(
+     predicted_depth.unsqueeze(1),
+     size=image.size[::-1],
+     mode="bicubic",
+     align_corners=False,
+ )
+ ```
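The interpolated `prediction` is a 1×1×H×W tensor of relative depth. To view it, one can rescale it to 8 bits and build an image with the `numpy`/`PIL` imports already used above; a minimal sketch, assuming the `prediction` variable from that block (the scaling and filename are illustrative):

```python
import numpy as np
from PIL import Image

# `prediction` is assumed to come from the interpolate call above.
output = prediction.squeeze().cpu().numpy()                  # HxW float array
formatted = (output * 255 / np.max(output)).astype("uint8")  # rescale for viewing
Image.fromarray(formatted).save("depth.png")
```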
+
+ For more code examples, we refer to the [documentation](https://huggingface.co/transformers/main/model_doc/depth_anything.html#).
+
+ ### BibTeX entry and citation info
+
+ ```bibtex
+ @misc{yang2024depth,
+       title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
+       author={Lihe Yang and Bingyi Kang and Zilong Huang and Xiaogang Xu and Jiashi Feng and Hengshuang Zhao},
+       year={2024},
+       eprint={2401.10891},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV}
+ }
+ ```
config.json → v1/config.json RENAMED
File without changes
model.safetensors → v1/model.safetensors RENAMED
File without changes
preprocessor_config.json → v1/preprocessor_config.json RENAMED
File without changes