the model "depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf" doesnt work as expected, produces a full black depthmap

#1
by Abbasid - opened

Hi, I'm trying to use the model as described here, but I keep getting blank, pure black depth maps:

[image: all-black depth map output]

When I go back to the original model "depth-anything/Depth-Anything-V2-Small-hf", I get an accurate depth map:

[image: correct depth map from the original model]

I tried running inference both with the high-level pipeline API and the manual approach, and the result is the same.
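
For reference, the pipeline version of what I run looks roughly like this (a simplified sketch; the test image is just an example):

from transformers import pipeline
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# high-level depth-estimation pipeline
pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf")
result = pipe(image)
result["depth"]  # comes out as an all-black PIL image for me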

what could be the issue?

Hi @Abbasid, it looks like the "metric depth" feature for DepthAnything is not in the 4.44.0 release yet, but you can use it if you update transformers to the latest main as follows:

# update transformers to latest main
!pip install git+https://github.com/huggingface/transformers

from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# load the metric (outdoor) checkpoint
image_processor = AutoImageProcessor.from_pretrained("depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf")
model = AutoModelForDepthEstimation.from_pretrained("depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
) 

# visualize the output (rescaled to 0-255 for display; the raw predicted values stay in the output array)
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)

depth

I believe this is the issue you are facing, since it works for me:
[image: depth map produced by the snippet above]
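
If you want to double-check that your environment picked up the new build, you can print the installed version after restarting the runtime; it should report a dev version newer than 4.44.0 (a quick sanity check, not part of the snippet above):

import transformers
print(transformers.__version__)  # e.g. something like 4.45.0.dev0 rather than 4.44.0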

Hello @bthia97
Thanks, I managed to get it working with the help of
!pip install git+https://github.com/huggingface/transformers
Another question: these values are supposed to represent metric depth, right? So if I get the value 60 at a pixel,
does that mean the object is 60 m away? The values I get for objects of known sizes are way off,
or is something wrong with my understanding?
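
For example, I'm reading per-pixel values roughly like this (a rough sketch continuing from the snippet above; the pixel coordinates are just an example):

# raw model output, before the 0-255 rescaling used for visualization
depth_map = prediction.squeeze().cpu().numpy()
print(depth_map[240, 320])  # value at an example pixel - is this supposed to be meters?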

Also, when we run inference with the pipeline, what's the difference between depth and predicted_depth?
depth is a PIL image with the depth...
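
For reference, this is roughly what I see in the pipeline output (a minimal sketch, reusing the pipe and image from the snippet earlier in my question):

out = pipe(image)
print(type(out["depth"]))            # PIL.Image.Image, rescaled for display
print(type(out["predicted_depth"]))  # torch.Tensor with the raw model output
print(out["predicted_depth"].shape)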
