Questions about the model
Hi, I have read the paper, nice work, and I would like to try the code, but for simplicity, could you expose the running demo here?
Thanks in advance, Paolo
Hi, thanks for your interest and the heads-up about the demo!
There was a build error; it should be up and running again now :)
Hi guys... sure, I'm interested. Good paper. In any case I'll try to run it as you indicated on the project page.
I have 2 small questions about it: a) Do you have a checkpoint from the training? Retraining is a bit difficult for me, since I use a very busy academic server. b) A silly question about the test: why embed first and then evaluate the similarity, as 2 separate steps? Is there some architectural reason for this?
Thanks in advance. Paolo
a) Yes, there is a checkpoint of a Categorical CondViT-B/16 in the demo files (artifacts/cat_condvit_b16.pth). If you look at the demo code you should easily be able to use it.
b) The main reason is that computing the embeddings can take a while when you have millions of images, and I don’t want to do it multiple times. So I precompute the embeddings once for the whole test set (which takes a while) and store them. Then I can load parts of the test set, compute multiple metrics, use it as an index for a demo, or whatever, and each of those takes only a few seconds with optimized libraries like FAISS.
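To make the idea concrete, here is a minimal sketch (not the repo's actual script) of the precompute-once, reuse-many-times workflow with FAISS; `embed_images` and the file name are hypothetical placeholders for whatever the demo code does:

```python
import numpy as np
import faiss

# --- offline, run once ---
# embs = embed_images(test_set)            # (N, d) float32 embeddings from the model
# np.save("test_embeddings.npy", embs)     # store them so we never recompute

# --- later, for metrics or the demo ---
embs = np.load("test_embeddings.npy").astype("float32")
faiss.normalize_L2(embs)                   # cosine similarity via inner product
index = faiss.IndexFlatIP(embs.shape[1])   # exact inner-product index
index.add(embs)

# query with any embedded image(s): returns similarities and neighbor ids
# sims, ids = index.search(query_embs, k=10)
```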
I hope it answers your questions.
Hi
a) Good, I didn't find it (I read quickly, sorry).
b) Yes, obvious: embed all the data once. It was a silly question, I need to dive into the code :). Thanks for the full info, I'll keep you updated. Good day, P
Hi Simon... Sorry, another question: the checkpoint used in the Gradio demo is "./artifacts/cat_condvit_b16.pth", but I also found B32_Params used for training in the Python class of the main repository (I didn't find the related checkpoint; is it not public?). I don't know whether the B/32 model is more precise (does it see the images at better quality?), has a larger training set, or both?
Thanks in advance. Paolo
Hi,
I didn't release the checkpoints for B/32 models, as I used this size mostly for ablation studies. Both B/32 and B/16 work on 224x224 images, but B/32 works on a coarser representation (larger patches), so it performs worse (cf. Table 1 in the paper). Otherwise they have the same number of parameters and are trained on the same dataset.
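A rough illustration of what "coarser" means here (just standard ViT patching arithmetic, not code from the repo): the 224x224 image is split into non-overlapping patches, so larger patches give a shorter token sequence and less spatial detail.

```python
# Patch-token counts for ViT-B/16 vs ViT-B/32 on 224x224 inputs
image_size = 224
for patch in (16, 32):
    n_tokens = (image_size // patch) ** 2
    print(f"ViT-B/{patch}: {n_tokens} patch tokens")  # 196 vs 49
```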
Thanks for the details, Simon; I'll look at the larger patches. Your answers are very useful, and I hope I can give you something useful back. If you want to ask me anything, I have experience in DL and ML, mostly in NLP. Good day, P