Has anyone tried adding positional embeddings to the image patches to improve the model?

#70
by jchiu1234 - opened

Was thinking about trying to get very specific location information from the model. Has anyone tried this yet?

Yeah, I have tried some form of it. I'm not sure it will help (it didn't in my case) unless you have a large and diverse dataset to train further on.

The base model seems really hit or miss with localization (I will see it outperform other OCR tools on one sample, but on the next sample it has almost no localization ability) and does not seem to train well for any downstream tasks that require localization (via box or point tags).
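
For anyone who wants to experiment with the idea in the title, here is a rough sketch of one common approach: adding fixed 2D sinusoidal positional embeddings to patch embeddings before they go into the model. It is not taken from this model's code. The function names (`sincos_1d`, `patch_position_embedding`, `add_positional_embeddings`) are hypothetical, and it assumes patches are laid out in row-major order with an embedding size divisible by 4. The model would most likely still need further training to make use of the added signal.

```python
import math


def sincos_1d(position, dim):
    """Sinusoidal encoding of one integer position into `dim` values (dim must be even)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Frequencies decrease geometrically, starting at 1 (base 10000, as in the
    # sinusoidal encoding used by the original Transformer).
    freqs = [1.0 / (10000 ** (i / half)) for i in range(half)]
    sines = [math.sin(position * f) for f in freqs]
    cosines = [math.cos(position * f) for f in freqs]
    return sines + cosines


def patch_position_embedding(row, col, dim):
    """2D encoding: the first half of the vector encodes the row, the second half the column."""
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    return sincos_1d(row, dim // 2) + sincos_1d(col, dim // 2)


def add_positional_embeddings(patch_embeddings, grid_h, grid_w):
    """Add a fixed 2D positional vector to each patch embedding.

    Assumption: `patch_embeddings` is a list of grid_h * grid_w vectors
    in row-major order (left to right, then top to bottom).
    """
    if len(patch_embeddings) != grid_h * grid_w:
        raise ValueError("number of patches does not match the grid size")
    dim = len(patch_embeddings[0])
    out = []
    for idx, emb in enumerate(patch_embeddings):
        row, col = divmod(idx, grid_w)
        pos = patch_position_embedding(row, col, dim)
        out.append([e + p for e, p in zip(emb, pos)])
    return out


# Example: a 2x3 grid of all-zero, 8-dimensional patch embeddings.
patches = [[0.0] * 8 for _ in range(6)]
with_positions = add_positional_embeddings(patches, grid_h=2, grid_w=3)
```

In a real setup the same addition would be done on tensors inside the model's patch-embedding step. Learned embeddings (one trainable vector per row and per column) are an alternative to the fixed sinusoids used here.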

@besiktas any recommendations for other models that can achieve this?
