Sorry for duplicating issue #1, but could you please explain how I can extract text for a single image given as input? It is not clear to me what steps I need to take to get a text description of a single image.
Also, I was wondering whether I can extract text for an external image, i.e. an image that was not included in the train and val image sets?
I would really appreciate any help.
Hi @yurii-piets
The code maps image and text inputs into one shared space. Therefore, given one image, we can extract its image embedding (feature), and given one sentence, we can extract the corresponding text embedding (feature).
More precisely, we extract a shared-space feature from an image input, rather than a text feature from an image input.
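In practice, "getting a text description for one image" therefore means extracting the image's shared-space feature and then retrieving the nearest text embeddings from a gallery of candidate captions. A minimal sketch of that retrieval step (the function and variable names here are hypothetical, not this repo's API; the toy embeddings stand in for real model outputs):

```python
import numpy as np

def retrieve_captions(image_emb, text_embs, captions, k=2):
    """Rank gallery captions by cosine similarity to one image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    scores = txt @ img                  # cosine similarity per caption
    order = np.argsort(-scores)[:k]    # indices of best matches first
    return [captions[i] for i in order]

# Toy 64-d embeddings standing in for real encoder outputs.
rng = np.random.default_rng(0)
image_emb = rng.standard_normal(64)
text_embs = np.stack([
    image_emb + 0.1 * rng.standard_normal(64),  # near match to the image
    rng.standard_normal(64),                    # unrelated caption
    rng.standard_normal(64),                    # unrelated caption
])
captions = ["a close match", "unrelated A", "unrelated B"]
print(retrieve_captions(image_emb, text_embs, captions, k=1))
```

Because both branches land in the same space, the same cosine-similarity ranking works in either direction (image-to-text or text-to-image).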
Yes, you can extract features for external images as well. But one thing to keep in mind is the data distribution of the external images.
If the image is collected from Flickr, you should choose the model pretrained either on Flickr30k or MSCOCO.
If the image content is a pedestrian, you should choose the model pretrained on CUHK-PEDES.
The model works well when the testing distribution is close to the training distribution.