resolution limitations #23
Replies: 4 comments
-
It only works at 384x384, which is the resolution that its vision model, so400m, was trained at. I'll look into working on the vision model in future versions, but it's not my focus for version 1.
-
Alright, then the other question is: should the image be stretched or padded? The default script stretches, but some models work better with padding, and I have portrait, landscape, and square (or close enough to square) images.
-
so400m was trained with stretched images, so stretching is likely to perform best. This is just a hobby project, so these are the limitations. GPT-4o is a stronger model that can handle large images.
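To make the two options concrete, here is a minimal sketch of the geometry only, not the project's actual preprocessing. The function names and the default `target=384` are assumptions for illustration. Stretching scales each axis independently, so the aspect ratio changes; padding scales both axes by the same factor and centres the result on a square canvas.

```python
def stretch_scales(width, height, target=384):
    """Stretching: each axis gets its own scale factor, so aspect ratio is lost."""
    return target / width, target / height


def pad_layout(width, height, target=384):
    """Padding: scale the longer side to `target`, then centre on a square canvas.

    Returns (new_width, new_height, left_offset, top_offset).
    """
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    left = (target - new_w) // 2
    top = (target - new_h) // 2
    return new_w, new_h, left, top


if __name__ == "__main__":
    # A 1920x1080 landscape image: stretched vs padded.
    print(stretch_scales(1920, 1080))
    print(pad_layout(1920, 1080))
```

With padding, a 16:9 image only fills a 384x216 strip of the canvas, so fewer pixels carry real content than with stretching.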
-
I'm using this instead of GPT-4o because of price, running locally, and ability. I could tag with qwen2vl 73.4b, but it doesn't do NSFW easily. closedai costs too much to use the API for bulk tagging/describing. I guess I'll check back for the beta/release version when it comes out; hopefully it's less limited in the future. Until then I will probably split images into overlapping sections to compensate, or something along those lines.
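A rough sketch of the overlapping-sections idea, assuming each tile is later resized to 384x384 and captioned on its own. The `tile_boxes` helper and the 768/128 defaults are hypothetical choices, not something from the project. The boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` accepts.

```python
def tile_boxes(width, height, tile=768, overlap=128):
    """Return (left, upper, right, lower) crop boxes that cover the whole image.

    Neighbouring tiles share at least `overlap` pixels; the last tile in each
    row and column is shifted back so it ends flush with the image edge.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be >= 0 and smaller than tile")

    def starts(length):
        if length <= tile:
            return [0]  # image is smaller than one tile along this axis
        step = tile - overlap
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


if __name__ == "__main__":
    for box in tile_boxes(2000, 1000):
        print(box)
```

The per-tile descriptions would still need to be merged afterwards, and objects cut by a tile border may be described twice or not at all, which is what the overlap is meant to reduce.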
-
What resolutions can this run at successfully? The base script downscales quite a bit, which muddles the content of large, detailed images. When I tried simply doubling the default, I got errors about mismatched tensor sizes (which might have been my fault for setting it to only downscale to double, instead of also downscaling to 384 when the image was already smaller than double). Is there a value I need to be aware of (e.g. a multiple of 34 or something), or is this model trained exclusively at 384x384?
If 384 is only the default and not a requirement, then it would be nice to use higher resolutions to get more fidelity in the description.
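One plausible reason for the mismatched tensor sizes, sketched under assumptions that are not confirmed in this thread: a ViT-style vision encoder cuts the image into fixed-size patches and has one learned position embedding per patch, so the token count is tied to the input size. The patch size of 14 below is an assumption taken from how so400m checkpoints are commonly published, and the helper names are hypothetical.

```python
def patches_per_side(image_size, patch_size=14):
    """Non-overlapping patches along one side; leftover edge pixels are dropped."""
    return image_size // patch_size


def token_count(image_size, patch_size=14):
    """Total patch tokens for a square image (patch_size=14 is an assumption)."""
    side = patches_per_side(image_size, patch_size)
    return side * side


if __name__ == "__main__":
    for size in (384, 768):
        print(size, patches_per_side(size), token_count(size))
```

Under these assumptions a 384x384 input yields 27 x 27 = 729 tokens while 768x768 yields 54 x 54 = 2916, so position embeddings sized for 729 tokens would not line up with the larger input.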