resolution limitations #23

yggdrasil75 · 2025-01-01T20:39:32Z

What resolutions can this run at successfully? The base script downscales quite a bit, which muddles the content of images that are large because of how much they contain. When I tried simply doubling the default, I got errors about mismatched tensor sizes (might have been my fault for setting it to only downscale to double, instead of also resizing to 384 when the image is already smaller than double). Is there a value I need to be aware of (i.e. a multiple of 34 or something), or is this model trained exclusively at 384x384?
If 384 is only the default and not a requirement, it would be nice to use higher resolutions to get more fidelity in the description.

fpgaminer · 2025-01-01T20:42:11Z

Maintainer

It only works at 384x384, which is the resolution that its vision model, so400m, was trained at. I'll look into working on the vision model in future versions, but it's not my focus for version 1.
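A rough sketch of why a larger input would trip a tensor-size error. This assumes the vision tower uses 14-pixel patches and one learned position embedding per patch, as in the publicly released so400m checkpoint at 384 resolution; the thread itself only names so400m, so treat the patch size and the `patch_count` helper as illustrative assumptions.

```python
def patch_count(image_size: int, patch_size: int = 14) -> int:
    """Number of non-overlapping patches cut from a square image.

    patch_size=14 is an assumption taken from the public so400m
    checkpoint, not something stated in this thread.
    """
    per_side = image_size // patch_size
    return per_side * per_side


# Position embeddings are sized for the training resolution.
trained = patch_count(384)
# Doubling the input side roughly quadruples the number of patches.
doubled = patch_count(768)

print(f"384x384 -> {trained} patches")
print(f"768x768 -> {doubled} patches")
print("shapes match" if trained == doubled else "shapes differ")
```

Under those assumptions the encoder has a fixed number of position embeddings, so any input that produces a different patch count cannot be added to them without interpolating or retraining the embeddings.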

yggdrasil75 · 2025-01-01T20:53:00Z

Author

Alright, then the other question is: should it be stretched or padded? The default script stretches, but some models work better with padding, and I have some portrait images and some landscape images as well as square ones (or close enough to square).
Also, what is the best option for larger images with lots of content in them if they will be limited to 384?

fpgaminer · 2025-01-01T20:55:01Z

Maintainer

so400m was trained with stretched images, so that is likely to perform best.

This is just a hobby project, so these are the limitations. GPT-4o is a stronger model that can handle large images.
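To make the stretch-versus-pad difference concrete, here is a small geometry sketch. The helper names `stretch_scale` and `pad_layout` are hypothetical and not part of the project's script; they only compute what each strategy does to an image of a given size.

```python
def stretch_scale(width: int, height: int, target: int = 384):
    """Per-axis scale factors when the image is stretched to target x target.

    The two factors differ for non-square images, so the aspect ratio changes.
    """
    return target / width, target / height


def pad_layout(width: int, height: int, target: int = 384):
    """Resized size and paste offset when letterboxing into target x target.

    One uniform scale keeps the aspect ratio; the leftover area is padding.
    """
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    offset = ((target - new_w) // 2, (target - new_h) // 2)
    return (new_w, new_h), offset


# A 1920x1080 landscape image under both strategies.
sx, sy = stretch_scale(1920, 1080)
size, offset = pad_layout(1920, 1080)
print(f"stretch: x scaled by {sx:.3f}, y scaled by {sy:.3f}")
print(f"pad: resized to {size}, pasted at {offset}")
```

With Pillow, stretching would be a single `img.resize((384, 384))`, while padding would resize to the computed size and `paste` the result onto a blank 384x384 canvas at the computed offset. Since the vision model is said to have been trained on stretched images, padding is shown here only for comparison.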

yggdrasil75 · 2025-01-01T21:00:42Z

Author

I'm using this instead of GPT-4o because of price, running locally, and capability. I could tag with qwen2vl 73.4b, but it doesn't do NSFW easily, and closedai costs too much to use the API for bulk tagging/describing. I guess I'll check back for the beta/release version when it comes out; hopefully it's less limited in the future. Until then I'll probably split images into sections with overlap to compensate, or something along those lines.
Or I guess manually tag some stuff, but that sounds boring.
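The split-with-overlap workaround could look roughly like the sketch below. The tile size, overlap, and function names are hypothetical choices for illustration; nothing here comes from the project itself.

```python
def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets along one axis so tiles of size `tile` cover `length`."""
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile sits flush with the far edge
    return starts


def tile_boxes(width: int, height: int, tile: int = 768, overlap: int = 128):
    """Crop boxes (left, top, right, bottom) covering the whole image.

    Neighbouring tiles share at least `overlap` pixels, so content cut by
    one tile's edge appears whole in an adjacent tile.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


boxes = tile_boxes(2000, 1000)
for box in boxes:
    print(box)
```

Each box can be passed to Pillow's `Image.crop` and the crop captioned on its own. Square tiles keep the later stretch to 384x384 from distorting the content much; how to merge the per-tile captions into one description is left open.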
