AutoTagger for image sharpness and framing evaluation

Started by smeyer02, February 19, 2025, 10:51:43 PM

Previous topic - Next topic

smeyer02

Has anyone had success in using AutoTagger to evaluate the sharpness of their images and/or the framing (i.e., is the subject partially out-of-frame)?

I've had mixed results with both; curious to see if others have prompts that work better.

Relevant (I think) settings:
Model: GPT-4o
Image size: Extra Large

Sharpness prompt:
Evaluate the sharpness of the image based on edge contrast, which detects strong transitions and definition of edges. Higher contrast between edges indicates sharpness, while weaker transitions suggest blur. Assign a sharpness score from 1 (blurry) to 5 (razor sharp).

Framing prompt:
Evaluate the framing and composition of the image with an emphasis on subject completeness. If any parts of the subject (e.g., wings, propellers, heads, feet) are out of the frame, deduct points based on severity. The maximum score for a cropped subject is 3. 

Respond with a single numeric score ONLY (1, 2, 3, 4, or 5) without any extra text, explanations, or words. The response must be a single number only. Assign the score based on the following criteria:

• 1 (Poor Framing) – The subject is severely cropped, making it incomplete.
• 2 (Weak Framing) – Some important elements (e.g., wings, ears, limbs) are cropped, reducing clarity.
• 3 (Adequate Framing) – Subject is mostly well-framed but slightly cut off or unbalanced.
• 4 (Good Framing) – Fully in frame but with minor compositional issues.
• 5 (Excellent Framing) – Subject is fully in frame with no unintentional cut-offs.


It's probably obvious that I enlisted the help of ChatGPT to create the prompts.   ;D

Thanks!
Stephen

Mario

The OpenAI models are generic models. I doubt they are up to this rather specific task. If a model was not trained for something, it might be right sometimes, but not reliably.


Which creativity settings did you try in your tests? For example, the same seed (<> 0) with different creativity settings (starting low) for the same set of test images.

If you want an AI to reliably identify, say, motorcycle brands and models, fabric types, plants, or image sharpness, the model must be trained specifically for that purpose. Generic world models like ChatGPT are probably not going to cut it.
For example, you could feed it 200,000 images in various states of sharpness (e.g., by synthesizing differently blurred versions of each original, or by blurring individual objects or sections) so the model learns what sharpness means and how to identify it. The same goes for the framing task.
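A quick sketch of how such a synthetic training set could be built (NumPy only; a simple box blur stands in for a proper Gaussian blur, and random noise stands in for a real photo):

```python
import numpy as np

def box_blur(img, radius):
    """Separable box blur; radius 0 means no blur."""
    if radius == 0:
        return img.copy()
    k = np.ones(2 * radius + 1) / (2 * radius + 1)
    out = np.apply_along_axis(lambda row: np.convolve(row, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, out)

def make_training_pairs(img, radii=(0, 1, 2, 4, 8)):
    """Return (image, label) pairs: label 5 = razor sharp ... 1 = very blurry."""
    return [(box_blur(img, r), 5 - i) for i, r in enumerate(radii)]

rng = np.random.default_rng(0)
original = rng.random((64, 64))        # stand-in for a real grayscale photo
pairs = make_training_pairs(original)  # five versions, labelled 5 down to 1
```

Repeated over a large image library (and with blur applied only to individual regions for the "partially out of focus" case), this would produce the kind of labelled data a dedicated sharpness model needs.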

There are specialized models out there, e.g. for plant and mushroom detection, bird watching and more. These models are either proprietary or not in a format Ollama can use.

I hope that capable models for specific tasks, able to run in software like Ollama, will become available in the near future, and that it becomes easier and cheaper to train or fine-tune existing models for specific purposes.

I'm not sure, but I think the Excire software has a proprietary model that is supposed to rank images by various criteria. Maybe try their evaluation version (but be careful when it comes to metadata manipulation; test carefully).

smeyer02

Hi, Mario -

Thank you for the insights.  What you've described essentially matches the results I have seen while iterating on the same images with different prompts and settings.  It's "close but no cigar"  :) Ideally, AutoTagger would give results I could be confident enough in that I wouldn't need to review them, but that's not the case currently.

When you ask ChatGPT how GPT-4o determines focus, it gives a very interesting answer:
GPT-4o does have multimodal capabilities, meaning it can "see" an image and describe its sharpness in a qualitative way. However, its sharpness evaluation is not a pixel-by-pixel mathematical computation like OpenCV or MATLAB would do. Instead, its assessment is based on:

Pattern Recognition in Image Data
GPT-4o can identify visually sharp vs. blurry images based on trained examples.
It infers edge sharpness, motion blur, and focus without calculating variance.

Heuristic-Based Evaluation
GPT-4o likely estimates sharpness using qualitative features (e.g., contrast in high-frequency areas).
It simulates an understanding of sharpness but doesn't perform true Laplacian variance calculations.

Not Using True Mathematical Sharpness Metrics
GPT-4o does not compute Laplacian variance, Sobel gradients, or Fourier transforms natively.
Instead, it mimics these assessments by recognizing common patterns in sharp vs. blurry images.
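For comparison, the "true" Laplacian-variance focus measure that GPT-4o does not compute is only a few lines. A minimal NumPy sketch (with OpenCV, the usual one-liner is cv2.Laplacian(img, cv2.CV_64F).var()):

```python
import numpy as np

def laplacian_variance(gray):
    """Classic focus measure: variance of the Laplacian response.
    Sharper images have more high-frequency detail, hence a higher score."""
    lap = (np.roll(gray, 1, 0) + np.roll(gray, -1, 0) +
           np.roll(gray, 1, 1) + np.roll(gray, -1, 1) - 4 * gray)
    return lap.var()

rng = np.random.default_rng(1)
sharp = rng.random((128, 128))      # noise = lots of fine detail
blurry = (sharp + np.roll(sharp, 1, 0) + np.roll(sharp, -1, 0) +
          np.roll(sharp, 1, 1) + np.roll(sharp, -1, 1)) / 5   # crude blur
print(laplacian_variance(sharp) > laplacian_variance(blurry))  # True
```

The metric is deterministic and cheap, which is exactly what the language model is not: it can only approximate the same judgement from learned visual patterns.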


This confirms what you said - it's all in how the model was trained.

I did try to vary the "creativity" setting, but it didn't seem to impact the results significantly.  I'm curious how this is implemented in the "calls" to the model - is it added as text to the prompt?

My guess is that image size also impacts the results of the focus evaluation - i.e., a smaller version of a slightly out-of-focus image will appear sharper due to fewer pixels.  Maybe even the "Extra Large" size reduces the pixel count enough to affect the perceived focus.
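That intuition can be checked numerically with a small experiment (NumPy, synthetic noise instead of a real photo): blur an image, then keep only every second pixel, and the per-pixel edge contrast goes back up:

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of the Laplacian: a standard sharpness proxy."""
    lap = (np.roll(gray, 1, 0) + np.roll(gray, -1, 0) +
           np.roll(gray, 1, 1) + np.roll(gray, -1, 1) - 4 * gray)
    return lap.var()

rng = np.random.default_rng(2)
sharp = rng.random((256, 256))
blurred = (sharp + np.roll(sharp, 1, 0) + np.roll(sharp, -1, 0) +
           np.roll(sharp, 1, 1) + np.roll(sharp, -1, 1)) / 5
small = blurred[::2, ::2]   # naive 2x downscale (every second pixel)

# Downscaling discards the correlated in-between pixels, so the remaining
# pixels look "sharper" to an edge-contrast measure.
print(laplacian_variance(small) > laplacian_variance(blurred))  # True
```

So any resizing done before the image reaches the model plausibly hides mild focus problems.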

I will keep trying and see when the models become sophisticated enough to handle this task.

Thanks,
Stephen

Mario

Quote from: smeyer02 on February 20, 2025, 04:03:45 PMI did try to vary the "creativity" setting, but it didn't seem to impact the results significantly.  I'm curious how this is implemented in the "calls" to the model - is it added as text to the prompt?
Seed and creativity (aka temperature) are parameters IMatch sends with the prompt to the API endpoint.
How the AI interprets these depends on the model, its version, and undocumented behavior.
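Roughly, the request body looks like this (a hedged sketch in the style of OpenAI's chat completions API; the exact fields IMatch sends are internal):

```python
import json

# Hypothetical payload: "temperature" is the "creativity" setting and
# "seed" (when <> 0) requests more reproducible sampling. Neither is
# added to the prompt text itself; both travel as API parameters.
payload = {
    "model": "gpt-4o",
    "temperature": 0.2,
    "seed": 12345,
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Evaluate the sharpness of the image ..."},
            {"type": "image_url",
             "image_url": {"url": "data:image/jpeg;base64,..."}},
        ],
    }],
}
body = json.dumps(payload)   # this JSON is POSTed to the API endpoint
```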

Maybe try Mistral next?

sinus

Quote from: smeyer02 on February 20, 2025, 04:03:45 PM... It's "close but no cigar"  :) Ideally, AutoTagger would give results I could be confident enough in that I wouldn't need to review them, but that's not the case currently....
Stephen

Yes, I think we must still wait quite a bit before we can do this without reviewing.
AI is very good, but not that good ... yet.

Except, of course, if you do this not for business but just for yourself and for fun - then that's another thing.  ;)

Best wishes from Switzerland! :-)
Markus