Observations about AI Landmarks/Variables

Started by jch2103, March 27, 2025, 12:10:02 AM


jch2103

The Help notes that including variables (e.g., Description, location tags, etc.) can improve the accuracy of the AI responses. At this point I'm mostly looking to develop more detailed descriptions, keywords, landmarks, scientific names, and a determination of whether the image is B&W or not, using the custom traits AI.ScientificName [Return the scientific name of the object shown in this image] and AI.BlackAndWhite [If this image is monochrome, respond with 'black and white' else return nothing].

Current working prompt (mostly for LM Studio but also exploration of OpenAI):
{File.MD.description}.
Describe this image in the style of a news caption. Use factual language. This image was taken in {File.MD.Composite\MWG-Location\Location\0}, {File.MD.city}, {File.MD.state}, {File.MD.country}.
Return five to ten keywords for this image.
If this image is monochrome, respond with 'black and white' else return nothing.
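(Aside: for anyone curious how such a template resolves before it is sent to the AI, here is a minimal Python sketch of the substitution idea. The metadata values are invented for illustration, and the variable syntax is only mimicked here; IMatch performs the actual substitution internally.)

```python
# Hypothetical sketch: resolving IMatch-style {File.MD....} placeholders
# into a final prompt string. Metadata values below are made up.
import re

metadata = {
    "File.MD.description": "The old courthouse at dusk.",
    r"File.MD.Composite\MWG-Location\Location\0": "Courthouse Square",
    "File.MD.city": "Prescott",
    "File.MD.state": "Arizona",
    "File.MD.country": "USA",
}

template = (
    "{File.MD.description}\n"
    "Describe this image in the style of a news caption. Use factual language. "
    "This image was taken in "
    "{File.MD.Composite\\MWG-Location\\Location\\0}, "
    "{File.MD.city}, {File.MD.state}, {File.MD.country}."
)

def resolve(template: str, metadata: dict) -> str:
    # Replace each {variable} with its metadata value, or "" if missing.
    return re.sub(r"\{([^{}]+)\}", lambda m: metadata.get(m.group(1), ""), template)

prompt = resolve(template, metadata)
print(prompt)
```

A variable with no value simply resolves to an empty string, which is why the same prompt can be reused on images with partial metadata.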

Observations: Including the description variable helps accuracy considerably (of course!), especially when there are similar but distinct features in the image (e.g., two somewhat similar-looking buildings). Location data variables (country, state, city and location) also help AI accuracy (e.g., to distinguish between the similar-looking buildings). Sometimes the AI does have glitches or comes up with incorrect responses, so it's something to check and not blindly rely on. Nevertheless, overall accuracy, including landmark recognition, is pretty good.

I need to continue going through the extensive AI help to figure out how best to apply these tools. 

Question: Do any of the AI models available in IMatch take advantage of GPS coordinates? That might help with using AI for images that lack other location data or descriptions.

John

Mario

Quote from: jch2103 on March 27, 2025, 12:10:02 AMQuestion: Do any of the AI models available in IMatch take advantage of GPS coordinates? That might help with using AI for images that lack other location data or descriptions.
Have you tried it? Like giving the AI context information like "This image was taken at the GPS coordinates latitude nn, longitude nn, consider this when you analyze the image"?

The result of this will of course depend on the model you use. It's more likely to work with the huge cloud-based AIs.

I've made some experiments with Gemini recently, prompting for GPS coordinates. But the results I got were usually several hundred meters off, even for well-known places in Paris and London.
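(One way to quantify how far off such results are is a great-circle distance check between the coordinates the model returns and the known EXIF coordinates. A minimal Python sketch using the haversine formula; the coordinates below are illustrative, not from my tests.)

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Example: a point near the Eiffel Tower vs. a point a few hundred meters away
d = haversine_m(48.8584, 2.2945, 48.8600, 2.2990)
print(round(d), "meters")
```

Anything in the "several hundred meters" range, as above, is too coarse to distinguish neighboring landmarks but can still confirm the right city or district.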

jch2103

I had tried this:
{File.MD.description}.
Describe this image in the style of a news caption. Use factual language. This image was taken in {File.MD.Composite\MWG-Location\Location\0}, {File.MD.city}, {File.MD.state}, {File.MD.country}.
This image has the coordinates {File.MD.Composite\GPS-GPSLatitude\GPSLatitude\0} and {File.MD.Composite\GPS-GPSLongitude\GPSLongitude\0}.
Return five to ten keywords for this image.
If this image is monochrome, respond with 'black and white' else return nothing.
for an image with no metadata except dates and GPS coordinates. The AI figured out that the image was in Arizona, but was not more specific. When I did a reverse geocode on the image with the same prompt, it returned information down to the street (but not address). Same for OpenAI. 

I substituted your GPS prompt (without reverse geocoding):
{File.MD.description}.
Describe this image in the style of a news caption. Use factual language. This image was taken in {File.MD.Composite\MWG-Location\Location\0}, {File.MD.city}, {File.MD.state}, {File.MD.country}.
This image was taken at the GPS coordinates {File.MD.Composite\GPS-GPSLatitude\GPSLatitude\0} and {File.MD.Composite\GPS-GPSLongitude\GPSLongitude\0}. Consider this when you analyze the image.
Return five to ten keywords for this image.
If this image is monochrome, respond with 'black and white' else return nothing.

This return was much more specific (the subject was a roadside attraction with dinosaur sculptures in Yermo, California), with some expected variations in different runs. My takeaway: Prompt phrasing does seem to make a difference (as expected).
John

Mario

Tip: Check your prompt in VarToy to see the result. I'm not sure what MWG\Location actually contains.
You also seem to mix multiple prompts into one?
The first part is a prompt suitable for a description.

But then comes a part I would expect as the keyword prompt.

And the last part "monochrome" is a prompt I would use for a trait.

Are these three separate prompts in your setup?

jch2103

Thanks. 
VarToy return is OK.

Yes, I was duplicating input to the AI. I've deleted the unnecessary extra text in the prompt re keywords (it was already present in the Preferences AutoTagger setting for Keywords).
This image was taken in {File.MD.Composite\MWG-Location\Location\0}, {File.MD.city}, {File.MD.state}, {File.MD.country}.
This image was taken at the GPS coordinates {File.MD.Composite\GPS-GPSLatitude\GPSLatitude\0} and {File.MD.Composite\GPS-GPSLongitude\GPSLongitude\0}. Consider this when you analyze the image.
The second line is to account for images w/o Country/State/City/Location. I don't know if the duplicate location data helps or hinders the AI. 

I have two custom traits: 
AI.ScientificName (Return the scientific name of the object shown in this image.) [Interestingly, this sometimes returns 'Homo sapiens' for people.]
AI.BlackAndWhite (If this image is monochrome, respond with 'black and white' else return nothing.) [I've deleted the duplicate language in the prompt.]

John

Mario

Including the location usually helps the AI produce better descriptions (or keywords). It depends on the model, of course.
Not sure about the GPS coordinates. Depends on how the model was trained and if it has a concept of GPS.

You can do a simple A/B test if you want:
Set the seed to a fixed number (say: 123) and the creativity to the lowest setting.
Select 10 representative images and make two copies of each image (30 images).
Run the prompt as it is on the first 10.
Remove the GPS part and run the prompt on the second 10.
Re-insert the GPS part but remove the location part and run it on the last 10 images.
Compare the results.
Reset the seed to 0 afterwards.

This can be very insightful. As can be running the same prompt with different seeds and/or different creativity settings.
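(For the "compare the results" step, one simple, objective measure for the keyword outputs of two runs is set overlap, i.e. Jaccard similarity. A sketch with made-up keyword lists:)

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword lists (case-insensitive)."""
    sa = {k.strip().lower() for k in a}
    sb = {k.strip().lower() for k in b}
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

# Hypothetical keyword outputs from two AutoTagger runs on the same image
run_with_gps = ["dinosaur", "sculpture", "roadside attraction", "desert", "Yermo"]
run_without = ["dinosaur", "sculpture", "statue", "desert", "sky"]

score = jaccard(run_with_gps, run_without)
print(round(score, 3))
```

A score near 1.0 means the GPS context barely changed the keywords; a low score means it shifted them substantially. Descriptions are harder to compare automatically, so those are best eyeballed.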

Note: Google Gemini 3, available in the next IMatch release, knows a lot about places and locations (Google has a treasure trove of data to train on, from Google Maps and Street View).

jch2103

Thanks. Gemini 3 should be interesting. I also saw this yesterday: https://arstechnica.com/ai/2025/03/google-says-the-new-gemini-2-5-pro-model-is-its-smartest-ai-yet/

John

Mario

Quote from: jch2103 on March 30, 2025, 06:43:58 PMThanks. Gemini 3 should be interesting. I also saw this yesterday: https://arstechnica.com/ai/2025/03/google-says-the-new-gemini-2-5-pro-model-is-its-smartest-ai-yet/

Interesting times.
I've commented on that already (search for it). The new model is a reasoning model, aimed more at math, reasoning, and coding. For the intended purpose in IMatch (creating descriptions and keywords for images) this model should not have a big impact. It's not publicly available and still in beta. When it becomes available and I see a benefit, I will make it available in AutoTagger.

I think people will be very surprised at how good Gemini 2.0 (Flash) already is with AutoTagger.