Support for Google Gemini in IMatch

Started by Mario, March 26, 2025, 12:01:20 PM

Previous topic - Next topic

Mario

I'm happy to announce that I have completed integration and initial tests for Google Gemini (for the current "Gemini 2.0 Flash" (cheaper, more than good enough) and "Gemini 2.0") models with very good results.

I've waited until Google's OpenAI-compatible interface implementation was stable enough to be used in IMatch. This simplifies things for me, since I don't need the extra functionality of Google's own API for IMatch and using OpenAI-compatible interfaces and processing for several AIs simplifies my code and testing. More robust, too.

Google Gemini will work in AutoTagger like all of the other supported AIs.
You get yourself an API key for Gemini, select Gemini as the AI and the model of your choice and of you go.

What makes Gemini especially versatile is the deep knowledge of places, tourist spots and landmarks.
Google here has access to all the data they have collected for Google Maps, and all the information and imagery people have shared freely with Google via Maps, Google Lens and other data feeds Google maintains.
This is an immensely precious data trove only Google has access too.

The boat and market hall were produced by a fairly generic prompt:

[[-c-]] Describe this image in the style of a news caption. Use factual language.
If you can identify known places or buildings, mention them in the description.

As you see, Gemini knew where the image was taken (quite astonishing, giving that only a small section of the market hall is visible. It also identified the name of the ship. Probably asking for the make/model of the ship in the prompt would provide additional info.

Having dedicated prompts for, say, vehicles, buildings and bids makes sense, because you can prompt for specific details.

For image of the bird I've used a more "bird-specific" prompt:

[[-c-]] Describe this image in the style of a caption in a bird watcher magazine.
Include a detailed description of the bird, the classification and Latin name.


The description is longer, but clipped by my FW layout. Note that I have no idea if this is really the right bird, I have no clue about birds or taxonomy.
Your prompts will be much better, I'm sure. Google has prompting tips on their Gemini site. I have not looked into this yet, I was concentrating on getting this to work in AutoTagger. Which I did.

Image4.jpg

Jingo

This is great... as mentioned, I tried Gemini on 5 birds yesterday and it nailed all 5 of them (a couple were more difficult too like a seagull and hooded merganser!).  Looking forward to seeing how it rolls on my bird photo database!

Mario

Quote from: Jingo on March 26, 2025, 12:19:09 PMThis is great... as mentioned, I tried Gemini on 5 birds yesterday and it nailed all 5 of them (a couple were more difficult too like a seagull and hooded merganser!).  Looking forward to seeing how it rolls on my bird photo database!

Hm... maybe I should make this feature only available in an expensive, subscription-based

"Pro Cloud AI Blockchain with Extra Bugs and Only One Meaningful Update Per Year"

version of IMatch... ;D

Adobe has tripled their annual revenue after forcing all users into subscription-based products. So there is a good precedent. Why sell software only once, when you can sell it every month?

jch2103

Probably not a surprise, but Google announces today the new Gemini 2.5 Pro Experimental: https://arstechnica.com/ai/2025/03/google-says-the-new-gemini-2-5-pro-model-is-its-smartest-ai-yet/

Lots going on in AI, certainly too much for me to keep track of. Kudos to Mario for all the work here!
John

Mario

The 2.5 is a "reasoning model", much like DeepSeek.
It takes a while to "think" before it answers. If you have installed LM Studio, you can download DeepSeek and see how this works.  Basically, you can watch the AI discuss with itself while it tries to figure out an answer.

I made Gemini 2.0 Flash Lite the default model for AutoTagger, since it has excellent visual capabilities, which is what we want for IMatch. It's also very affordable with a good daily free allowance.
I also support the Gemini 2.0 Flash model, which is more expensive and better. I could not find a difference between the two models. Like always, if you run the same prompt twice, you get slightly different results. But basically the same for both models.

Newer models may excel at certain things or in certain benchmarks - which are irrelevant for the intended use in AutoTagger. We only want the model to fabricate good descriptions and keywords for us. And maybe a couple of traits to automatically organize our image collections via a couple of smartly defined data-driven categories.

If a user needs the AI to help with advanced math, programming tasks, writing 100 blog posts per day about cooking or repair, different models may be more suitable.


QuoteLots going on in AI,

You have no idea!

I can show you several gray hairs I've got from all the AI stuff since I made the first steps towards AutoTagger 2.0 back in July last year. Innovation cycles in AI are measured in weeks, not months or years.

QuoteKudos to Mario for all the work here!
You're welcome. Just keep on letting others know about IMatch and save a few bucks each months for the 12-18 month major upgrade cycles :D