The Gemma 3 Model - generate hierarchical keywords

Started by Mario, March 19, 2025, 03:23:56 PM

Previous topic - Next topic

Mario

I've had good success with a quite specific keyword prompt:

Return up to 20 keywords describing this image.
Create the following keywords:
1. Identify the hair color and create a keyword in the following format: "hair color|[hair color description]"
2. Identify the eye color and create a keyword in the following format: "eye color|[eye color description]"
3. Identify the hair style and create a keyword in the following format: "hair style|[hair style description]"
4. Estimate the age of the person and create a keyword in the following format: "age group|[age description]"
5. Identify the apparel and create a keyword in the following format: "apparel|[apparel description]"
6. Identify the facial expression and create a keyword in the following format: "facial expression|[facial expression]"

Use this prompt as an example for your own Gemma 3 prompt experiments.
While not always 100% correct, this prompt, combined with letting AutoTagger run for a day, categorized one of my sample databases by these person properties quite nicely. And it saved me days, if not a week, of work.

Tip: You can ask Gemma 3 how to best write a prompt to achieve a specific result. Discuss it with the model, make your requirements and expectations clear, and then try out the resulting prompts. This often leads to good results.



Jingo

#1
Thx for sharing... I'll have to try this on some people photos and see how it works.

I just tried using Gemma 3 on my bird photos... unfortunately it was right only about 20% of the time, and based on the results I felt it was guessing most of the time (i.e. it misidentified a Robin as a Cardinal... then misidentified it as a Tufted Titmouse!). The same photos were successfully identified about 70% of the time using OpenAI (Robin right at the start!), so it appears some models are better suited for certain types of photography. I'll keep playing though, because using a local AI is appealing!


Mario

Which version do you use? 4B or 12B or 27B?

It's all a matter of model size.
Quantizing a model to make it smaller, so it can run on consumer-grade graphics cards, involves losses.

The publicly available full-precision 27B Gemma 3 model needs about 108 GB of VRAM. The full-precision 12B model still needs about 48 GB of VRAM, which is not affordable for normal users. The 12B model offered by LM Studio needs ~12 GB of VRAM and has been quantized from 32-bit down to 8-bit.
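The VRAM figures above follow from simple arithmetic: parameter count times bytes per parameter. This back-of-the-envelope sketch ignores the extra memory needed for activations and the KV cache, so treat it as a lower bound:

```python
# Rough VRAM estimate: parameters (in billions) * bytes per parameter.
# 1e9 params at n bytes each is ~n GB; real use adds runtime overhead.

def vram_gb(params_billion: float, bits: int) -> float:
    bytes_per_param = bits / 8
    return params_billion * bytes_per_param

print(vram_gb(27, 32))  # full-precision (32-bit) 27B: ~108 GB
print(vram_gb(12, 32))  # full-precision 12B: ~48 GB
print(vram_gb(12, 8))   # 8-bit quantized 12B: ~12 GB
```

The same formula explains why 4-bit quantizations are popular: they halve the 8-bit footprint again, at the cost of further quality loss.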

The original Gemma 3 / Gemini model probably has several hundred billion parameters, but requires a data center to run. The same goes for OpenAI. These models know more because they are bigger.

For highly specialized tasks like identifying birds, plants or insects, you either need a specifically trained model (to run locally) or a very large model (cloud). And a very large specialized model will be even better, and may reach an 80% or even 90% recognition rate.

What makes Gemma 3 so special is that it handles many different languages, has a 128K context window (i.e. a large memory for handling longer chats), generates good-enough descriptions and keywords, and can still run on as little as 5 GB of VRAM. The 12B model is better than the 4B model, of course.

I'm sure we'll see publicly available specialized models for many tasks and usage scenarios in the future.

Jingo

I am using the 12B model... not putting it down, of course - I'm sure it is great, and providing it with the bird name via a tag produced some very good descriptive results. I look forward to the day when someone like Merlin releases their data as an API... it is the best tool I've used, many times over, for bird identification!

Stenis

#4
I have installed Ollama and Gemma 3 4B, and I have a 3060 Ti with 8 GB. I can say this much: I prefer Gemma 3 over OpenAI when it comes to landmarks, and I also think it writes a little better language-wise - but that is just a feeling.

BUT, Gemma 3 4B is not at all impressive when it comes to identifying species. It is absolutely hopeless on birds, and so is OpenAI mini. If I compare them both with, for example, Google Lens on my phone, Google Lens is just so much better.

Neither Gemma 3 nor OpenAI handles any animals other than big ones like giraffes, lions, elephants, hyenas and leopards, and both have a hard time distinguishing wildebeests from buffaloes. They mixed up elands with hartebeests and could not identify hyraxes (not even close). Every deer-like animal, be it a Thomson's gazelle, an impala, a dik-dik or a waterbuck, was just labeled "antelope" or so.

Google Lens handles all sorts of reptiles and birds and is very useful.

Well, this is fine for me as long as it is just a few words here and there to fix.

When it comes to identifying people and using their names in the texts, I first thought I would have to confirm every single picture, but I found a workaround. Say I have 10 pictures: I fix just one with a confirmed name, then select the nine I did not fix/confirm, ending with the one I actually fixed, press F7 and choose "run once" to get the same text on all of them. (Of course that is often an exception, since the same text often doesn't work.)

So I wonder: how can I speed up a process like confirming names?

Mario

Quote: Google Lens handles all sorts of reptiles and birds and is very useful.
I doubt that Google will make this model available to the public. They gather massive amounts of data from the people using Lens, and that is what makes them money.

Maybe give Gemini (https://gemini.google.com/) a test (online) to see if it is as good at detecting bird species as Lens?
I could add support for Gemini to a future AutoTagger version.
Let us know what you find out.


Quote: So I wonder: how can I speed up a process like confirming names?

The Face Manager


Stenis

I saw in the release notes that you had developed Face Manager. I will check that.

Jingo

I've tried Gemini and the results were MUCH better than Ollama and OpenAI... it went 5 for 5 on the bird images I sent it and provided some really good feedback. It even recognized a hard-to-identify seagull and gave the correct answer by recognizing the area I am in (assuming the photo was taken there).

If you were to add support for Gemini, I would certainly play with it, and ultimately use it full time if the results are THIS impressive!

Mario

Quote from: Stenis on March 25, 2025, 12:05:41 PM
I saw in the release notes that you had developed Face Manager. I will check that.
Face Manager has been around for many years.

Lincoln

Two hands up for Gemini please >:( . I'm having great luck with identifying not only birds but also styles/types of sailing craft, truck configurations, commercial ships and aircraft. It's not always correct, but it did show me a new breed of sheep that I didn't even know about. The breed was only created in 2012, yet Gemini correctly identified it. It is very accurate on certain architecture/landmarks in Sydney, Australia, but completely misses other, older buildings in the same area; it seems to be getting more accurate every day.
OpenAI is great, but Gemini goes into more detail, especially with the transport industry, without having to prompt it as much.

Stenis

Quote from: Mario on March 25, 2025, 12:52:19 PM
Quote from: Stenis on March 25, 2025, 12:05:41 PM
I saw in the release notes that you had developed Face Manager. I will check that.
Face Manager has been around for many years.

I just read the release notes for the last version.

Stenis

#11
Quote from: Mario on March 25, 2025, 08:19:17 AM
Quote: Google Lens handles all sorts of reptiles and birds and is very useful.
I doubt that Google will make this model available to the public. They gather massive amounts of data from the people using Lens, and that is what makes them money.

Maybe give Gemini (https://gemini.google.com/) a test (online) to see if it is as good at detecting bird species as Lens?
I could add support for Gemini to a future AutoTagger version.
Let us know what you find out.


Quote: So I wonder: how can I speed up a process like confirming names?

The Face Manager



I have read about and explored Gemini Advanced, and that is just for text.

I think this is what ought to be tested: the Cloud Vision AI API

https://cloud.google.com/vision/docs

https://cloud.google.com/vision/docs/features-list

What do you think?
Would it be difficult to integrate this?
If it is as powerful as Google Lens, it would be a good contribution
... and nobody can compete with Google when it comes to GIS/geo data and landmarks.

What is the "Googe" option in Autotagger?
It seems like there are quite a few Google related AI-services out there and I have hard to understand which to use and what support for those are in Autotagger and iMatch.
Is the one in iMatch the API from Vision AI or what?

Can you support this??

Maybe even this "Document text detection (dense text / handwriting)" could be of interest for you in the future, combined with iMatch's ability to tie metadata to PDF and Office documents. Text extraction that makes PDF files indexable?


Mario

IMatch AutoTagger already has the "classic" Google Cloud Vision.
That's the old implementation, before Gemini.
Get an API key and try it out. See Google Cloud Vision in the IMatch help for more.
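To give an idea of what a request to the classic Cloud Vision API looks like: the endpoint and JSON shape below follow Google's public v1 REST interface (`images:annotate` with a `LABEL_DETECTION` feature); the helper function and the placeholder image bytes are my own sketch, and you need your own API key to actually send it.

```python
import base64
import json

# Public REST endpoint of the "classic" Google Cloud Vision API (v1).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def build_label_request(image_bytes: bytes, max_results: int = 10) -> str:
    """Build the JSON body for a label-detection request."""
    payload = {
        "requests": [{
            # Vision expects the image as base64-encoded content.
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "LABEL_DETECTION",
                          "maxResults": max_results}],
        }]
    }
    return json.dumps(payload)

# Placeholder bytes for illustration only, not a real image:
body = build_label_request(b"\x89PNG...")
print(json.loads(body)["requests"][0]["features"][0]["type"])
# To send it: POST this body to f"{VISION_ENDPOINT}?key=YOUR_API_KEY"
# with a Content-Type: application/json header.
```

The response contains `labelAnnotations` entries with a `description` and a confidence `score`, which is roughly the raw material AutoTagger turns into keywords.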

Mario

Quote from: Lincoln on March 25, 2025, 01:58:05 PM
Two hands up for Gemini please >:( . I'm having great luck with identifying not only birds but also styles/types of sailing craft, truck configurations, commercial ships and aircraft. It's not always correct, but it did show me a new breed of sheep that I didn't even know about. The breed was only created in 2012, yet Gemini correctly identified it. It is very accurate on certain architecture/landmarks in Sydney, Australia, but completely misses other, older buildings in the same area; it seems to be getting more accurate every day.
OpenAI is great, but Gemini goes into more detail, especially with the transport industry, without having to prompt it as much.
Can you give me an example image, the prompt you used with Gemini, and the results you got?
The image can be small, 400 x 400 pixels.

Stenis

Quote from: Lincoln on March 25, 2025, 01:58:05 PM
Two hands up for Gemini please >:( . I'm having great luck with identifying not only birds but also styles/types of sailing craft, truck configurations, commercial ships and aircraft. It's not always correct, but it did show me a new breed of sheep that I didn't even know about. The breed was only created in 2012, yet Gemini correctly identified it. It is very accurate on certain architecture/landmarks in Sydney, Australia, but completely misses other, older buildings in the same area; it seems to be getting more accurate every day.
OpenAI is great, but Gemini goes into more detail, especially with the transport industry, without having to prompt it as much.

What interface are you using then - iMatch?

What about the costs?
OpenAI is absolutely great, and so is Gemma with Ollama, but I think Google Lens is better.
I looked into Gemini Advanced just an hour ago, but I did not see any option there to analyze pictures.
I also asked the service itself, and it denied having such capabilities.

As I wrote to Mario, there is also a Google Cloud Vision API, which seems pretty expensive.
Gemini Advanced is also expensive and costs 255 SEK, which is about 25 US$, per month.
Very much more than my other two working alternatives.

Lincoln

Sorry, I should have stated that I was using the Gemini/Google.com app, as my computer isn't powerful enough to run local LLMs. For Gemini I just used the prompt "Identify this breed of sheep" or "Identify this car" etc. It gives useful information and reasoning, which I then use for the Headline and Description.

For OpenAI with IMatch I use:
[[-c-]] Describe this image in a style to make it easily searchable as a stock photo. Please use Australian English spelling and slang.
Avoid describing anything not directly observable from the image.
Use very strictly the location {File.MD.city},{File.MD.location},{File.MD.state},{File.MD.country}.
If the place, building, architectural style or landmark is known to you, make sure to include its name in the description. If the image contains an animal, bird or reptile, name the species and breed in the description.
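For anyone unfamiliar with the `{File.MD.city}` syntax in the prompt above: IMatch expands these variables internally before sending the prompt. The sketch below mimics that expansion with a plain dictionary so you can preview a filled-in prompt outside IMatch; the function and the sample values are my own illustration, not IMatch code.

```python
import re

def expand_variables(template: str, metadata: dict[str, str]) -> str:
    """Replace {Some.Variable} placeholders with values from a dict.

    Unknown variables expand to an empty string, roughly matching
    what happens when a metadata field is empty.
    """
    return re.sub(r"\{([^{}]+)\}",
                  lambda m: metadata.get(m.group(1), ""),
                  template)

prompt = "Use very strictly the location {File.MD.city},{File.MD.country}."
print(expand_variables(prompt, {"File.MD.city": "Sydney",
                                "File.MD.country": "Australia"}))
# → Use very strictly the location Sydney,Australia.
```

Previewing the expanded prompt this way is handy when a model ignores the location: often the real cause is simply an empty metadata field producing a blank in the prompt.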