AI Descriptions with face recognition

Started by javiavid, March 19, 2025, 01:57:50 PM


javiavid

When you generate a description with AI, the result is generic, like "two people looking at the sea".
Would it be possible to use the existing face recognition tags to make the description more concrete?
For example: Sam and Sara looking at the sea

Jingo

You can try adding the specific people tags into the prompt to see if the AI will use them. I do this for keywords by passing in the hierarchical keywords to assist the AI. Say I have a photo where I have identified and added the bird name; I use these tags to hopefully provide the AI with additional info specific to the species, to get a more detailed description. Not sure it always works, but it is something to try.

Mario

There are examples of exactly this in the Prompting for AutoTagger help topic, have you read it yet?

Using Persons

{File.Persons.Label.Confirmed|hasvalue:These persons are shown in this image: {File.Persons.Label.Confirmed}.}
Describe this image in the style of a magazine caption. Use factual language.

This prompt uses the labels of the confirmed persons in the image to provide context to the AI. This often works very well, depending on the model and the rest of the prompt.

Using the .Confirmed variant makes AutoTagger ignore unconfirmed persons, and the hasvalue ensures that the entire sentence is omitted when there are no confirmed persons in the image.
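To make the conditional behavior concrete, here is a rough Python sketch of what the hasvalue function does. This is an illustration only, not IMatch's actual variable engine; the function name and the `{value}` placeholder are made up for the sketch.

```python
def expand_hasvalue(value: str, template: str) -> str:
    """Roughly mimic |hasvalue: - the template is emitted (with the
    variable's value substituted) only when the variable is non-empty;
    otherwise the entire sentence disappears from the prompt."""
    return template.replace("{value}", value) if value else ""

sentence = "These persons are shown in this image: {value}."

# Confirmed persons present: the sentence is included in the prompt.
print(expand_hasvalue("Sam;Sara", sentence))
# No confirmed persons: nothing is added to the prompt at all.
print(repr(expand_hasvalue("", sentence)))
```

The point is that the AI never sees an empty "These persons are shown in this image: ." fragment, which could otherwise confuse the model.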
Is IMatch not powerful? :D


javiavid

:o
Incredible! You have done an incredible job with the AI integration. With Gemma 3 it works perfectly.
Thanks

javiavid

Is it possible to use only the first word of the person's name?
For example: Sam Smith, use only Sam

Mario

This is most likely possible with a bit of standard IMatch variable magic.

{File.Persons.Label.Confirmed|splitlist: ,first}

"Albert Einstein" => "Albert"

The function splits the name at the blank (space) and returns the first element.
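For readers who think in code, here is a small Python sketch of that transformation. The ";" list separator and the per-element application are assumptions for the sketch; IMatch's actual splitlist implementation may differ in details.

```python
def first_names(labels: str, list_sep: str = ";") -> str:
    """Roughly mimic {...|splitlist: ,first}: split each person label
    at the space and keep only the first element (the first name)."""
    return list_sep.join(part.strip().split(" ")[0]
                         for part in labels.split(list_sep))

print(first_names("Albert Einstein"))       # -> Albert
print(first_names("Sam Smith;Sara Jones"))  # -> Sam;Sara
```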
As a small reward for my tip, please consider telling others about IMatch - in your social media, photography forums etc.

javiavid

Thank you.

I have been using imatch for years and recommend it to everyone.
I think the big competition for "normal" people is Google Photos; people are looking for something simpler.

I love the possibilities of imatch.

Mario

Quote from: javiavid on March 19, 2025, 07:32:51 PMI think the big competition for "normal" people is Google Photos; people are looking for something simpler.
People who are satisfied with Google Photos or Apple Photos or Windows Photo or similar software are not the target audience for DAM software. Many get by with whatever Windows Explorer does. Which is fine. Whatever works for you is good.

IMatch is a niche product (I'd wish the niche would be bigger) for demanding users.
Like somebody who wants to combine data gathered by face recognition, augmented by a powerful "Person" management system, to provide context to an AI that automatically generates descriptions and keywords...


Stenis

Quote from: Mario on March 19, 2025, 02:35:51 PMThere are examples of exactly this in the Prompting for AutoTagger help topic, have you read it yet?

Using Persons

{File.Persons.Label.Confirmed|hasvalue:These persons are shown in this image: {File.Persons.Label.Confirmed}.}
Describe this image in the style of a magazine caption. Use factual language.

This prompt uses the labels of the confirmed persons in the image to provide context to the AI. This often works very well, depending on the model and the rest of the prompt.

Using the .Confirmed variant makes AutoTagger ignore unconfirmed persons, and the hasvalue ensures that the entire sentence is omitted when there are no confirmed persons in the image.
Is IMatch not powerful? :D



Just what I needed now!

You know, support like you provide here is so valuable. What is the value of a lot of fantastic features if you don't even know they exist or how to use them properly?

There are often big problems with "uncontrolled structural growth" in well-established software that has been around for decades, but I don't see much of that in IMatch; a lot more in DXO Photolab, for example.

... and with AI prompting we have reached a whole new level of complexity for us users, but it is extremely valuable to be able to master the prompting, because it has big potential to increase both productivity and the quality of the information.

May I ask whether the person data is a proprietary solution, not part of XMP? If it isn't, where do you store that data in XMP/IPTC? Can I use it as a variable to populate other XMP elements if I want to, and what is the syntax?

This is just one example, and just the beginning of how AI prompts in AutoTagger will be able to interact with older subsystems of IMatch. I really think this is exciting, and I am very eager to learn more about it and use it in my future work with IMatch and my RAW converters, DXO Photolab and Capture One.

I must say I'm very surprised to see how well AutoTagger works despite being a very new tool. It is very versatile and surprisingly easy to use. I did not really expect that, even though you started already in version 2023, because what I saw then didn't really convince me. For me, 2023 was a beta, and that makes version 2025 the real version 1.x. It is very impressive for a version 1.x.

Your way of working with support and dialog with your users seems very labor intensive, and a day has just 24 hours, but it also seems very effective, as long as you have the strength and motivation to do it. I don't know how you manage, but you seem to get an awful lot done compared to, for example, DXO with Photolab, even though DXO is a much bigger company. Their problem is their almost total lack of dialog with their users. They introduced their PictureLibrary DAM-subsystem three years ago but not very much has happened since then. Sometimes less is so obviously more.

Well, no more from me now until I have tried your example above.

Mario

QuoteMay I ask you if the personal data is a proprietary solution not part of the XMP?

XMP has support for face regions, basically a rectangle or other shape and an associated "tag".
This is what IMatch can import (phones often record that info) and transform into face annotations. IMatch also persists face annotations as XMP regions to make the data you enter in IMatch accessible for other applications.

Then there is the IPTC PersonInImage tag, which is supposed to contain the names of persons shown in images. It's optional and not really that widely supported (metadata is hard). IMatch fully supports this tag and fills it with the labels of confirmed face annotations. IMatch can also intelligently import data from this tag and use it to aid face recognition (The PersonInImage Tag).


Everything on top of that is IMatch-only. The people concept. Families and groups. Handling person age, date of birth and death to support face recognition. Many applications have some sort of face recognition, but for IMatch that's just one step.

QuoteIf it isn't where do you store that data in XMP/IPTC? Can I use it as a variable populating other XMP-elements if I want - what is the syntax?
IMatch stores persons, families, groups and their relations as part of the IMatch Graph. Similar to events.
You can access person data via Persons variables, and of course use all of the IMatch variable magic to transform them, process them, etc.

The other day a user asked if he can provide only the first name of persons to the AI, to get descriptions like "Pam and Peter riding in a car" instead of "Pam Miller-Mabridge and Peter Foundham-Worthington riding in a car". Which was doable easily with one of the variable functions IMatch provides:

{File.Persons.Label.Confirmed|splitlist: ,first}

Via variables you can access people data everywhere, including Metadata Templates and AutoFill. This allows you to use, transfer, persist people-related metadata wherever you need it. Same for events or all the other data IMatch maintains.

AutoTagger allows you to store AI-generated data in standard tags (persisted in the image) or AI tags, which stay in the database. You can also mix and match when needed.


QuoteI must say I'm very surprised to see how well Autotagger works despite it is a very new tool. It is very versatile and surprisingly easy to use. I did not really expect that even if you started already in version 2023,

AutoTagger in 2023 worked quite well with the Google/Clarifai/Imagga/Azure backends. The built-in AI was based on Google's mobile AI and could of course not compete with the large cloud-based systems.

In the past two years, LLM-based AIs have arrived, and they are a game changer. Not for everything, mind. They hallucinate and make stuff up, and you can never really trust them. But for the task at hand, creating descriptions and keywords in IMatch, they work extremely well. Especially the new Gemma 3 model IMatch already supports. Please read the corresponding blog post from today.

QuoteYour way of working with your support and dialog with your users seems very labor intense
It is. But I'm German and very efficient by nature. I automate the sh*t out of everything, letting computers do as much work as possible.


Quoteand a day has just 24 hours
Maybe at your end. Here we have 48-hour days, plus the nights.


QuoteI don't know how you manage but you seem to get an awful lot done

I actually get even more done than the IMatch user base is aware of.


QuoteTheir problem is their almost total lack of dialog with their users.
I consider this very important. Users let me know what they need and I enhance IMatch.


QuoteThey introduced their PictureLibrary DAM-subsystem three years ago but not very much has happened since then. Sometimes less is so obviously more.
Doing DAM properly is really hard. And it gets much less coverage and interest from journalists, bloggers and influencers. From a marketing perspective, it's much more betterer to add a new shiny filter or effect or anything AI*.

If DXO wants to hire me or do some consulting, I'm all ears ;)
Or maybe offer an IMatch bundle for a very attractive price...?

The entire AutoTagger feature set in IMatch is really deep. Even the ability to write your own prompts is not that common in other software, but it makes AutoTagger so much more powerful. And of course the ability to use variables in prompts...
While the default prompts I provide in the help work well, they are just a starting point for playing with this. Have fun.

Let others know about IMatch!

Stenis

Your reply, remarks and examples are much appreciated!

Thank you very much for your detailed explanations of how you handle the data. Very helpful.

I have tested the functions you mentioned above, and just this little thread has given me/us a lot that makes AutoTagger so much more powerful and versatile. What a game changer this is and will be in my practical work, Mario.

One thing I experienced: when trying to paste the example string you wrote above into the "Prompt" form (F7), I could not. To get it to work I had to paste it into the prompt for the "Description" element. Can you look at that and try to replicate that "error"?

What you write about the lack of interest in DAM tech, even among photo journalists, is nothing but depressing. If I had been younger (I'm 75 and retired) I would have started a company and initiated a real "crusade" to get companies to understand the extremely high potential DAM systems have to really transform a company or an organisation. It just has to be a result of lack of knowledge.

I worked very closely with the developers of the Fotoware corporate DAM for seven years, between 2009 and 2016, and saw with my own eyes what a game changer it was for the City Museum of Stockholm when, together with Fotoware, we were able to tie XMP metadata to all sorts of files, not just XMP-compatible picture formats like JPEG, TIFF, DNG and RAW files. The real boost in power came when metadata could be tied even to PDF files and Office documents, as you have managed to do in IMatch as well. I don't think most people realize that they already have these features in IMatch for 1700 SEK, or around 170 USD. That is fantastic and nothing I expected to find there. Companies and organisations that are able to really harness and use DAM tech to the max, for all types of digital assets, get a huge advantage compared to the ones that live "from hand to mouth", lost in the mess of their old-style file silos. A real DAM is just so much more effective.

IMatch has such a solid metadata foundation that it is almost ideal to use on the client side together with a corporate DAM, even "in the field". It is also ideal for people who want to scale up from all-in-one PictureLibrary solutions like the one in DXO Photolab. So it is an especially good thought to offer it as a bundle with DXO Photolab, because nothing integrates better with IMatch than DXO Photolab, since that software has an especially interesting integration feature most other converters lack. Nothing is more effective, I think (at least not Capture One or Lightroom), and the main reason is that DXO Photolab, unlike both C1 and LR, works straight on the files in the file system, without any import process.

The second thing is that it remembers every selection you make in IMatch and transfer to Photolab for processing. That function is called "External Selections", and it gives us the possibility to very quickly open such a selection later, without even opening IMatch. I love that feature myself. I guess users who are open to upgrading to IMatch from Photolab will be surprised when they see what they can actually do with a real DAM like IMatch, and how much more powerful and effective it is compared to what they are used to with the Photolab PictureLibrary.

I just don't understand photographers who don't use tools like IMatch, which have the potential to make them so much more effective as they get squeezed harder and harder economically, with the value of pictures depreciating year by year, not least because of AI.

..... using IMatch is just so much more fun than using the DXO Photolab PictureLibrary, and for that matter even PhotoMechanic Plus, and it really is much more effective because of the useful and intelligent AI support. Even that is very important: manual metadata maintenance tends to be extremely time-consuming and frankly very boring, and most users don't even find it worth the effort, but with IMatch the threshold gets significantly lowered.




jch2103

Yes, indeed. I think the problem is that people have limited attention spans, don't know what they don't know and are unsure about new things. 

There's a fascinating book, "Thinking Fast and Slow" by the psychologist Daniel Kahneman. To quote Wikipedia

QuoteThe book's main thesis is a differentiation between two modes of thought: "System 1" is fast, instinctive and emotional; "System 2" is slower, more deliberative, and more logical.
To start using a DAM requires some "System 2" thinking, which requires more energy from the brain, even though the results will be something that can be used with "System 1" thinking once the initial learning curve is overcome. I.e., the rewards are more than worth the effort, at least for some people's use cases. 
John

Stenis

Interesting, thanks for that input!

I think it is also a matter of how much patience we have as individuals. When we start building a picture library, it will take some time before it gains momentum and becomes really useful. For that reason, a lot of people will never get there.

When I first tried in Lightroom many years ago, I gave up, since it was, and still is, so very ineffective. For me it first became doable when I started to use PhotoMechanic, once the PM Plus 6 variant with a database was released.

Stenis

Mario, do you have an idea about the costs of using Ollama and the Google Gemma 3 model instead of OpenAI?

For me it would be highly interesting if it is better with language expressions, and better at landmarks and at identifying animals of all sorts, than OpenAI seems to be by default.

.... but maybe OpenAI would be better if I were more experienced in prompt engineering than I am at the moment.

Isn't this very much about prompting no matter the model you use?

Mario


QuoteMario, do you have an idea about the costs using Ollama and the Google Gemma3 model instead of OpenAI.

It's free for private use. Same for LM Studio and Ollama.


QuoteOne thing I experienced, when trying to paste the example string you wrote above into the "Prompt"-form (F7) was that I could not. So, in order to get it to work I had to paste it into the prompt for the "Description"-element. Can you look at that and try to replicate that "error"?
Prompts go into the prompts input fields for keywords, descriptions and traits.
The input field in the AutoTagger dialog is where you, optionally, provide text for the Context Placeholder [[-c-]], which is a neat feature to quickly "extend" existing prompts with ad-hoc information.

See the help topics

Using AutoTagger
Prompting for AutoTagger

for details about prompts, where to put them, the context placeholder and prompting tips.
I cannot repeat all that information in posts.

Stenis

Yes, I did that when I didn't succeed with pasting it into the input window. :-) But when testing, I started with it as an ad-hoc case, that's why.

When adding conditions and instructions to the static description prompt, does sequence play any role? I mean, if one condition is dependent on the result of another.




Mario

"static description prompt"?
Which kind of conditions?

Keep in mind that IMatch combines the description/keywords/landmarks/trait prompts into one AI query. You cannot e.g. reference the description in your keywords prompt via a variable - because it does not yet exist.
If you need to do such things, create individual presets and run them in the sequence you need, e.g. first run AutoTagger for descriptions, then run AutoTagger with a preset for keywords. Or vice-versa.

Stenis

Quote from: javiavid on March 19, 2025, 06:40:14 PM:o
Incredible! You have done an incredible job with the AI integration. With Gemma 3 it works perfectly.
Thanks


Agreed, that was really handy.
It was also very easy to transfer what I had in my prompts for OpenAI to Gemma 3.