[Official] What's Brewing for IMatch 2024 / 2025. You will like this!

Started by Mario, May 27, 2024, 11:25:58 AM


Mario

I always write 2024/2025 because I currently don't know when I will consider the next major release finished, stable and ready to release :)

What's Brewing...

One of the major things I work on for the next major IMatch release is to enable you to do more with less work. And a big part of that will be to improve AutoTagger AI in IMatch. Automatic keywords, automatic descriptions and other features which automatically organize your images and other files. All running locally on your PC, without cloud.

AI technology moves very fast, and there are several cool free open source projects which allow developers to integrate AI into their apps and software. Microsoft is also adding more and more AI to Windows itself, and, as far as I can tell, they do that in a way that Windows applications like IMatch can utilize. We'll see.

Sadly, I cannot train my own AI models for IMatch, because - money!
Training an AI model from scratch costs between 1 and 20 million US$, considering computing resources alone. Yikes!

But what I can do is interface with publicly accessible open source projects and integrate them seamlessly into IMatch. This allows you to use the latest crop of open source AI models (or AI tools integrated in future Windows versions) in IMatch.

Using features you already know, like the AutoTagger, the Keywords and Metadata Panel, and probably data-driven categories based on information the AI can extract from your images automatically. Again, running locally on your PC, no cloud. Less typing, better search and automatic organization for all your files, for free.

See What the New AI in IMatch Can Do Right Now!

To give you an impression of what I'm working on, I've made a short video (2 minutes):

https://www.photools.com/bin/imatch-ai-2-001.mp4

This video shows how the "IMatch AI 2.0" automatically adds descriptions and keywords to images with a variety of motifs.
Before you ask: Yes. There are options to influence the wording and phrasing used, text length, number of keywords generated per image etc. ;)

Help Wanted

I have an almost finished prototype app which does what you see in the video.
It is a test bed, designed to experiment and figure out which "prompts" (questions to ask the AI when I want it to describe an image) produce the best results.

It is a learning process for me. I know that different prompts work for different types of motifs.
If you want the AI to describe images taken during your last vacation or city trip, you can tell it to. Or if you want the AI to describe photos of cats, dogs or birds. Or landscapes. Industrial images. Weddings. Food. Abstract motifs.
For example, adding context like "These images were taken in Madrid, Spain" will improve the description and keywords considerably.
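To illustrate the idea of adding context to a prompt, here is a minimal Python sketch. It is not how the test app is implemented; the function name `build_prompt` and the base prompt text are made up for this example.

```python
def build_prompt(base_prompt: str, context: str = "") -> str:
    """Put an optional context sentence in front of a general-purpose prompt.

    Hypothetical helper for illustration only, not part of IMatch.
    """
    context = context.strip()
    if not context:
        return base_prompt
    # Make sure the context reads as a complete sentence.
    if not context.endswith((".", "!", "?")):
        context += "."
    return f"{context} {base_prompt}"


# Assumed base prompt, invented for this example.
BASE_PROMPT = "Describe this image in two sentences and list up to ten keywords."

print(build_prompt(BASE_PROMPT, "These images were taken in Madrid, Spain"))
```

The same base prompt can then be reused for different shoots by changing only the context.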

This is where I need your help: playing with the new AI app in IMatch, trying different prompts and seeing which prompt produces the best results for your images.

I'm looking for 5 to 10 users who have a few hours a week to play with this and share results and feedback with the other participants and me. We'll do that in a closed group to not disturb the regular flow of this community.

Let me know what you think about the video.
And if you can spare a few hours to try this out on your PC and with your images.

mopperle

Mario, how would this test technically work, can you describe this a bit in detail? Separate installation, working with a test DB.... Maybe I am interested.

Mario

You can see the results of the AI for your files and prompt with the click of a button.
This does nothing to your database, unless you actually click the button to write the keywords / descriptions into files.

To make things easier to see in @Keywords, the test app produces keywords under the common root keyword AI.
In the final version, all this will be integrated into AutoTagger and the normal thesaurus lookups, black lists etc. will work like they do with the current IMatch AI and the external services.
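As a rough illustration of what "keywords under the common root keyword AI" means, here is a small Python sketch. It is not the test app's code. The function name and the use of `|` as the separator between hierarchy levels are assumptions made for this example.

```python
from typing import Iterable, List


def under_root(keywords: Iterable[str], root: str = "AI", sep: str = "|") -> List[str]:
    """Place flat keywords below one root keyword, skipping blanks and duplicates.

    Hypothetical helper; `sep` is an assumed hierarchy separator.
    """
    seen = set()
    result = []
    for kw in keywords:
        kw = kw.strip()
        key = kw.lower()
        if not kw or key in seen:
            continue  # ignore empty entries and case-insensitive repeats
        seen.add(key)
        result.append(f"{root}{sep}{kw}")
    return result


print(under_root(["cat", " Cat ", "", "tree"]))
```

Keeping everything under one root makes it easy to find, review and remove the generated keywords in one go.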

You can use test files of course. Just copy together a bunch of files you want to test into a folder and work with that.
This is how I work, with a couple of thousand files. Easy to remove all keywords and descriptions via the MD / KW Panel and start fresh.




mopperle

OK, I've understood. The complete idea and the video look really promising, so I would be in for testing.

evanwieren

Hello... I recognize that I am a very new user. However, as a developer myself, and having followed your product for some time, I am interested in helping out. A few hours a week is something that I can do. I would be keen to learn how you are applying the AI and what tools you are ultimately developing on top of.

sinus

Mario,
I think you can judge my "technical" skills quite well. 
If you think I can help with that too, then I'm happy to help.
 
Unless the sky is falling, I should be able to spare some time each week - especially as it's an interesting topic.
Best wishes from Switzerland! :-)
Markus

Lincoln

Happy to help if I can. Things are certainly changing fast in this field so would be a good experience for me to help understand it a bit more. Is there the possibility to "train" different subjects? In my case Australian content. Most of the "keywording" software I have tried doesn't read the Australian content at all. For example, the AutoTagger app only gives one keyword/tag for an image of the Sydney Opera House!

Mario

Quote: Is there the possibility to "train" different subjects?
Yes. You can retrain / extend existing models.

But that can be quite complex. For example, if you want to train the AI to recognize specific buildings or landmarks, you would have to prepare a large number of tagged images for each building, with regions outlining the building / landmark, tags etc. in the specific format your model and tool chain requires. And you need a lot of tooling (software) installed: Python, PyTorch, a very fast graphics card or even several graphics cards etc. Not exactly doable for "normal" people like us.

I'm sure enhancing existing AI models, transfer learning etc. will become easier in the future. I keep my eye on this of course.
But the training data must be there, which always requires a lot of manual labor.

Google and the other big companies with deep pockets do all that by paying thousands of low-wage workers using platforms like Amazon's Mechanical Turk and similar. They basically add regions and tags to objects in images all day - to produce training material.

In addition, Google and others let users do a lot of work for free.
- photo upload to Google Maps => free training material for Google's AI.
- solving Google CAPTCHAs ("click on all bicycles") => free training material for Google's AI.
- "tag your friends" on Facebook => free training material for Meta's AI.
...

Cloud vendors like Clarifai, Google and Amazon already allow subscribers to train their own models - at a substantial cost.
I'm sure smart people will make this affordable and doable for "normal" users over time.

axel.hennig

Maybe I could also support, but:
  • My English is not the best (not sure if this is relevant).
  • I have more or less no experience in "prompt" writing.

Mario

Quote from: axel.hennig on May 29, 2024, 12:48:09 PM
Maybe I could also support, but:
  • My English is not the best (not sure if this is relevant).
  • I have more or less no experience in "prompt" writing.

Prompts currently have to be written in English for the model I use for this purpose.
You can always, and I would consider this an interesting and useful test, write your prompt in your native language (there are many examples for how to write prompts on the Internet) and then e.g. use DeepL.com to translate it into English and see what description and keywords the IMatch AI produces for your files.

Descriptions and keywords are currently only generated in English. I could not find a machine-translation model that can be run on a local PC. For now. Things change fast and all that.

Feeding the output of the AI (description and keywords) into a cloud-based translation AI like DeepL.com would definitely be a big improvement for users who don't use English for their images.
At least for keywords, we need to translate them only once and IMatch could maintain a list of already translated keywords. This would reduce the cost. For descriptions, though, this would not work.
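The "translate each keyword only once" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name is invented, and the translation service is replaced by a fake lookup function so the example runs without any credentials or network access.

```python
from typing import Callable, Dict, Iterable, List


class KeywordTranslationCache:
    """Remember translations so each keyword is sent to the service only once."""

    def __init__(self, translate: Callable[[str], str]):
        self._translate = translate          # the (possibly paid) translation call
        self._cache: Dict[str, str] = {}     # already translated keywords
        self.calls = 0                       # how often the service was really used

    def get(self, keyword: str) -> str:
        key = keyword.strip().lower()
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._translate(keyword.strip())
        return self._cache[key]

    def translate_all(self, keywords: Iterable[str]) -> List[str]:
        return [self.get(k) for k in keywords]


# Stand-in for a real translation service (assumption for this example).
FAKE_DICTIONARY = {"dog": "Hund", "beach": "Strand"}


def fake_translate(word: str) -> str:
    return FAKE_DICTIONARY.get(word.lower(), word)


cache = KeywordTranslationCache(fake_translate)
cache.translate_all(["dog", "beach"])          # first image: two service calls
cache.translate_all(["Dog", "beach", "dog"])   # second image: served from the cache
print(cache.calls)  # 2
```

Descriptions are different for almost every image, so a cache like this rarely gets a hit for them.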

I would use the same approach as for the Map Panel / reverse geocoding. Users can enter their DeepL.com credentials (they have a free tier) and then use that to have the AI-generated text translated by DeepL.

The same could work for prompts, I guess. We will have some basic prompts which work for a variety of motifs and the user can extend them when needed with a "context" in English (maybe auto-translated from their native language).

mopperle

When will you start the test? I'm currently building a test DB with some thousand files with very mixed content.

Mario

I will start the test when everything is ready. This post is only two days old. Only a few users have seen it yet.

mopperle

Does the AI model also work with video files, based on the thumbnail created by IMatch?

Mario

Like the current face recognition Auto-Tagger, the thumbnail is used for video files. The user controls the thumbnail and thus has the pick. You can select the thumbnail for video files (from the set of thumbnails IMatch has extracted) using <Shift>+<Ctrl>+<PageUp>/<PageDown>.

mastodon

That is very promising. As a family user, a much better face recognition would be even better. Even if it would be a plugin for a fee.

Mario

Quote: a much better face recognition would be even better.
The face recognition already integrated in IMatch is one of the best systems available and ranks very high on the standard benchmarks.

If you have problems with it, check your trained faces. Don't overtrain. 3-5 trained faces is usually more than sufficient.

The AI we're talking about here does not do face recognition at all. It's a model designed to detect objects in images to create descriptions and keywords.

mastodon

I know that this is not about FR. Yes, IMatch is quite good at FR, but many times it does not recognise a face as a face, mostly from the side (profile). So, it is good, but I would like it to be better. (This is just a note :)

Mario

Quote: mostly from aside (profil).
This is a top challenge, even for police and military face recognition technology.
You'll have to use a highly specialized model trained to recognize faces from the profile, which basically removes 50% or more of the face landmarks AIs use to detect faces in images or video footage.
You may be able to do this for persons you know.
But for general AI face detection, very hard. And such a specialized model would probably produce tons of false positives for frontal faces.

Stenis

Quote from: Mario on May 28, 2024, 10:51:29 AM
Quote: Is there the possibility to "train" different subjects?
Yes. You can retrain / extend existing models.

But that can be quite complex. For example, if you want to train the AI to recognize specific buildings or landmarks, you would have to prepare a large number of tagged images for each building, with regions outlining the building / landmark, tags etc. in the specific format your model and tool chain requires. And you need a lot of tooling (software) installed: Python, PyTorch, a very fast graphics card or even several graphics cards etc. Not exactly doable for "normal" people like us.

I'm sure enhancing existing AI models, transfer learning etc. will become easier in the future. I keep my eye on this of course.
But the training data must be there, which always requires a lot of manual labor.

Google and the other big companies with deep pockets do all that by paying thousands of low-wage workers using platforms like Amazon's Mechanical Turk and similar. They basically add regions and tags to objects in images all day - to produce training material.

In addition, Google and others let users do a lot of work for free.
- photo upload to Google Maps => free training material for Google's AI.
- solving Google CAPTCHAs ("click on all bicycles") => free training material for Google's AI.
- "tag your friends" on Facebook => free training material for Meta's AI.
...


Google is smarter than that. If I use Google Lens for motif recognition, even on the cheapest Samsung phone, I get a question to answer afterwards, and if some people answer it, as I do, we help Google Lens to improve. Since it is a win-win situation, Google does not need to pay me and I don't have to pay Google. I have nothing really to say about that. The service is already fantastic. If it will be anything near what I already have in my phone it will definitely be a game changer!

Good luck with the new AI features! Shall be really interesting to test.

Do you know if there will be some extra cost using Google in IMatch?

Mario

I'm not quite sure what you are trying to tell us, sorry?
If you do the work on your smart phone, the AI-generated descriptions and keywords should be available to IMatch as soon as you add your images to your database. So you're "good", already. Unless the AI data is stored outside standard metadata.

Quote: Do you know if there will be some extra cost using Google in iMatch?

From the start, IMatch supports the same Google AI as before (in addition to Clarifai, imagga and Microsoft). This is for backwards-compatibility for existing users. The costs are the same as before (unless Google changes them).
The Google model may be as good, better, or worse than what you use on your Samsung Phone. I really don't know. But you can try it out easily enough, and Google offers a generous amount of free usage per month (for the "old" models, not for the new LLM models).

For the "modern" AI, IMatch supports OpenAI and Mistral from the start.

I have tested the Google Gemini models and others and they were slower and produced worse results for many motif types in my AI test suite than the models offered by OpenAI and Mistral. At the time I did the tests (autumn), mind.

AI is moving fast and the big players improve their models and offerings all the time. Things are changing every month, with new models being released from many companies in the US, Europe and Asia.

I've spent a lot of time making the AI 2.0 support in IMatch 2025 versatile and flexible.

OpenAI (ChatGPT) and Mistral are good and very affordable, which are good arguments for starting with them in IMatch 2025.
Different from the US services, the French Mistral adheres to the strict European data privacy laws, which is good news for privacy-aware users who don't want to upload their private photos to US cloud services.

I will consider adding other AIs, like Google's Gemini model or Anthropic's Claude, when there is actual demand from IMatch users. And I will look for other uses of modern AI for IMatch. I have some ideas. I like letting the computer do boring work automatically, and I'm sure many users see it the same way.


Quote: Google is smarter than that. If I use Google Lens in even the cheapest Samsung phone for motif recognition, I get a question afterwards to answer and if some people like myself do we will help Google Lens to improve.
From what I understand, this means that even when the AI runs "local" on your device, you agree to upload your photo and your rating to the Google AI to use it in the next round of training (else it could not learn from your feedback?). This is surely mentioned somewhere in their 200 page TOS text wall. I don't know.

IMatch 2025 also supports Ollama, which is free software that allows you to run powerful AI models locally, on your PC. Installation is super-easy and if you have a suitably fast graphics card, you get AI keywords, descriptions and traits in IMatch 2025 without paying or uploading your personal data. Awesome.

Since all this runs in the background, IMatch just takes longer to describe/keyword your files when your graphics card is slower.

I run my tests with a mid-range NVIDIA 4060 Ti GPU in my workstation PC, and with the NVIDIA mobile GPU that came with my Dell laptop (about one year old). It works very well on both devices. And totally local, no internet required.
Processing times are between ~2 and ~5 seconds per image, depending on how complex my prompts are.

Ollama is constantly updated and new models are released often. They support multiple models usable by IMatch, including LLaVA and Llama (Meta/Facebook).
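For the curious, this is roughly what asking a local Ollama installation to describe an image looks like from a small Python script. It is a sketch for experimenting, not how IMatch talks to Ollama. It assumes Ollama is running on its default port 11434 and that a vision model such as `llava` has been pulled; the function names, the file name and the prompt are made up for this example.

```python
import base64
import json
import urllib.request

# Default local endpoint of Ollama's generate API.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build the JSON body: the image is sent base64-encoded, streaming is off."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path: str, prompt: str, model: str = "llava", timeout: int = 120) -> str:
    """Send one image to the local Ollama server and return the generated text."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), prompt, model)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With Ollama running, a call like `describe_image("test.jpg", "Describe this image in two sentences.")` returns the description as plain text. Nothing leaves the PC, because the request only goes to localhost.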

Scalar

Wow, that sounds really good. Will the image descriptions later be saved in an extra database field or will they be added to an existing one?

Mario