[Official] What's Brewing for IMatch 2024 / 2025. You will like this!

Started by Mario, May 27, 2024, 11:25:58 AM

Previous topic - Next topic

Mario

I always write 2024/2025 because I currently don't know when I will consider the next major release finished, stable and ready to release :)

What's Brewing...

One of the major things I'm working on for the next major IMatch release is enabling you to do more with less work. A big part of that will be improving the AutoTagger AI in IMatch: automatic keywords, automatic descriptions and other features which automatically organize your images and other files. All running locally on your PC, without the cloud.

AI technology moves very fast, and there are several cool free open-source projects which allow developers to integrate AI into their apps. Microsoft is also adding more and more AI to Windows itself, and, as far as I can tell, they do that in a way that Windows applications like IMatch can utilize. We'll see.

Sadly, I cannot train my own AI models for IMatch, because - money!
Training an AI model from scratch costs between 1 and 20 million US$, considering computing resources alone. Yikes!

But what I can do is interface with publicly accessible open-source projects and integrate them seamlessly into IMatch. This allows you to use the latest crop of open-source AI models (or AI tools integrated in future Windows versions) in IMatch.

You'll be using features you already know, like AutoTagger, the Keywords and Metadata Panels, and probably data-driven categories based on information the AI can extract from your images automatically. Again, running locally on your PC, no cloud. Less typing, better search and automatic organization for all your files, for free.

See What the New AI in IMatch Can Do Right Now!

To give you an impression of what I'm working on, I've made a short video (2 minutes):

https://www.photools.com/bin/imatch-ai-2-001.mp4

This video shows how the "IMatch AI 2.0" automatically adds descriptions and keywords to images with a variety of subjects.
Before you ask: Yes. There are options to influence the wording and phrasing used, the text length, the number of keywords generated per image, etc. ;)

Help Wanted

I have an almost-finished prototype app which does what you see in the video.
It is a test bed, designed to experiment and figure out which "prompts" (questions to ask the AI when I want it to describe an image) produce the best results.

It is a learning process for me. I know that different prompts work for different types of subjects.
If you want the AI to describe images taken during your last vacation or city trip, you can tell it so. Or if you want the AI to describe photos of cats, dogs or birds. Or landscapes. Industrial images. Weddings. Food. Abstract subjects.
For example, adding context like "These images were taken in Madrid, Spain" will improve the description and keywords considerably.
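To illustrate the idea of a base prompt plus optional user context: a sketch along these lines could assemble the final prompt text before it is sent to a local vision-language model. Everything here (the function name, the prompt wording, the parameters) is my own illustration, not actual IMatch code.

```python
# Illustrative sketch only: combine a base prompt with optional user context.
# The prompt wording and function name are hypothetical, not taken from IMatch.

BASE_PROMPT = (
    "Describe this image in one or two sentences and list up to "
    "{max_keywords} keywords, separated by commas."
)

def build_prompt(max_keywords: int = 10, context: str = "") -> str:
    """Assemble the text prompt; extra context narrows the model's focus."""
    prompt = BASE_PROMPT.format(max_keywords=max_keywords)
    if context:
        # e.g. context = "These images were taken in Madrid, Spain"
        prompt += f" Context: {context}"
    return prompt

print(build_prompt(5, "These images were taken in Madrid, Spain"))
```

The point is that the context string is just appended text; the model itself decides how to use it when describing the image.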

This is where I need your help: playing with the new AI app in IMatch, trying different prompts and seeing which prompts produce the best results for your images.

I'm looking for 5 to 10 users who have a few hours a week to play with this and share results and feedback with the other participants and me. We'll do that in a closed group so as not to disturb the regular flow of this community.

Let me know what you think about the video.
And if you can spare a few hours to try this out on your PC and with your images.
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

mopperle

Mario, how would this test work technically? Can you describe it in a bit more detail? Separate installation, working with a test DB... Maybe I am interested.

Mario

You can see the results of the AI for your files and prompt with the click of a button.
This does nothing to your database, unless you actually click the button to write the keywords / descriptions into your files.

To make things easier to see in @Keywords, the test app produces keywords under the common root keyword AI.
In the final version, all this will be integrated into AutoTagger, and the normal thesaurus lookups, blacklists etc. will work like they do with the current IMatch AI and the external services.
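Conceptually, filing AI-generated keywords under a common root and applying a blacklist could look like the following sketch. The `|` delimiter matches the usual hierarchical keyword notation; the function itself is my illustration, not IMatch code.

```python
# Sketch: file AI-generated keywords under a common root keyword and drop
# blacklisted terms. "|" acts as the hierarchy delimiter; this function is
# illustrative only, not actual IMatch code.

def file_under_root(keywords, root="AI", blacklist=()):
    """Return hierarchical keywords like 'AI|cat', skipping blacklisted terms."""
    blocked = {w.lower() for w in blacklist}
    return [f"{root}|{kw}" for kw in keywords if kw.lower() not in blocked]

print(file_under_root(["cat", "indoor", "pet"], blacklist=["indoor"]))
# → ['AI|cat', 'AI|pet']
```

Keeping everything under one root makes it trivial to review, or bulk-remove, what the AI produced during testing.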

You can use test files, of course. Just copy the files you want to test into a folder and work with that.
This is how I work, with a couple of thousand files. It's easy to remove all keywords and descriptions via the MD / KW Panel and start fresh.




mopperle

OK, I've understood. The complete idea and the video look really promising, so I would be in for testing.

evanwieren

Hello... I recognize that I am a very new user. However, as a developer myself, and having followed your product for some time, I am interested in helping out. A few hours a week is something that I can do. I would be keen to learn how you are applying the AI and what tools you are ultimately developing on top of.

sinus

Mario,
I think you can judge my "technical" skills quite well. 
If you think I can help with that too, then I'm happy to help.
 
Unless the sky is falling, I should be able to spare some time each week - especially as it's an interesting topic.
Best wishes from Switzerland! :-)
Markus

Lincoln

Happy to help if I can. Things are certainly changing fast in this field, so it would be a good experience for me to help understand it a bit more. Is there the possibility to "train" different subjects? In my case, Australian content. Most of the "keywording" software I have tried doesn't handle Australian content at all; for example, the AutoTagger app only gives one keyword/tag for an image of the Sydney Opera House!

Mario

Quote: Is there the possibility to "train" different subjects?
Yes. You can retrain / extend existing models.

But that can be quite complex. For example, if you want to train the AI to recognize specific buildings or landmarks, you would have to prepare a large number of tagged images for each building, with regions outlining the building / landmark, tags etc., in the specific format your model and tool chain require. And you'd need a lot of tooling (software) installed: Python, PyTorch, a very fast graphics card (or even several graphics cards), etc. Not exactly doable for "normal" people like us.
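To give an idea of what "tagged images with regions in a specific format" means in practice: many object-detection tool chains expect annotations in a structured format such as COCO-style JSON, roughly like the sketch below. All file names and pixel values here are invented for illustration.

```python
import json

# A minimal COCO-style annotation record for one tagged image. Real training
# sets contain thousands of these; all values below are invented examples.
annotation = {
    "images": [{"id": 1, "file_name": "opera_house_001.jpg",
                "width": 4000, "height": 3000}],
    "annotations": [{
        "id": 1,
        "image_id": 1,
        "category_id": 1,
        # Bounding box outlining the landmark: [x, y, width, height] in pixels
        "bbox": [820, 540, 2100, 1400],
    }],
    "categories": [{"id": 1, "name": "Sydney Opera House"}],
}

print(json.dumps(annotation, indent=2))
```

Multiply this by thousands of images per landmark and you see why preparing training data is the expensive part, independent of the compute cost.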

I'm sure enhancing existing AI models, transfer learning etc. will become easier in the future. I keep my eye on this of course.
But the training data must be there, which always requires a lot of manual labor.

Google and the other big companies with deep pockets do all that by paying thousands of low-wage workers via platforms like Amazon's Mechanical Turk. They basically add regions and tags to objects in images all day - to produce training material.

In addition, Google and others let users do a lot of the work for free:
- photo uploads to Google Maps => free training material for Google's AI.
- solving Google CAPTCHAs ("click on all bicycles") => free training material for Google's AI.
- "tag your friends" on Facebook => free training material for Meta's AI.
...

Cloud vendors like Clarifai, Google and Amazon already allow subscribers to train their own models - at a substantial cost.
I'm sure smart people will make the same thing affordable and doable for "normal" users over time.

axel.hennig

Maybe I could also support, but:
  • My English is not the best (not sure if this is relevant).
  • I have more or less no experience in "prompt" writing.

Mario

Quote from: axel.hennig on May 29, 2024, 12:48:09 PM: Maybe I could also support, but:
  • My English is not the best (not sure if this is relevant).
  • I have more or less no experience in "prompt" writing.

Prompts currently have to be written in English for the model I use for this purpose.
You can always write your prompt in your native language (there are many examples of how to write prompts on the Internet), then use e.g. DeepL.com to translate it into English, and see what description and keywords the IMatch AI produces for your files. I would consider this an interesting and useful test.

Descriptions and keywords are currently only generated in English. I could not find a machine-translation model that can be run on a local PC. For now. Things change fast and all that.

Feeding the output of the AI (description and keywords) into a cloud-based translation AI like DeepL.com would definitely be a big improvement for users who don't use English for their images.
At least for keywords, we need to translate each keyword only once, and IMatch could maintain a list of already-translated keywords. This would reduce the cost. For descriptions, though, this would not work.

I would use the same approach as for the Map Panel / reverse geocoding: users can enter their DeepL.com credentials (there is a free tier), and IMatch then uses them to have the AI-generated text translated by DeepL.

The same could work for prompts, I guess. We will ship some basic prompts which work for a variety of subjects, and the user can extend them when needed with a "context" in English (maybe auto-translated from their native language).

mopperle

When will you start the test? I'm currently building a test DB with a few thousand files with very mixed content.

Mario

I will start the test when everything is ready. This post is only two days old. Only a few users have seen it yet.

mopperle

Does the AI model also work with video files, based on the thumbnail created by IMatch?

Mario

As with the current face recognition and AutoTagger, the thumbnail is used for video files. The user controls the thumbnail and thus has the pick. You can select the thumbnail for video files (from the set of thumbnails IMatch has extracted) using <Shift>+<Ctrl>+<PageUp>/<PageDown>.

mastodon

That is very promising. As a family user, much better face recognition would be even better for me. Even if it were a paid plugin.

Mario

Quote: a much better face recognition would be even better.
The face recognition already integrated in IMatch is one of the best systems available and ranks very high on the standard benchmarks.

If you have problems with it, check your trained faces. Don't overtrain. 3-5 trained faces are usually more than sufficient.

The AI we're talking about here does not do face recognition at all. It's a model designed to detect objects in images in order to create descriptions and keywords.

mastodon

I know that this is not about FR. Yes, IMatch is quite good at FR, but many times it does not recognize a face as a face, mostly from the side (profile). So it is good, but I'd like it to be better. (This is just a note. :)

Mario

Quote: mostly from the side (profile).
This is a top challenge, even for police and military face recognition technology.
You would have to use a highly specialized model trained to recognize faces in profile, which basically removes 50% or more of the facial landmarks AIs use to detect faces in images or video footage.
You may be able to do this for persons you know.
But for general AI face detection it is very hard. And such a specialized model would probably produce tons of false positives for frontal faces.