More Artificial Intelligence...!

Started by Mario, January 21, 2018, 10:19:38 PM


Mario

The AI-based machine translation in the 2017.13.x release was my testbed for some general architectural enhancements in IMatch and the 'behind the curtain' services offered by photools.com.

As expected, this rather big bang change went mostly unnoticed  ;).
Except for Mees and Josè, who thankfully translate apps into Dutch and Portuguese - and who immediately benefited from machine translation available in the App Translator.

Another testbed is the pinboard app, naturally.

Over the past weeks I've learned many new things. Most of them I need to understand in order to implement the major features I have planned for IMatch and IMatch Anywhere™ 2018.
I'm always thinking about ways to make the features in IMatch easier to use, to allow users to do 'more' with IMatch, save time in daily workflows etc. To get rid of tedious tasks like adding keywords to files.

There is definitely a trend towards using artificial intelligence. Towards utilizing the vast and cheap computing power available in the cloud. Towards letting the computer do boring tasks on its own, while we do more important things like taking photos.

The first outcome of my experiments is a versatile test app which implements support for several (!) cloud-based vision services.
With these, a user will be able to automatically (or at least, semi-automatically) add headlines / descriptions / categories / face annotations and keywords to images and videos.

It's still all a bit rough and I'm still designing a universal concept for IMatch which can incorporate all services offered by the various vendors.
I don't want to hard-link IMatch with a specific service, all have their strengths and weaknesses.

I run the app, click the button, and the app automatically makes suggestions for the image currently selected in the file window: a description (or title), keywords, categories, information about landmarks detected in the image, faces and even the dominant colors. Some services also return the approximate age of persons in the image, and their ethnicity.



All this information will then be made available to IMatch. In the final implementation you will be able to quickly mark the data you want to keep and copy it into the appropriate metadata or IMatch category. Or, if you are confident enough, automatically import the data. Processing an image takes about 1 second that way, which is much faster than I could do it manually.

I see potential for these technologies in these areas:

- Automatically adding keywords
- Automatically adding titles and descriptions
- Detecting faces, age, gender, ethnicity and persons
- Detecting objects
- Detecting landmarks
- Detecting and extracting text, numbers, license plates, ...
- Detecting dominant colors
- Detecting ingredients of food shown in images
- Detecting "adult" content
- ...

Note 1: Of course these features are optional. Nobody has to use them. Many users don't use versions, the Map Panel or Design & Print, while others find these things invaluable.

Note 2: No, the artificial intelligence is not always right.
Sometimes the description is plain wrong. Or some or even all of the keywords are wrong.
But the AIs get better all the time, and I'm sure I can implement a user interface in IMatch 2018 that makes this a snap to use.

Note 3: No, I cannot pay for these services on my own. I'm not Adobe.
The typical cost for processing 1000 files varies between 80 cents and 1.50 dollars, depending on the vendor. Volume discounts are available. Most vendors give you 5000 files per month for free, which may already be sufficient.

Note 4: Yes. I plan to make this available for free testing somehow. I'm already enhancing the computing facilities available in the photools.com cloud for that.

monochrome

This is incredibly cool!

I have played around with Amazon Rekognition for face recognition and thought about connecting Google Cloud Vision to IMatch - but haven't managed to think up a good way to handle "knowledge management" for labeling: sometimes (ok, almost always), the recognition service would come up with irrelevant labels, so you'd have to treat them as suggestions. But if the recognition improved, you'd like to re-recognize the images. On top of that, different services use different sets of labels. So, in the end, you'd want some kind of "keywords that are known good" / "keywords that Service X thinks are appropriate" and be able to manage these separately.

Definitely looking forward to what can be done with a more sustained effort than mine.

(Face recognition was so-so - the big problem, I think, is that I use it for dance performances, where the light is anything but uniform and the dancers often wear stage makeup. The one thing I remember about Amazon's Rekognition is that it would find "potted plant" in any photo taken outdoors.)

Mario

I started by writing a common framework for "AI Vision" services. This allows me to handle the substantial differences between the many vendors, to deal with tags, labels, categories, faces, landmarks etc. in a unified way. I then wrote a base adapter class and adapters for the services I've tried so far. They all produce a "model" that describes the image. This model can then be used to perform a wide range of useful tasks inside IMatch  :)
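
A minimal Python sketch of the adapter idea described above, assuming a vendor-neutral model with labels, a description, landmarks and face boxes. The class names (ImageModel, VisionAdapter, VendorAAdapter, VendorBAdapter) and both response shapes are hypothetical; they do not reflect any real vendor's API or the actual IMatch framework.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ImageModel:
    """Vendor-neutral description of one image (hypothetical structure)."""
    labels: list[str] = field(default_factory=list)
    description: str | None = None
    landmarks: list[str] = field(default_factory=list)
    faces: list[tuple[int, int, int, int]] = field(default_factory=list)  # x, y, w, h


class VisionAdapter(ABC):
    """Base adapter: turns a vendor-specific response into an ImageModel."""

    @abstractmethod
    def to_model(self, response: dict) -> ImageModel: ...


class VendorAAdapter(VisionAdapter):
    # Assumed response shape: {"tags": [{"name": ..., "confidence": ...}], "caption": ...}
    def to_model(self, response: dict) -> ImageModel:
        return ImageModel(
            labels=[t["name"] for t in response.get("tags", [])],
            description=response.get("caption"),
        )


class VendorBAdapter(VisionAdapter):
    # Assumed response shape: {"labels": ["dog", ...], "faces": [{"box": [x, y, w, h]}]}
    def to_model(self, response: dict) -> ImageModel:
        return ImageModel(
            labels=list(response.get("labels", [])),
            faces=[tuple(f["box"]) for f in response.get("faces", [])],
        )
```

Whichever vendor produced the response, downstream code only ever sees an ImageModel, which is what makes the unified handling possible.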

There are some services which deliver landmarks, e.g. the name of a place, building or landmark ("Houses of Parliament", "Niagara Falls", "Palace of Versailles", ...).
Some services can categorize images, based on a predefined category hierarchy.
Some services often produce amazingly accurate descriptions of the image ("small brown dog playing with a ball in the garden", "tourists standing in front of the Brandenburger Tor in Berlin").

Some services can detect faces in images. This is great for automatically producing face annotations.
Some services also return the ethnicity per face (quite reliable), the gender (quite reliable) and the age (sometimes off by 20 years  :D)

Some services also offer face recognition, for example Amazon AWS. Here, a face is trained by uploading some samples. Amazon calculates a mathematical model representing the face, but does not store the photo itself. When you later process an image of the same person, AWS can tell you that this is "Uncle Bob". An app could do the training, using existing face annotations etc. This plays well with a major feature that will be integrated into IMatch 2018...

Of course what a user wants to be detected in an image is configurable.
If you only want labels, you only get labels. This is also a question of cost, naturally.


What all services have in common: they deliver a set of "labels" per image. This would be the main source for creating keywords.

To make these simple labels more useful, IMatch or the app would have to perform a number of additional steps, e.g. unifying the keywords or mapping the keywords into the keyword thesaurus to produce proper hierarchical keywords. But also doing cleanup tasks like applying a user-defined "skip list" for keywords which are known to be wrong or unwanted (your "potted plant" for example, or the "no person" keywords returned by one service for all images without a detected face).
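
The cleanup steps above might look roughly like this. It is a sketch, assuming the skip list is a set of lower-case labels and the thesaurus is a plain dictionary from label to hierarchical keyword. The function name clean_labels and the "|" hierarchy separator are illustrative assumptions, not the IMatch thesaurus format or API.

```python
def clean_labels(labels, skip_list, thesaurus):
    """Unify labels, drop unwanted ones and map them to hierarchical keywords.

    skip_list: set of lower-case labels known to be wrong or unwanted.
    thesaurus: dict mapping a lower-case label to a hierarchical keyword
               (the '|' separator used below is an assumption).
    """
    result = []
    seen = set()
    for label in labels:
        key = label.strip().lower()           # unify spelling and case
        if not key or key in skip_list:
            continue                          # user-defined skip list
        keyword = thesaurus.get(key, key)     # fall back to the plain label
        if keyword not in seen:               # drop duplicates, keep order
            seen.add(keyword)
            result.append(keyword)
    return result
```

For example, clean_labels(["Dog", "potted plant", "dog", "No Person", "Car"], {"potted plant", "no person"}, {"dog": "Animals|Mammals|Dog"}) keeps only the mapped dog keyword and "car".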

The beauty is: We can do that easily in an app. Once the model for the image has been produced (by using whatever service) all the info is there. All that needs to be done is to use the endpoints offered by IMatch/IMWS to create keywords, set the title or description, produce categories or face annotations etc.

Of course many users will have no use for that. If you only shoot family motifs or friends, you can add keywords and descriptions quickly using the features provided by IMatch.
Others may find it worth a few dollars to automatically add keywords and descriptions to 10,000 files in a batch.
Or to have an "index" of all the persons in their images. Or of objects like cars, boats, fruit, wedding motifs, landscape motifs, ...
It all depends on the user. If the user does not need it, it does not interfere or cost money.

But integrating these technologies into IMatch and IMatch Anywhere is important. I try to keep the effort on my part low, though.
It takes a day or two to learn how something like Azure Vision or AWS Rekognition works and to write the "adapter" which links my framework to a new service.
Much is learned and gained, and not too much effort is required.

The technology used for this could later also be used to interface with cloud-based storage, to implement "uploader" features for services like FB, Twitter, Instagram, whatever.  Even, gosh, to connect to an IMatch database which runs in the cloud... ??? ;)

herman

You did not mention it in this topic, but I recall you were investigating possibilities to work with / embed / link / ..... / Excire services.
I suspect in the end it will all come nicely together ?
Enjoy!

Herman.

Mario

I got an updated version of the Excire library last week.

Adding Excire to IMatch is doable, but it will require quite some effort, because I'll have to write a separate executable and implement inter-process communication (to separate Excire from IMatch for technical reasons).

Once I have completed my evaluation of the web-based services (a week or two), I will run the same set of 200 sample files against my Excire test application.
Excire only does labels, no descriptions, landmarks, faces, etc., so I will only look at the quality of keywords.

If Excire can compete with the web-based services I will have to figure out how many users want to use it.
Excire has the one advantage that it runs locally. But the accuracy of keywords is of course a lot more important, so the jury is still out.
If only a few users want this, it's just not worth the additional effort.
They will have to buy a license from Excire, of course.

herman

Quote from: Mario on January 25, 2018, 01:52:43 PM
Excire has the one advantage that it runs locally.
Which (for me....!) is not just an advantage but a requirement.
No way I am going to hand over my images to one of the datamining companies out there.
I like to stay in control  ;D
But I am not sure if I will ever need such a service, categorizing my images is easy enough as it is now.
Enjoy!

Herman.

monochrome

Quote from: Mario on January 25, 2018, 01:22:02 PM
Some services produce often amazingly correct description of the image ("small brown dog playing with a ball in the garden", "tourists standing in front of the Brandenburger Tor in Berlin").

Agree that Azure Vision is quite impressive - especially the descriptive texts. Would this necessitate a new text search functionality in IMatch?

Quote from: Mario on January 25, 2018, 01:22:02 PM
To make these simple labels more useful, IMatch or the app would have to perform a number of additional steps, e.g. unifying the keywords or mapping the keywords into the keyword thesaurus to produce proper hierarchical keywords. But also doing cleanup tasks like applying a user-defined "skip list" for keywords which are known to be wrong or unwanted (your "potted plant" for example, or the "no person" keywords returned by one service for all images without a detected face).

Would it be, for example, possible to put the AI-generated description / labels / etc. in a separate property or attribute from the XMP image description and keywords? For example, I could use the XMP description as the "known good" / "blessed" description, and then have a property for "AI description" that I could use just for searching within IMatch, but would not write back to the image metadata since its quality may be unreliable.

monochrome

Quote from: herman on January 25, 2018, 02:26:47 PM
Quote from: Mario on January 25, 2018, 01:52:43 PM
Excire has the one advantage that it runs locally.
Which (for me....!) is not just an advantage but a requirement.

Google's Inception V3 runs locally and is free (Apache 2.0 I think): https://www.tensorflow.org/tutorials/image_recognition

axel.hennig

When using this technology: Is it necessary to upload the image "as is", or does IMatch do some "image shrinking" in advance? I ask because:

- my upload speed is much lower than my download speed
- I think I've read at the Amazon website that an image should not exceed 5MB in size

Mario

Quote from: monochrome on January 25, 2018, 02:29:56 PM
Agree that Azure Vision is quite impressive - especially the descriptive texts. Would this necessitate a new text search functionality in IMatch?

Not necessarily. If you put the Azure description into the image title / headline / description tag, or maybe into an Attribute, IMatch can automatically search it.

Quote from: monochrome on January 25, 2018, 02:29:56 PM
Would it be, for example, possible to put the AI-generated description / labels / etc. in a separate property or attribute from the XMP image description and keywords? For example, I could use the XMP description as the "known good" / "blessed" description, and then have a property for "AI description" that I could use just for searching within IMatch, but would not write back to the image metadata since its quality may be unreliable.

This could be done using existing Attributes. Or maybe via a "synthetic" tag in the photools.com namespace that only lives in the database (like, "file name" or "folder" we have already).
But I think the general idea should be to manually check the information returned by the AI and then pick the keywords you want to keep, and use (or edit and use) the description.

With the current implementations, I don't see content delivered by AIs being pushed into the database without at least a quick manual review.
We could do batch runs which update any number of files in the background and give them a "for review" XMP label, and then the user can review them at his/her leisure to decide which information to use. IMatch has no problem with "caching" the data returned by the AI. This is always plain JSON text, which compresses very well in the database. IMatch can even index / search JSON data...
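
A rough sketch of the "for review" batch idea, using only Python's built-in sqlite3, json and zlib modules. The table layout, the status column (standing in for the XMP label) and all function names are hypothetical; IMatch's own database and JSON indexing work differently.

```python
import json
import sqlite3
import zlib


def open_cache(path=":memory:"):
    """Create a small cache table for raw AI responses (hypothetical layout)."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS ai_cache ("
               "file_id INTEGER PRIMARY KEY, payload BLOB NOT NULL, status TEXT NOT NULL)")
    return db


def store_result(db, file_id, response):
    """Compress the JSON response and flag the file for review."""
    blob = zlib.compress(json.dumps(response).encode("utf-8"))
    db.execute("INSERT OR REPLACE INTO ai_cache VALUES (?, ?, 'for review')", (file_id, blob))


def pending_review(db):
    """Return (file_id, decoded response) for all files still awaiting review."""
    rows = db.execute("SELECT file_id, payload FROM ai_cache "
                      "WHERE status = 'for review' ORDER BY file_id")
    return [(fid, json.loads(zlib.decompress(p).decode("utf-8"))) for fid, p in rows]


def mark_reviewed(db, file_id):
    """Clear the review flag once the user has decided what to keep."""
    db.execute("UPDATE ai_cache SET status = 'reviewed' WHERE file_id = ?", (file_id,))
```

Keeping the raw response means the user can review at leisure without the service being called a second time.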

mgm1965

Nice.
I wonder if the AI can offer the keywords also in Italian in addition to those in English or if I have to do a translation...

Marco

Mario

Quote from: axel.hennig on January 25, 2018, 03:17:34 PM
When using this technology: Is it necessary to upload the image "as is" or is IMatch doing some "image shrinking" in advance. I ask, because:

I run my tests with the database thumbnails (!) which are usually 300 pixels on the long edge and a few KB in size. So far this works very well for labels. I don't see substantially better results when the AI gets a 1920 pixel image.
Larger images may be better for face detection / recognition, though. A face that is 100 pixels wide on a 1920 pixel image is only about 16 pixels on a 300 pixel thumbnail - and this makes face detection hard.
The app uses the standard endpoints provided by IMWS to get the thumbnails. To use larger files I only need to change one parameter. IMatch could hence adapt the image size to the task.
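
The face-size arithmetic above can be written out as a small helper. pick_long_edge and its 40 pixel minimum face width are assumptions for illustration only; no vendor documents that exact threshold. The 300 and 800 pixel sizes mirror the thumbnail and "Small" variants mentioned in this thread.

```python
def scaled_size(object_px, source_long_edge, target_long_edge):
    """Approximate size in pixels of an object after the whole image is downscaled."""
    return object_px * target_long_edge / source_long_edge


def pick_long_edge(face_px, source_long_edge, min_face_px=40, sizes=(300, 800, 1920)):
    """Return the smallest image size that keeps faces at least min_face_px wide.

    min_face_px=40 is an assumed threshold, not a vendor requirement.
    """
    for size in sizes:
        if scaled_size(face_px, source_long_edge, size) >= min_face_px:
            return size
    return sizes[-1]  # fall back to the largest size available
```

For the example in the post, a 100 pixel face shrinks to about 15.6 pixels at 300 px, so the helper picks the 800 px variant, where the face is about 42 pixels wide.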

Mario

Quote from: mgm1965 on January 25, 2018, 04:04:13 PM
Nice.
I wonder if the AI can offer the keywords also in Italian in addition to those in English or if I have to do a translation...

Marco

This depends on the service used. Some services support labels in multiple languages, others don't (yet).

But there are several ways we could handle this:

a) Running the results through a machine translation (IMatch already uses this in the App Translator)

b) Using a mapping table provided by the user (you set up a list of English -> Italian terms) and the app uses that to translate the keywords on the fly.
All vendors use a finite set of tags/keywords/labels, I guess around 500. It should not be too hard to maintain a mapping table for the most frequent languages.
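
Option b) could be sketched like this, assuming the user keeps the table as simple two-column CSV text (English, Italian). Labels missing from the table are returned separately, so they could be passed on to machine translation as in option a). The function names and the CSV layout are illustrative assumptions.

```python
import csv
import io


def load_mapping(csv_text):
    """Read 'english,italian' rows into a lookup table keyed by lower-case English."""
    mapping = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2:                  # skip blank or incomplete rows
            mapping[row[0].strip().lower()] = row[1].strip()
    return mapping


def translate_labels(labels, mapping):
    """Translate labels via the table; return (translated, missing)."""
    translated, missing = [], []
    for label in labels:
        target = mapping.get(label.strip().lower())
        if target is None:
            missing.append(label)          # candidate for machine translation
            translated.append(label)       # keep the English label for now
        else:
            translated.append(target)
    return translated, missing
```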

Mario

Quote from: monochrome on January 25, 2018, 02:35:29 PM
Google's Inception V3 runs locally and is free (Apache 2.0 I think): https://www.tensorflow.org/tutorials/image_recognition

Installing, configuring and running TensorFlow locally is quite a task. Nothing you can just "ship" with software like IMatch.
It's also a matter of performance, of course. Neural networks need big systems with several graphics cards or ASICs to run fast. I think we'll see more specialized processors in future PCs which can run all that locally.
Trying to make all this work is a project  :D

Mario

Quote from: herman on January 25, 2018, 02:26:47 PM
Quote from: Mario on January 25, 2018, 01:52:43 PM
Excire has the one advantage that it runs locally.
Which (for me....!) is not just an advantage but a requirement.
No way I am going to hand over my images to one of the datamining companies out there.
I like to stay in control  ;D
But I am not sure if I will ever need such a service, categorizing my images is easy enough as it is now.
As I said, this is not for everyone. But it is a lot easier to utilize a cloud service than to deal with a toolkit like Excire. Excire will have to be at least as good for detecting labels as the cloud services for me to consider it. It gets bonus points from me for working locally, it gets minus points for making more work for implementation and maintenance. In a few weeks I'll see how this all works out.

If you manage to keyword your files yourself (and IMatch is really good at that) you're fine. No need to spend money on Excire or anything. If you have Lr you can see what you can achieve with Excire with their free demo.

Mario

Quote from: axel.hennig on January 25, 2018, 03:17:34 PM
When using this technology: Is it necessary to upload the image "as is" or is IMatch doing some "image shrinking" in advance. I ask, because:

- my upload speed is much lower than my download speed
- I think I've read at the Amazon website that an image should not exceed 5MB in size

I've checked. Using the 'Small' image variant (800 pixels, longest edge) seems to be sufficient for all services I have tested. This means that per file the app has to upload 60 to 100 KB. Some services require several calls (one for labels, one for faces), and this means the image has to be uploaded twice. But even in that case and over a slow line, 100 to 200 KB should be OK...? If no face detection is needed or the "objects" are fairly large, even a thumbnail (20 KB) works.

axel.hennig


monochrome

Google's "inception5h" network in the TensorFlow samples uses 224 x 224 images. (That's what the network was trained on, and that's what all images are reduced to before classification.)

Mario

Quote from: monochrome on January 25, 2018, 11:34:11 PM
Google's "inception5h" network in the TensorFlow samples uses 224 x 224 images. (That's what the network was trained on, and that's what all images are reduced to before classification.)

The problem with that is face detection. If you have a group of people and you reduce the image to 224 pixels, each face may be reduced to 10 or 20 pixels. It is very hard to detect faces in that blurry mess.

monochrome

Quote from: Mario on January 25, 2018, 11:38:52 PM
Quote from: monochrome on January 25, 2018, 11:34:11 PM
Google's "inception5h" network in the TensorFlow samples uses 224 x 224 images. (That's what the network was trained on, and that's what all images are reduced to before classification.)

Problem with that is face detection. If you have a group of people and you reduce the image to 244 pixel, each face may be reduced to 10 or 20 pixel. Very hard to detect faces in that blurry mess.

Yes, face detection is a bit of a special case where you do one pass to find the faces, then crop to identify each face individually - my point is really that these feature detectors don't work on high-res images.

Mario

No. My tests show that the 300px thumbnail IMatch maintains for each file is good for almost all purposes.
For face detection and group photos or with people in a distance, a larger image is required. When I choose the default IMWS "Small" size (which is 800px for IMatch) the results are as good as it gets.

The AI connector framework I have developed for my tests (and potential future integration into IMatch Anywhere and IMatch) supports this and has a configurable image size. This could later be made user-controllable via the app. Users could do the bulk with quick and small thumbnails and run only critical or unsuccessful images again with the larger file.
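
The "bulk first, retry later" strategy could be sketched as follows. analyze and needs_retry are hypothetical callbacks (for example a vendor adapter and a "no faces found" check); they are not part of IMWS or any vendor SDK.

```python
def two_pass(file_ids, analyze, needs_retry, small=300, large=800):
    """Process all files at a small size, then re-run only the failures at a larger size.

    analyze(file_id, long_edge) -> result  : hypothetical callback, e.g. a vendor adapter
    needs_retry(result) -> bool            : hypothetical check, e.g. "no faces found"
    Returns (results by file id, list of file ids that were re-run).
    """
    results = {fid: analyze(fid, small) for fid in file_ids}
    retried = [fid for fid, res in results.items() if needs_retry(res)]
    for fid in retried:
        results[fid] = analyze(fid, large)  # second, more expensive pass
    return results, retried
```

Only the retried files pay for the larger upload, which keeps both bandwidth and per-call cost down.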

For some specialized detectors (food, travel, ...) offered by some providers, a larger image is usually also better. But that is only relevant to food / stock / product photographers.

Face detection via the cloud services is sometimes better than the built-in detection in IMatch, and sometimes it's worse. I need to do some extra tests to see if it is worth doing this via external providers or if what we already have is good enough.

Face recognition is yet another business. To make this work, the cloud service has to extract the faces from images, IMatch has to associate the faces with a name to build some form of "collection" in the cloud database. Then the cloud service is able to recognize faces on new images. This works quite well, for frontal faces. But it is expensive. All cloud vendors which offer this want money for the face detection and also for keeping the "collection" stored on their server. Per month (as long as you need the collection).

This is not overly expensive (Microsoft charges 0.50 $ per 1000 faces per month), but for the typical IMatch user this will already be a "no-go". It may be interesting for parts of the user base, so I won't rule out that IMatch will support this via an extra app or custom module some day.

I think that automatically classifying an image and adding keywords is the most useful feature for a large share of IMatch users.
Especially in combination with IMatch features like the thesaurus and categories. Before I sit down and add keywords to 50,000 files, I'd rather spend 50 dollars once to make an AI do it ;)

mastodon

Sorry, but I don't understand. If Microsoft charges 0.50 $ per 1000 faces per month, does that mean that if I have a database of about 50,000 pictures with about 4 faces, I have about 200,000 faces to detect? So if I let MS tag them, it will cost me 100$?
Well, as this covers 40 years of my photos, that is a reasonable fee. Certainly, I have to help the MS software learn the faces. Can the MS software use the info from photos with already tagged faces for its analysis/detection?

Mario

See https://azure.microsoft.com/en-us/pricing/details/cognitive-services/computer-vision/ for pricing details.
The privacy conditions for Microsoft's Cognitive Services (including computer vision) are here: https://www.microsoft.com/en-us/trustcenter/cloudservices/cognitiveservices

All vendors offering AI or computer vision services have similar prices and policies. Amazon AWS being a bit of an exception (from what I can tell) because they only store a mathematical model of the images you plan to use for face recognition, not the image itself. I have not yet read their privacy statement in detail. Or Google's...

For your use case:

I expect one has to upload a number of images of each face to improve the detection quality. In your case, with only 4 people, you probably need 5 or 10 images per person for best results.
This means your monthly charge for storing the sample image collection will be about 1 dollar, for as long as you want to be able to recognize faces. You can of course upload the sample files again later to save cost.

Microsoft charges for face detection in batches of 1000 files, at one dollar per batch (independent of the number of faces detected).
50,000 files means thus 50 US$.
If you do it all in one month, you'll pay about 51 US$.
If you also want automatic category/keyword detection, this will add another 50 US$.

so

101,- US$ for 50,000 files with keywords, face annotations and people classification.
And a certain failure rate, of course.

or

21 US$ for 10,000 files with keywords and face tags.

All services I have tried so far cost about the same, plus or minus a few cents per 1000 files.
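
The arithmetic above as a small calculator. It assumes a flat price per started batch of 1000 files for each billed feature (face detection, keywords), plus a monthly charge for the stored face collection; real vendor price lists use tiers and change over time.

```python
import math


def estimate_cost(files, features, price_per_1000=1.00, collection_storage=0.0):
    """Rough cost estimate in US$.

    files:              number of images to process
    features:           billed features per image (e.g. faces + keywords = 2)
    price_per_1000:     assumed flat price per started batch of 1000 files
    collection_storage: assumed monthly charge for the stored face collection
    """
    batches = math.ceil(files / 1000)
    return batches * price_per_1000 * features + collection_storage
```

With the figures from this post, estimate_cost(50_000, 2, collection_storage=1.0) gives 101.0 and estimate_cost(10_000, 2, collection_storage=1.0) gives 21.0.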

Not for everyone, surely.
But it can save weeks of time for many. And there is also text recognition and sentiment detection (happy, sad, laughing, smiling, ...) provided by some vendors.

Jingo

If these services become popular - then prices should ultimately start to come down.... I can see a flat $9.95 per month with year sign up as a reasonable expense for automatic keywording and recognition... but don't see widespread use of this unless the process is quick and seamless.  As it is - the majority of "normal" families who take photos don't even transfer them to a computer let alone catalog and keyword them.

Mario

QuoteAs it is - the majority of "normal" families who take photos don't even transfer them to a computer let alone catalog and keyword them.

Sure. But these are not DAM users and hence not my target audience.
If you upload your files to Google / Amazon / etc. they probably let their AI run on these files - and may even offer you the results.

Jingo

Quote from: Mario on January 28, 2018, 02:37:50 PM
QuoteAs it is - the majority of "normal" families who take photos don't even transfer them to a computer let alone catalog and keyword them.

Sure. But these are not DAM users and hence not my target audience.
If you upload your files to Google / Amazon / etc. they probably let their AI run on these files - and may even offer you the results.

Understood - but just think how large that audience is!  Wonder what type of marketing can be done to attract them to photo organization... pretty sure auto-keywording would be a good starting point to attract a larger "untapped" audience.

Mario

I guess the target audience would be huge if this were included for free in IMatch.
Unfortunately, this is impossible. I cannot pay for this from my own pocket.

My plan is to offer the features and enable the user to choose the service they want and deal with them directly.
The smaller vendors like Clarifai or Imagga make this fairly 'easy' and affordable, even for normal people.

The big vendors like Google, Amazon or Microsoft target their products at corporate users and developers. Although their services are very good, they are way too complicated for normal people to use.

One way to solve this would be for me to integrate this as a "paid service" into IMatch.
A user could then buy, say, credits to add keywords and face annotations to 5, 10 or 20K files. The corresponding features in IMatch would be available until the credits are used up.
This would be very simple for users, and I could extend the photools.com cloud to handle this.

I have the infrastructure in place, at least on a level that will allow me to give a selected group of users access to the AI features for testing.
The outcome of that will allow me to decide whether or not this is worth the effort.

The results are truly awesome in many cases. Adding keywords, landmarks, people, and even text recognition opens a wide range of possibilities. Especially when integrated with existing IMatch features like categories and collections or the thesaurus. My test suite produces amazing results on large catalogs with untagged files - from keywords to precisely telling me which photos I took in Paris, London or Barcelona. Even which buildings I photographed. Yay! Or all images I took with motorcycles, cars or fruit...  ::) ;)

Jingo

Quote from: Mario on January 28, 2018, 11:40:52 PM

One way to solve this would be for me to integrate this as a "paid service" into IMatch.
A user then could buy, say, credits to add keywords and face annotations to 5, 10 or 20K files. The corresponding features in IMatch would become available until the credits are used up.
This would be very simple for users, and I could extend the photools.com cloud to handle this.

I have the infrastructure in place, at least on a level that will allow me to give a selected group of users access to the AI features for testing.
The outcome of that will allow me to decide whether or not this is worth the effort.


Yes - that would be an interesting solution... the easier it is - the more users will use the functionality.. and hopefully - more users will join the fun!

Mario

Quote from: Jingo on January 29, 2018, 01:25:04 PM
Yes - that would be an interesting solution... the easier it is - the more users will use the functionality.. and hopefully - more users will join the fun!

Yes. But then I would take on all the risk and hassle with payments. If a user is unsatisfied. If a user wants his money back. If the vendor does not fulfill its SLAs or goes bust...

I would need to set up a separate server for this in the cloud, to route the traffic, do authentication, check quotas etc.
This generates additional monthly cost and needs to be carefully calculated. I'm still in the thinking process.

A first test of the infrastructure will be the "AI Beta" I will run here in the community soon - with maybe 5 or 10 testers who get access to the tagging/face/description features to see how it works. Since I pay for that with my own money I will restrict access and use a quota for each user (only a few hundred files per tester).
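
A per-tester quota, as a routing server might enforce it, could be as simple as this sketch. QuotaTracker and the 300-file limit used below are hypothetical; a real server would also persist the counters and authenticate each request.

```python
class QuotaTracker:
    """In-memory per-user file quota (hypothetical, not persisted)."""

    def __init__(self, files_per_user):
        self.limit = files_per_user
        self.used = {}

    def try_consume(self, user, files=1):
        """Reserve quota for a request; refuse it if the user would exceed the limit."""
        used = self.used.get(user, 0)
        if used + files > self.limit:
            return False
        self.used[user] = used + files
        return True

    def remaining(self, user):
        return self.limit - self.used.get(user, 0)
```

For example, with QuotaTracker(300) a tester can process 250 files, a further request for 100 is refused, and 50 files remain.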

I've almost completed the integration, just waiting for one vendor to get back to me with some answers.
Then I'll do a test with 100 files or so and publish the results. This will allow everybody to see what can be achieved for different image types, and for each supported vendor.

Jingo

Quote from: Mario on January 29, 2018, 03:26:53 PM
Quote from: Jingo on January 29, 2018, 01:25:04 PM
Yes - that would be an interesting solution... the easier it is - the more users will use the functionality.. and hopefully - more users will join the fun!

Yes. But then I would take the all the risk and hassle with payments. If a user is unsatisfied. If a user wants his money back. If the vendor does not fulfill his SLA's or goes bust...

I would need to setup a separate server for this in the cloud, to route the traffic, do authentication, check quotas etc.
This generates additional monthly cost and needs to be carefully calculated. I'm still in the thinking process.

A first test of the infrastructure will be the "AI Beta" I will run here in the community soon - with maybe 5 or 10 testers who get access to the tagging/face/description features to see how it works. Since I pay for that with my own money I will restrict access and use a quota for each user (only a few hundred files per tester).

I've almost completed the integration, just waiting for one vendor to get back to me with some answers.
The I do a test with 100 files or so and publish the results. This will allow everybody to see what can be achieved for different image types - and for each supported vendor.

Sounds good Mario... will be interesting to see how this test goes.. happy to help of course!