IMatch Computer Vision Results are in

Started by Mario, January 30, 2018, 11:19:47 PM


Mario

Over the past two weeks I've analyzed the possibilities of incorporating modern AI-based computer vision into IMatch, to automatically add keywords and descriptions to images and to detect faces, objects, text etc.

Please see this new blog post for more information and results:

https://www.photools.com/5742/computer-vision-imatch-initial-results/

ianrr

Could the AI version be used in conjunction with a separate "my personal" interpretation when one finally gets around to adding one's own interpretation, to get only the information relevant to one's personal needs?

Or could a check box be added to hide AI data that is irrelevant to one's needs?

How far down the track (approximately, obviously) are we before this feature can be used? Weeks, months, years?

Certainly interesting, as I have over 12 years of images, and the chances of me putting the info in manually are close to zero.

Thanks for the current info.  Love your work !!!

ian


jch2103

The results are very interesting, both in terms of the commonalities and the differences among the vendors. I can see that it would be almost essential to have an up to date, compatible thesaurus when choosing one of the vendors. Just about as essential would be a way to control the volume of returns. They all produce an astonishing number of hits. One fascinating side note is that none of the vendors was able to accurately identify a slice from a sushi roll!

John

sinus

Very interesting, yes.
But of course, if I did this, I would have to look over all images and check the keywords, because quite a lot of them are wrong.
Depending on what we have to do and want to achieve, this could be the best way.

But for others, manual keywording could still be the best. Not to forget, tools like IMatch with its Thesaurus help us a lot to keyword several images quite quickly.

But in this context a simple question:
What about languages other than English? What should I do if I want German keywords, or Norwegian, Spanish and so on?
Best wishes from Switzerland! :-)
Markus

Mario

Quote from: ianrr on January 31, 2018, 12:02:30 AM
Could the AI version be used in conjunction with a separate "my personal" interpretation when one finally gets around to adding one's own interpretation, to get only the information relevant to one's personal needs?
Adding keywords via AI features is an option. It complements your own input. Or you use it on a file-by-file basis, picking the results you want to keep.

Mario

Quote from: jch2103 on January 31, 2018, 01:34:37 AM
The results are very interesting, both in terms of the commonalities and the differences among the vendors. I can see that it would be almost essential to have an up to date, compatible thesaurus when choosing one of the vendors. Just about as essential would be a way to control the volume of returns. They all produce an astonishing number of hits. One fascinating side note is that none of the vendors was able to accurately identify a slice from a sushi roll!

IMatch would offer an app that not only allows you to limit the keywords to a certain confidence threshold. It would also present them in a "pick list", allowing you to quickly select the keywords you want to apply.

I think that features like "always ignore" to suppress certain keywords, "replace A with B" functions to replace certain delivered keywords will be needed. Plus thesaurus lookups for getting hierarchies and synonyms automatically.
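
To make the idea a bit more concrete, here is a minimal sketch in Python of how such filtering could work. This is not IMatch code. The tag format (a keyword plus a confidence between 0 and 1), the function name `build_pick_list` and all sample values are assumptions made purely for illustration.

```python
def build_pick_list(tags, threshold=0.8, ignore=(), replace=None):
    """Return keywords for a pick list, highest confidence first.

    tags:      iterable of (keyword, confidence) pairs, as an AI service might return them
    threshold: minimum confidence a keyword needs to be offered at all
    ignore:    keywords that are always suppressed ("always ignore")
    replace:   mapping of delivered keyword (lowercase) -> preferred keyword
               ("replace A with B")
    """
    replace = replace or {}
    ignored = {k.lower() for k in ignore}
    best = {}
    for keyword, confidence in tags:
        key = keyword.lower()
        if confidence < threshold or key in ignored:
            continue
        key = replace.get(key, key)
        # Keep the highest confidence if two tags end up as the same keyword.
        best[key] = max(confidence, best.get(key, 0.0))
    return sorted(best, key=lambda k: (-best[k], k))


# Hypothetical service output for one image.
sample = [("sushi", 0.97), ("food", 0.95), ("seafood", 0.91),
          ("no person", 0.90), ("cake", 0.42)]
pick_list = build_pick_list(sample, threshold=0.8,
                            ignore=["no person"],
                            replace={"seafood": "fish"})
print(pick_list)
```

The user would then tick the entries to keep. Thesaurus lookups for hierarchies and synonyms would be a further step after this one and are not shown here.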

The tags returned by the AI serve as a good base, but combining them with the advanced IMatch features and a clever app will make them really useful.
This technology is surely not perfect, and never will be. But it gets better all the time.

Some vendors offer to train your own networks, which may be a way to get much better results for specific image domains. Clarifai offers custom models for food, apparel, travel and wedding out of the box (not used for my test).

Some of my sushi images are correctly identified as sushi, but not all. If I switch Clarifai to use the "food" model, it detects sushi more reliably and even returns the main ingredients!
It then delivers for the sample sushi photo in the test set:

sweet, cake, cream, chocolate, fish, pastry, candy, meat, dairy product, seafood, ice, milk, goody, bread, salted fish, plaice, pie, shellfish, salt, vegetable


Mario

Quote from: sinus on January 31, 2018, 08:38:48 AM
What about languages other than English? What should I do if I want German keywords, or Norwegian, Spanish and so on?

Many vendors offer tags in different languages. The app would of course allow you to select the language.

For other vendors we could use a simple en => <other language> translation table. Such a table could be filled by the app on-the-fly using machine translation.
If a new keyword is encountered the app uses IMatch's AI machine translation to translate the keyword and then remembers the result for the future.

The user would of course be able to edit this table and the same table could be used to always replace a keyword with another keyword, suppress keywords or even replace a keyword with multiple keywords (similar to synonyms). This would allow the user to fine-tune the automatic tagging process to improve quality and results.
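
A minimal sketch of such a table in Python, to illustrate the idea only. The class name `KeywordTable`, its methods and the stand-in translator are all hypothetical; a real app would call an actual machine translation service where `fake_translate` is used here.

```python
class KeywordTable:
    """User-editable table mapping English keywords to output keywords."""

    def __init__(self, translate):
        # translate: callable taking an English keyword and returning a translation.
        # It stands in for whatever machine translation service is used.
        self.translate = translate
        self.table = {}  # English keyword -> list of output keywords

    def set(self, english, outputs):
        """Manual entry: [] suppresses a keyword, several entries act like synonyms."""
        self.table[english] = list(outputs)

    def lookup(self, english):
        if english not in self.table:
            # New keyword: translate it once and remember the result.
            self.table[english] = [self.translate(english)]
        return self.table[english]

    def apply(self, keywords):
        """Map a list of English keywords to output keywords, without duplicates."""
        result = []
        for keyword in keywords:
            for out in self.lookup(keyword):
                if out not in result:
                    result.append(out)
        return result


# Stand-in translator backed by a tiny dictionary, for demonstration only.
calls = []

def fake_translate(word):
    calls.append(word)
    return {"tree": "Baum", "sky": "Himmel"}.get(word, word)


table = KeywordTable(fake_translate)
table.set("famous", [])                   # always suppress this keyword
table.set("castle", ["Burg", "Festung"])  # one keyword becomes several
german = table.apply(["tree", "sky", "castle", "famous", "tree"])
print(german)
print(calls)  # each new keyword is translated only once
```

Because manual entries and cached translations live in the same table, editing a bad machine translation is just another call to `set`.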

sinus

Quote from: Mario on January 31, 2018, 09:24:58 AM
Quote from: sinus on January 31, 2018, 08:38:48 AM
What about languages other than English? What should I do if I want German keywords, or Norwegian, Spanish and so on?

Many vendors offer tags in different languages. The app would of course allow you to select the language.

For other vendors we could use a simple en => <other language> translation table. Such a table could be filled by the app on-the-fly using machine translation.
If a new keyword is encountered the app uses IMatch's AI machine translation to translate the keyword and then remembers the result for the future.

The user would of course be able to edit this table and the same table could be used to always replace a keyword with another keyword, suppress keywords or even replace a keyword with multiple keywords (similar to synonyms). This would allow the user to fine-tune the automatic tagging process to improve quality and results.

Thanks, Mario, but please stop now!  8) :o :P :'(

Sounds simply toooooooo cool and if I think at the hours, days and weeks, where I had added keywords, I feel  ::) :-X   :'(

Tsss, I wonder, where we will be with all this digital stuff in 10 years?
I would even dare to think, that your sentence in a post above "This technology is surely not perfect, and never will be." could one day be wrong ... because it is simply perfect.
Best wishes from Switzerland! :-)
Markus

Mario

Quote
if I think at the hours, days and weeks, where I had added keywords, I feel  (...)

Precisely. Adding keywords to a large number of files is usually tiring and boring. Nobody really likes doing this.

If you have large batches with basically the same motif, you can do it manually quickly. But if you have many different motifs, or you need to identify the people or objects in the photo etc., it can become a real time sink. If your computer can do this for you, great. More time for other things, like creating more photos or having a walk or a chat  :)

Quote
I would even dare to think, that your sentence in a post above "This technology is surely not perfect, and never will be." could one day be wrong ... because it is simply perfect.

The verdict is still open. But when I look at how fast AI technology advances, it may become better than humans when it comes to recognizing the content of an image. Our brain is superb at this kind of task. But it is "just" a highly developed neural network. And modern AIs are based on the same technology...this will be interesting.

I like taking photos. And I like my photos to have proper keywords, categories, descriptions. Because this allows me to find them quickly and to effectively use my collection. But if a computer can do the boring task  of adding keywords, face tags, categories to my files - great. This gives me more time to do interesting things.

I will not hard-wire AI or 3rd party services into IMatch or IMatch Anywhere™. But I want to offer my users access to these technologies. Because, as you said, it can save you days or even weeks of time.

monochrome

Will it be possible to plug in other providers via the IMatch Application framework?

I'm experimenting with retraining Google's "Inception v3" architecture ( https://www.tensorflow.org/tutorials/image_retraining ), and it would be very cool to be able to run the image recognition locally. It would, for example, be possible to use existing keyword assignments to produce a neural net that specifically classifies according to those - or have specialized neural nets for different super-categories of keywords. Bottom line is that it would make it possible for each user to have their own specially-trained net.

(Of course, this can all run as a command-line app, but with the additional stuff you're adding, like ignore lists and such, it would be nice to plug into that.)

Mario

If this is available as a web service...

My framework is designed to utilize cloud services. But if you can work with Tensorflow you can surely write a small IMatch script that pushes the keywords into the database, right?

I doubt that more than a handful of IMatch users will be able to install, maintain or run Tensorflow locally. Besides, I'm not overly impressed with the results I've seen from Tensorflow yet, in the IMatch context.

I think that using cloud services is the way to go. They can run their AIs on purpose-built hardware, on entire racks of GPUs. Even I now rent hardware as needed in the Azure cloud. I don't bother with local test PCs anymore. I run four to six Windows machines for a few hours for testing and then I turn them off again. Costs me a few dollars per month.

Vendors like Clarifai or imagga allow me to train my own models and host/run them for a few dollars per month.

monochrome

Quote from: Mario on January 31, 2018, 04:16:59 PM
If this is available as a web service...

Well in theory the IMatch app could launch a web server that the IMatch app could then use, but now we're kind of tying ourselves in knots, so: No.

Quote from: Mario on January 31, 2018, 04:16:59 PM
My framework is designed to utilize cloud services. But if you can work with Tensorflow you can surely write a small IMatch script that pushes the keywords into the database, right?

Sure. That would be doable.

Mario

I've had a look at Tensorflow but it seems to be quite complicated to get running on Windows. I doubt this will be something a normal user can do.
I could run TF on one of my cloud services and build my own vision API, though...if my day had 96 hours  ;)

Jingo

This is all so interesting.. and there are TONS of services out there for sure.  While it only works on a single image at a time - even this freeware "experiment" http://www.akiwi.eu returns good results that can quickly build a list of photos to assign back to multiple images in IM... of course - they are all flat keywords so not ideal... but the technology is advancing very fast!!

ubacher

I think dealing with the many keywords returned will be a challenge. Manually reviewing each set will likely not be a time saver over assigning from scratch.

Are probability values returned for each keyword? If so, we might need to think about some kind of fuzzy-logic-based selection of images. And this would require keeping probabilities with the keywords/categories we store.
That would be a new kettle of fish really.
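
To illustrate what I mean by fuzzy selection, here is a rough Python sketch. It assumes each stored keyword keeps its probability, and it uses the classic fuzzy-logic rule that an AND of several terms scores as the minimum of their probabilities. The function name, the data layout and the numbers are made up for the example.

```python
def fuzzy_match(image_tags, all_of, min_score=0.5):
    """Score images for an AND query using the fuzzy-logic minimum rule.

    image_tags: dict of image name -> dict of keyword -> probability
    all_of:     non-empty list of keywords that must all be present
    Returns (image, score) pairs with score >= min_score, best first.
    """
    hits = []
    for image, tags in image_tags.items():
        # Fuzzy AND: an image is only as good a match as its weakest keyword.
        score = min(tags.get(keyword, 0.0) for keyword in all_of)
        if score >= min_score:
            hits.append((image, score))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical stored probabilities for three images.
stored = {
    "a.jpg": {"beach": 0.95, "children": 0.80},
    "b.jpg": {"beach": 0.90, "children": 0.30},
    "c.jpg": {"beach": 0.60, "children": 0.70, "dog": 0.99},
}
results = fuzzy_match(stored, ["children", "beach"])
print(results)
```

Other combination rules (for example multiplying the probabilities) would be just as easy to try; the point is only that the probabilities have to be stored for any of this to work.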


Mario

Yes, each returned keyword has a probability value. I just don't show them in the PDF.

I'm 2 to 3 times faster reviewing keywords assigned by the AI than selecting them from the Keyword Panel.

monochrome

Quote from: Mario on January 31, 2018, 04:26:46 PM
I've had a look at Tensorflow but it seems to be quite complicated to get running on Windows. I doubt this will be something a normal user can do.

If it's the C++ / GPU version, then that might be complicated. But if you're fine with the CPU version and the Python API, then it boils down to how to distribute a Python installation.

I kind of feared it would be a pain, but it was actually pretty simple to get everything running.

Oh well, I'll let you know if there's any progress.

Mario

Have you trained a larger net without GPU support? How long did it take?

mastodon

I like the results from Google. Actually, I see that the descriptions made by Azure are quite good, but I am not sure they are specific enough for my family photos. But it might be a lifesaver for general photos.
Do the AIs use tags (especially face tags) that are written into the JPG files? That would speed up learning.

Mario

I have no information on whether the AIs use embedded metadata in images.
The thumbnails my test app uploads do not contain any metadata.

If your files already have face tags and these were added in IMatch, you already have the corresponding keywords in your database.

monochrome

Quote from: Mario on February 01, 2018, 11:16:35 AM
Have you trained a larger net without GPU support? How long did it take?

Don't really know what "larger net" means here, but: Transfer learning ( https://www.tensorflow.org/tutorials/image_retraining ) on about 3600 images of flowers for Google's Inception V3 net is about 30 minutes on an Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz. Results in about 90MB of retrained net with 97% accuracy on the test set.

Mario

Sounds OK for retraining the final layer of a previously trained net. I have not worked with this so I don't know how adaptive the model will become for varying inputs. Probably worth the time to do some experiments, when there is time.

monochrome

It trained kind of OK for flowers at least...

Bottom line is that I'm not all that convinced that this will be all that useful for me.

I started off thinking that I could train a net to categorize photos for me; but pretty much about now I realize that I don't take pictures of "things": it's more important to me that, for example, something is categorized as "Place des Anges by Gratte Ciel" than "@Keywords|show" "@Keywords|dance" "@Keywords|performance". People names are important, but not "@Keywords|people" and "@Keywords|person" - and really, no net will figure that one-off out.

I guess if I sold photos it would be different. If I had hundreds of photos of different foods I could train it on, it could be cool.

Was kinda fun to play around with.

Jingo

My feeling as well.. I would also be curious how it handles "non-stock" type photos... I mean - the stock photos used in the test PDF files are all "pro level"... great lighting, subject isolation, color saturation.  However, how about messy "normal" folk photos.... family gatherings, baseball game, kids concerts photos from 200 feet away... I have a feeling these images wouldn't do as well.  Also starting to wonder how effective this is for non-pro/non-stock type images.  Like - what would it do with this:


sinus

Quote from: Jingo on February 02, 2018, 02:35:41 AM
My feeling as well.. I would also be curious how it handles "non-stock" type photos... I mean - the stock photos used in the test PDF files are all "pro level"... great lighting, subject isolation, color saturation.  However, how about messy "normal" folk photos.... family gatherings, baseball game, kids concerts photos from 200 feet away... I have a feeling these images wouldn't do as well.  Also starting to wonder how effective this is for non-pro/non-stock type images.  Like - what would it do with this:



I think, for "stock-type photos" as well, that everyone has their own system.
In your case I would first give some "overview keys", like in a film, where you first take an overview with a wide-angle and then go nearer and nearer and deeper.

And this AI stuff is more for the "nearer" keywords. But I am of course sure that in the near future we will do a lot of keywording and even texting with AI.

In your case first the country, the city, the overview like "landscape", then maybe "historic", then "nearer" keywords like wood, ruins, stone and so on.
Then it depends. I know some colleagues would also add sky here, but I would not.

In the end I see a kind of "melting" between AI and human keywording (or monitoring).
I do not know if we can compare this a bit with text recognition and/or translation: it gets better and better, but depending on the text it is super, or very bad. Here too a human has to monitor the whole text.
You would not usually have an English text translated into German and then put it online. You would first review the translated text and edit it before you put it online.
Best wishes from Switzerland! :-)
Markus

Mario

All AI vendors allow you to try out their services via their web sites.

For your sample file, I've got these results with my test app:

imagga:

fortress, sky, landscape, tourism, mountain, cape, travel, sea, water, tree, hill, tower, architecture, coast, rock, ocean, forest, landmark, castle, city, clouds, scenic, panorama, beach, building, summer, river, trees, vacation, cloud, cliff, mountains, church, promontory, horizon, outdoors, village, structure, palace, town, island, scenery, shore, bay, coastline, famous, boat

Microsoft:

Description: a close up of a hillside next to a body of water (0.8058409422144833)

outdoor, water, nature, grass, grazing, field, mountain, background, green, grassy, hill, lake, hillside, large, pasture, standing, body, lush, tall, group, cloudy, river, horse, rock, herd, white, tower, city, ocean, sheep, eating, beach, tree, sky, distance

Google:

hill station, sky, highland, archaeological site, historic site, tree, hill, biome, mountain, national park

Clarifai

nature, architecture, travel, landscape, tower, sky, panoramic, hill, landmark, sight, old, tourism, tree, beautiful, mountain, summer, outdoors, building, ancient, castle

Amazon

Flora, Forest, Land, Nature, Outdoors, Plant, Rainforest, Tree, Vegetation, Architecture, Beacon, Building, Lighthouse, Tower, Housing, Monastery, Bush, Jungle

monochrome

Having given it a good think, I'd say that I'd like two "kinds" of keywords. One set is the one I assign myself - maybe with AI help. They are known to be good. But then there are approximate keywords assigned by AI to images with no assigned keywords.

The results from Mario's test app look passable, if of low quality, but good enough to pop up in a search below images that match on manually assigned keywords. For example - in the rainforest ruin photo here some of the services matched against "water" or "sea" (which I can see why),  and that is wrong. But if the photo came up in a search for "water" below images that I had manually assigned "water" to, that wouldn't be so bad.

I tested a photo of a flock of birds, and Amazon returned "boat". If that were given as much weight as my own keyword of "birds", and the photo showed up on top in a search for "boat", that would be a bad experience. But if it came up after all known-good boat photos, I'd be like "yeah that flock of birds kinda looks like a boat, huh".
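
A small Python sketch of the ordering I have in mind, under the assumption that manual and AI keywords are stored separately per image and that AI keywords carry a confidence. The data layout, file names and confidence values are invented for the example.

```python
def search(images, keyword):
    """Return image names matching keyword, manual matches before AI matches.

    images: dict of image name -> {"manual": set of keywords,
                                   "ai": dict of keyword -> confidence}
    """
    manual_hits, ai_hits = [], []
    for name, info in images.items():
        if keyword in info["manual"]:
            manual_hits.append(name)
        elif keyword in info["ai"]:
            ai_hits.append((info["ai"][keyword], name))
    # AI-only matches follow the known-good ones, most confident first.
    ai_hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return sorted(manual_hits) + [name for _, name in ai_hits]


# Hypothetical library: only harbour.jpg was keyworded "boat" by hand.
library = {
    "harbour.jpg": {"manual": {"boat", "sea"}, "ai": {}},
    "birds.jpg":   {"manual": {"birds"}, "ai": {"boat": 0.55}},
    "regatta.jpg": {"manual": set(), "ai": {"boat": 0.93, "water": 0.88}},
    "ruin.jpg":    {"manual": {"ruins"}, "ai": {"water": 0.41}},
}
boat_results = search(library, "boat")
print(boat_results)
```

The flock of birds still shows up for "boat", but only after everything I tagged myself and after more confident AI matches.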

Arthur

I have a fixed hierarchical keyword thesaurus, which should not be extended automatically by any AI. These keywords are like a defined rule set, which is only extended by me.

On the other hand, the AI-proposed keywords could be saved in a separate set, which could be used to simulate a "Google image search" on your database. I would not even filter an AI-proposed set, but store it as a whole, just not in the same set as my rule-based keywords.

Searching by a well defined keyword based rule set and searching in the AI proposed universe are two completely different use cases from my point of view:
- The rule set approach allows concrete searches like, give me all photos with "Person: 'A' on Event: 'Birthday Party of Person 'B' in year: '2017'". There is information in it, which cannot be deduced from the image content.
- The AI based approach simplifies searching for abstract things like "children AND beach" or "dog AND cat AND blood" :-) without having the overhead of manually assigning the keywords. It is an alternative, but not a replacement to the rule based approach.


monochrome

Quote from: Arthur on February 02, 2018, 02:34:17 PM
I have a fixed hierarchical keyword thesaurus, which should not be extended automatically by any AI. These keywords are like a defined rule set, which is only extended by me.

Same here.

Quote from: Arthur on February 02, 2018, 02:34:17 PM
Searching by a well defined keyword based rule set and searching in the AI proposed universe are two completely different use cases from my point of view:
- The rule set approach allows concrete searches like, give me all photos with "Person: 'A' on Event: 'Birthday Party of Person 'B' in year: '2017'". There is information in it, which cannot be deduced from the image content.
- The AI based approach simplifies searching for abstract things like "children AND beach" or "dog AND cat AND blood" :-) without having the overhead of manually assigning the keywords. It is an alternative, but not a replacement to the rule based approach.

Precisely. I'm of the opinion that you can do the two things in one operation - for example, show all the "rule set" matching images before the AI-matching images - but yes, the above is pretty much where I'm at as well.

monochrome

Hm. This thing with a local image recognition tool may have legs after all...

Quote from: Arthur on February 02, 2018, 02:34:17 PM
I have a fixed hierarchical keyword thesaurus, which should not be extended automatically by any AI. These keywords are like a defined rule set, which is only extended by me.

This is something a local, custom, net can help with. Here is a screenshot of a neural net that classifies dance styles. Put a bunch of photos in categories (you can see the hierarchy in the screenshot), used that to train the network (took about 10 minutes), and it can now add more photos to those categories.

I still think that when searching, "AI-assigned label matches" should come after "human assigned label matches". But, of course, the latter can be nothing more than clicking "Approve" for AI-suggested categories.

It's really down to how the UI is set up to give maximum workflow benefits.

ubacher

I looked at Jingo's picture and thought how I would look for it.
I figured I would use the words ruin, forest, building.
Looking at the results, only imagga and Amazon would have returned the picture, but only if I had specified OR.

I might conclude that using the keywords as returned will not be very useful unless we have a search which also considers related terms (not sure of the technical term for such a search). Such a search would take tree as close enough to forest, tower the same as building etc....
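
Roughly what I imagine, as a Python sketch. The related-terms table is something I made up for this example (in practice it might come from a thesaurus), and the keyword lists are the Clarifai and Google results Mario posted above.

```python
# Hypothetical related-terms table; a real one could be derived from a thesaurus.
RELATED = {
    "forest": {"tree", "trees", "rainforest", "jungle"},
    "building": {"tower", "architecture", "castle", "structure"},
    "ruin": {"ruins", "archaeological site", "historic site", "ancient"},
}


def matches(query_terms, image_keywords, related=RELATED):
    """True if every query term, or one of its related terms, is among the keywords."""
    keywords = {k.lower() for k in image_keywords}
    for term in query_terms:
        candidates = {term} | related.get(term, set())
        if not (candidates & keywords):
            return False
    return True


clarifai = ["nature", "architecture", "travel", "landscape", "tower", "sky",
            "panoramic", "hill", "landmark", "sight", "old", "tourism", "tree",
            "beautiful", "mountain", "summer", "outdoors", "building",
            "ancient", "castle"]
google = ["hill station", "sky", "highland", "archaeological site",
          "historic site", "tree", "hill", "biome", "mountain", "national park"]

query = ["ruin", "forest", "building"]
print(matches(query, clarifai))              # with related terms
print(matches(query, clarifai, related={}))  # exact keywords only
print(matches(query, google))
```

With this made-up table the Clarifai keywords satisfy the AND query through "ancient" and "tree", while an exact match on the same keywords fails. The Google keywords still fail because nothing in them relates to "building".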



Jingo

It's funny because it returned so much more than I have already keyworded the image for... which isn't a bad thing... just not sure how to handle hierarchical keywords for NEW items ... like "promontory" which was returned and may actually apply... but I don't have that in my keyword list.

It's a pretty interesting technology... as I research it further - I'm finding TONS of companies out there with API's and interesting concepts..

hluxem

I have a fixed hierarchical keyword thesaurus, which should not be extended automatically by any AI. These keywords are like a defined rule set, which is only extended by me.


Same here. I think I like to keep the manual categories/keywords separate from the A.I. results. Of course there should be the option to review results and transfer to categories or keywords.
I don't think I would do much of that. I think of the A.I. results as a fuzzy basket with tags which may change over time. As the technology will certainly get better, I think I would update the results at some point or even frequently. That probably depends on how many photos you have, how much it costs and how happy you are with the results.

I would also be curious how it handles "non-stock" type photos..... However, how about messy "normal" folk photos.... family gatherings,
I think most of my pictures fall  in the messy normal folk category. I just recently discovered that Google now adds a folder Things to their albums view. In my case that folder includes 85 albums with different things like birds, boats, dog....
It's almost scary that the pictures shown for these tags/albums are mostly spot on. Even expanding any of these albums, there are not too many obviously wrong entries. My dog would probably disagree as he was identified once as a horse :>).  See attached picture for some of the albums.

As always, Mario is certainly on the right track to integrate the ability to use these tools with IMatch. Can't wait to spend some money on the next paid release :>)

Heiner



Erik

This is all interesting, and as I've grown with my own library and hierarchy, which is tied to my own thesaurus that I fully control, I already find a few changes to how I work with keywords and the thesaurus that really should make me step back a bit.

For instance, when I first started with my hierarchy, I had probably just 1,000s of photos.  The hierarchy was (and still can be) a convenient way to tag and find photos.  The hierarchy was small, and I could easily navigate through it and find what I wanted.

As the collection grew, so has the hierarchy.  The thesaurus is difficult to manage and even browsing through the @keywords tree isn't as simple as it once was.  Occasionally the same keyword is in different spots in the hierarchy (this was often occurring prior to IM5 but still occasionally occurs), and there are times when the same word is appropriately so.  The best use of the hierarchy is in excluding keywords from the flat keywords or browsing for "broader keywords" like places, trees, mountains, etc, but becomes harder on other detailed items that don't always neatly fall into a hierarchy.

Where I am going with this is that over the past two years, using keywords has become more about using search and find functions rather than the hierarchy directly.  My thesaurus is so big that it is easier for me to type in a flat keyword and assign an image (or find images) than actually browse for it in the thesaurus (or @keywords).

This leads me to a spot where using the AI may not actually be that bad for me.  Do I really need the thesaurus?  That's rhetorical, as I do need it, but not really like I thought I did.  If the AI can work as well for finding images in my collection as it will for tagging them, then my IM database can function not much differently than Google does when I search for something. 

Life might get easier.  I've long since found that maintaining my thesaurus isn't as simple as it could be, and I've hesitated to get a controlled vocabulary because I feel I should reorganize all my photos that were previously tagged for conformance. 

Ultimately, I really don't know.  I will look forward to the progress with eyes wide open and eagerness to try.  I applaud the ambition you have, Mario, and even considering such a feature into IM. 

Mario

I know that there are IMatch users which need 30,000 or even 70,000 different keywords. But that's exceptional. I guess that most users can get by perfectly with a few hundred keywords, organized in a few levels.

Having too many keywords complicates things. It reduces performance, makes automatically organizing files by keywords (@Keywords), searching etc. much harder. The trick is to find the right keywords, put them into a controlled vocabulary and then stick to it. That's not always easy, but it pays off in the long run.

The AIs pick keywords from a taxonomy of 300 to 600 keywords! In a flat, single-level hierarchy. For the AIs which provide image classification, the category taxonomy is usually way less than 100 elements.

monochrome

Mario,

I've made a little app for those who'd like to play around with TensorFlow / Inception v3, and I'm in the process of getting it ready for release. Would it be OK to name it "IMception"?

(First I thought about calling it "IMatchception", but that's probably too close, so I reduced it to IMception - much like Oracle won't let you call anything "JavaXyz" while "JXyz" is OK.)

Mario

I have no problem with that, unless you develop a DAM...  ;)

Tensorflow is an interesting technology. I have a project based on it going on myself. Feel free to share what you have.

sinus

Quote from: monochrome on February 08, 2018, 08:35:15 PM
Mario,

I've made a little app for those who'd like to play around with TensorFlow / Inception v3, and I'm in the process of getting it ready for release. Would it be OK to name it "IMception"?

(First I thought about calling it "IMatchception", but that's probably too close, so I reduced it to IMception - much like Oracle won't let you call anything "JavaXyz" while "JXyz" is OK.)

This looks interesting!
Best wishes from Switzerland! :-)
Markus

axel.hennig

Quote from: monochrome on February 08, 2018, 08:35:15 PM
I've made a little app for those who'd like to play around with TensorFlow / Inception v3, and I'm in the process of getting it ready for release.

That sounds really interesting.

monochrome

Quote from: sinus on February 08, 2018, 09:30:30 PM
Quote from: monochrome on February 08, 2018, 08:35:15 PM
I've made a little app for those who'd like to play around with TensorFlow / Inception v3, and I'm in the process of getting it ready for release. Would it be OK to name it "IMception"?

This looks interesting!

See https://monochrome.sutic.nu/2018/02/03/imception.html for demo & downloads.

(Finally!)