Cloud autotagging sometimes skips files

Started by sybersitizen, July 19, 2023, 09:15:55 PM


sybersitizen

I've tested a few of the cloud autotagging services and chose Microsoft Azure to tag my tens of thousands of photos ... but I soon noticed that the process sometimes inexplicably 'skips' files. When that happens I keep the same set of files selected and run the Auto Tagger a second, or maybe even a third, time until all files are tagged. (My settings specify that any files that already have keywords won't be re-tagged.)

Here's a link to some example screen shots:

https://photos.app.goo.gl/mZCJWXsVrmLavANQ6

It looks like IMatch is sending requests for all the files, and Azure is supplying keywords for all of them, but sometimes they are not all applied.

The way I handle this is to autotag only one folder at a time and check the results instead of selecting multiple folders and trusting that all the tags will get applied. It's a workaround, but not ideal.

Has anyone else encountered this? Is it possible that it only happens with Azure?


Mario

Please always include a log file (if possible in debug mode) of IMatch sessions where you encounter problems.
See log file

A similar issue has never been reported before.
Please give us the settings you are using in AutoTagger, lookups, maps etc.

If you process thousands of files in a row, Azure may impose a threshold, although I believe AutoTagger respects the maximum number of requests allowed by Azure per account per second.

We need a lot more details, starting with the log file of a session where you've experienced this problem.
Only a very small number of users use Azure, due to privacy concerns, but nobody has ever reported anything similar.

sybersitizen

Okay, I'll enable Debug Logging tomorrow when I process another set of folders, and send you the file. I guess that's best done through email.

I don't know about any threshold for file numbers with Azure, but that's not the issue I'm having. I've successfully submitted more than 500 files at a time without any being skipped, yet the skipping can occur with smaller numbers, as low as 100-200 files. It's rather unpredictable.

I'll also mention that in very rare cases, Azure fails to supply any keywords for an image no matter how many attempts are made. That's presumably because the algorithms can't figure out what the image is.

Tveloso

It may not be related at all, but I thought I should mention it anyway, just in case: this sounds similar to some files seemingly having been skipped during Face Detection/Recognition.

Sometimes, after having run Face Detection, I encounter files that have no Face Annotations, to which the Viewer can correctly add the missing faces (via F6), so it's as though the "batch Face Detection" operation in the File Window skipped these files.

I wonder if it could possibly be that the same "outer driver process" is involved in both Face Detection and AutoTagging, and something there causes the results of that processing (either Keywords or Faces) to not get applied to some files...(so both processes are working correctly, but the files are "skipped" for another reason)?

I've never tried re-running Face Detection (apart from in the Viewer) on files that were initially "skipped", as sybersitizen has done here for AutoTagger.  It would be interesting to do that...

It might be a good idea to get a Debug Log showing an AutoTagger session which did not Keyword some files, then filter for (or better still, create a result window from) these files, then run AutoTagger again on them, in that new scope, and make note of the files that are now tagged on the second go-round (and review the Log to see what happened with them on the first go-round, where they were not tagged)?...

I'm going to try doing that with Face Detection later today or tomorrow, and will report back.
--Tony

Mario

Quote
Sometimes, after having run Face Detection, I encounter files that have no Face Annotations, to which the Viewer can correctly add the missing faces (via F6), so it's as though the "batch Face Detection" operation in the File Window skipped these files.

This can happen if the face in the image is quite small and you use the default setting for face recognition.

If you later load the image into the Viewer, it is loaded at full cache size and face recognition also runs at that size, which allows detecting faces that are too small to detect in the normal process (unless you enable the "Optimize for small faces" option). See Tips for more information.

sybersitizen

I ran an autotagging session today in which 371 files were selected. IMatch reported that only 350 were tagged. On the second run, the other 21 were tagged.

The zipped file is 1.2MB. As I mentioned, I'm using the paid Azure service and am asking for a maximum of 15 tags per image.  What other info do you want me to include in my email?

Mario

A link to this topic. I get many emails per day.

It sometimes happens that Azure does not return data for a file, but does return it the next time the service is called.
I've experienced this a number of times. But almost nobody uses Azure (most use Google or now the IMatch AI), and hence there is not much experience with it.

If Azure returns an error code, AutoTagger logs it.
If Azure just returns an empty result for a file, AutoTagger accepts this as it is - since this can happen if Azure does not find keywords for a file.
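The manual workaround from the first post (re-running AutoTagger on the same selection until every file has keywords) can be sketched as a retry loop. This is a minimal, hypothetical Python sketch, not AutoTagger code: `fetch_keywords` stands in for one call to the AI service and is assumed to raise on a real error and to return an empty list when the service found nothing. AutoTagger itself accepts an empty result as final, as described above.

```python
from typing import Callable, Dict, List


def tag_with_retries(
    file_ids: List[int],
    fetch_keywords: Callable[[int], List[str]],
    max_rounds: int = 3,
) -> Dict[int, List[str]]:
    """Re-submit files whose result came back empty, up to max_rounds passes.

    fetch_keywords is a hypothetical stand-in for one AI service call.
    """
    results: Dict[int, List[str]] = {}
    pending = list(file_ids)
    for _ in range(max_rounds):
        still_empty = []
        for fid in pending:
            keywords = fetch_keywords(fid)
            if keywords:
                results[fid] = keywords
            else:
                still_empty.append(fid)  # try again in the next round
        pending = still_empty
        if not pending:
            break
    for fid in pending:
        results[fid] = []  # still nothing after all rounds
    return results
```

Capping the number of rounds matters because, as noted earlier in the thread, Azure occasionally returns nothing for an image no matter how often it is resubmitted.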

IMatch limits the number of requests per minute for the Azure paid tier to 600, with a batch size of 1, which is within the current limits I could find. If Azure considers a limit exceeded, it returns errors and AutoTagger logs them.
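A limit of 600 requests per minute with a batch size of 1 works out to one request every 0.1 seconds. The sketch below shows one simple way to enforce such a limit by spacing requests evenly. It is a hypothetical Python illustration (the `RequestSpacer` class and `send_request` call are made up), not how IMatch actually implements its throttling. The clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable


class RequestSpacer:
    """Schedules successive requests at least 60 / max_per_minute seconds apart.

    Hypothetical helper for illustration; not IMatch's throttling code.
    """

    def __init__(
        self,
        max_per_minute: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 60.0 / max_per_minute  # 600 per minute -> 0.1 s
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def wait(self) -> None:
        """Block until the next request may be sent, then reserve that slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


# Usage (batch size 1 means one file per request):
# spacer = RequestSpacer(600)
# for file_id in file_ids:
#     spacer.wait()
#     send_request(file_id)  # hypothetical call to the AI service
```

Even spacing is the simplest choice; a token bucket would additionally allow short bursts while keeping the same average rate.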

Do you see any lines with W> or E> in the log file?

sybersitizen

Quote from: Mario on July 20, 2023, 08:40:18 PM
A link to this topic. I get many emails per day.

Done.

Quote
But almost nobody uses Azure (most use Google or now the IMatch AI), and hence there is not much experience with it.

In addition to trying IMatch AI, I also set up accounts with Google and Clarifai. I tested them all with a representative selection of my files and found that Azure usually gave me the most useful results. It was only after starting my bulk autotagging project that I noticed the occasional skipping.

Quote
Do you see any lines with W> or E> in the log file?

Just one, but not related to the issue:

07.20 10:06:59+    0 [0A00] 01  W> Spelling: Cannot find a dictionary for language 'en' or failed to load.

Mario

This just tells you that you have not yet installed a spell checker dictionary and hence IMatch cannot do any spell checking in the MD Panel, KW Panel or Attributes panel. See Download Dictionaries for Common Languages

Mario

Thanks for sending the log file.
The log is clean, no warnings (except for the dictionary), no errors, nothing. AutoTagger reports that it is processing batches of files and data is sent to IMWS, thesaurus lookups are made etc. All pretty normal.

If Azure returned an error message, it would be logged.

There is no more detailed logging in place, because it was never needed. If the external service does not return any keywords for a file (but does not return an error code either), AutoTagger does 'nothing' to the file. This is not considered an error or something that needs to be reported.

Maybe I'll add some more logging to a future AutoTagger version to log this condition.

Mario

I've checked the code, and every error condition is handled by notifying the user and logging error codes and AI service results to the IMatch log file.

For the next release, I've made two related changes to AutoTagger:

1. If Debug logging is enabled, it logs the number of keywords assigned to each file ID.
2. If one or more files did not receive keywords from the AI service, AutoTagger now shows their number in the results and lets you open them in a Result Window for review.
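The distinction drawn in this thread (an error code is logged, an empty result is silently accepted) can be made visible by sorting each service response into three buckets, which is roughly what the two changes above report. This is a minimal Python sketch with hypothetical types (`ServiceResult`, `RunSummary`); it does not reflect AutoTagger's internal data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ServiceResult:
    file_id: int
    keywords: List[str] = field(default_factory=list)
    error: Optional[str] = None  # set when the service returned an error code


@dataclass
class RunSummary:
    tagged: Dict[int, int] = field(default_factory=dict)  # file id -> keyword count
    untagged: List[int] = field(default_factory=list)  # no error, but no keywords
    errors: Dict[int, str] = field(default_factory=dict)  # file id -> error text


def summarize(results: List[ServiceResult]) -> RunSummary:
    """Split service responses into tagged, untagged and failed files."""
    summary = RunSummary()
    for r in results:
        if r.error is not None:
            summary.errors[r.file_id] = r.error
        elif r.keywords:
            summary.tagged[r.file_id] = len(r.keywords)
        else:
            summary.untagged.append(r.file_id)  # candidates for review or re-run
    return summary
```

The `untagged` list is the set of files that previously disappeared silently; collecting their IDs is what makes opening them in a Result Window possible.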

sybersitizen

Just providing an update.

I'm getting nearer to completing the autotagging of my collection. Yesterday I added 5,492 files to the database, distributed in various folders and subfolders. This time I selected the entire group and had Azure autotag them all instead of doing one folder at a time. Not a single file was skipped.

Surprising, but I'm glad it worked. I'll do it that way with the remainder of the files and keep my fingers crossed.

sybersitizen

Your comments about error messages reminded me to mention something else. It's not exactly related to the occasional skipping issue, but it is important to know: Azure has in fact returned error messages having to do with image pixel dimensions. I discovered that if a file is sent that's less than 50 pixels in one of its dimensions, an Azure error is reported and the whole autotagging process is aborted. I've only had two such files so far in my collection, but that was enough to prompt me to request a way to identify any more that might be there, which you provided in this thread:

https://www.photools.com/community/index.php/topic,13430.msg94669.html#msg94669

Shortly afterwards, I found that Azure will also throw the same fatal error if the pixel dimensions are too large. That happened with some of my panoramas, so I built another category to identify files like that as well.

In either case, IMatch displays a window over the Auto Tagger app with this message:

Snap!
An error occurred while communicating with the external service. See below for details.
{ "readyState": 4, "responseText": "{\"code\":\"InvalidImageSize\",\"requestId\":\"107ca375-e733-4b30-8745-2f0f8860b7c2\",\"message\":\"Image must be at least 50 pixels in width and height\"}", "responseJSON": { "code": "InvalidImageSize", "requestId": "107ca375-e733-4b30-8745-2f0f8860b7c2", "message": "Image must be at least 50 pixels in width and height" }, "status": 400, "statusText": "error" }


It's the same error message (except for the requestId strings) even when the dimensions are too large. I don't know what that exact limit is, but a file with only 8,000 horizontal pixels can trigger it if the width to height ratio is too great. 8,000 x 2,000 pixels causes no problem, but 8,000 x 1,000 pixels generates the error and the process terminates.
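The dialog content is a dumped error object whose `responseText` field is itself a JSON string. A minimal Python sketch for pulling out the HTTP status, the Azure error code and the message, assuming exactly the shape shown above (`status`, plus `responseText` containing `code` and `message`); the function name is hypothetical.

```python
import json
from typing import Optional, Tuple


def azure_error_details(xhr_dump: str) -> Optional[Tuple[int, str, str]]:
    """Return (HTTP status, error code, message), or None for a non-error status.

    Assumes the outer object has "status" and a JSON-encoded "responseText".
    """
    outer = json.loads(xhr_dump)
    if outer.get("status", 200) < 400:
        return None
    inner = json.loads(outer.get("responseText") or "{}")  # nested JSON string
    return outer["status"], inner.get("code", ""), inner.get("message", "")
```

Reading the `code` field rather than the message text makes it easy to tell `InvalidImageSize` apart from other failures, for example when deciding which files to set aside for a separate run.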

Again, I have my solutions in place, but I figured I should keep you informed!

Mario

IMatch uses the thumbnails it maintains for each file in the database.
IMatch does not send the full-res image or cache image.

By default these are 300 pixels on the longest edge if the base image is larger than 300 pixels; otherwise the thumbnail matches the size of the image. If you manage icons or other small files in the database, these should not be used for auto tagging.

Your panorama should hence have created an image with a width of 300 pixels and a matching height.
Do you use non-standard thumbnail settings (Edit > Preferences > Database)?

Tveloso

Quote from: Mario on July 26, 2023, 09:53:48 AM
Your panorama should hence have created an image with a width of 300 pixels and a matching height.

This explains why the error message given is always referring to the minimum size.  At 300 pixels on the long edge, the panoramas will likely dip below 50 pixels on the short edge.  Likewise, the 8000x1000 image (300x38) falls below that Azure minimum, while the 8000x2000 image (300x75) does not.
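Tony's arithmetic can be checked with a few lines of Python. This sketch assumes the thumbnail is scaled so its longest edge equals the configured size (300 by default) and that smaller images are left alone, as Mario describes; the round-half-up rounding is an assumption, since IMatch's exact rounding is not stated in this thread. The 50-pixel minimum comes from the Azure error message quoted earlier.

```python
from typing import Tuple

AZURE_MIN_EDGE = 50  # from the error text: "at least 50 pixels in width and height"


def thumbnail_size(width: int, height: int, max_edge: int = 300) -> Tuple[int, int]:
    """Scale so the longest edge equals max_edge; smaller images keep their size.

    Uses integer round-half-up; IMatch's actual rounding is assumed, not known.
    """
    longest = max(width, height)
    if longest <= max_edge:
        return width, height

    def scale(v: int) -> int:
        return (v * max_edge + longest // 2) // longest

    return scale(width), scale(height)


def too_small_for_azure(width: int, height: int, max_edge: int = 300) -> bool:
    """True if the thumbnail's short edge would fall below Azure's minimum."""
    tw, th = thumbnail_size(width, height, max_edge)
    return min(tw, th) < AZURE_MIN_EDGE


for w, h in [(8000, 2000), (8000, 1000)]:
    print((w, h), thumbnail_size(w, h), too_small_for_azure(w, h))
# (8000, 2000) (300, 75) False
# (8000, 1000) (300, 38) True
```

With the 350-pixel setting mentioned later in the thread, an 8,000 x 1,000 image still yields a thumbnail only 44 pixels high, so it would still trigger the error.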

Incidentally, my theory about this "skipping" behavior possibly being related to a perceived skipping during Face Recognition seems to be all wet. 

I tried isolating the files that did not get faces, and then running Face Detection again, but no new files  in that group were given faces.  I did this only once so far (and the set of files that did not get faces were probably a poor example of the condition - where files that should have gotten faces did not).

I actually posted about that issue (and a perceived pattern in the "skipping") in another topic, so if I find anything new on this, I'll post there, so as not to muddy this topic (which is about AutoTagging with Azure).
--Tony

Mario


Quote
I tried isolating the files that did not get faces, and then running Face Detection again, but no new files in that group were given faces. I did this only once so far (and the set of files that did not get faces were probably a poor example of the condition - where files that should have gotten faces did not).

The faces are probably too small to detect in your case.
Open the files in the Viewer, zoom in, run the face detection again.
See the related comments and suggestions in the People Help: No Faces Found?

If images with your specific aspect ratio produce thumbnails too small to process, consider using a larger thumbnail size. Files with 8,000 x 1,000 pixels are not very common and will require some user attention.
You can change the thumbnail size under Edit menu > Preferences > Database, then select the outlier images and use Shift+Ctrl+F5 > Force update to create larger thumbnails for them (write back pending metadata changes first, if needed).
Then switch the thumbnail size back to 300 pixels or whatever you consider sufficient for your screen size and resolution.

sybersitizen

Quote from: Mario on July 26, 2023, 09:53:48 AM
IMatch uses the thumbnails it maintains for each file in the database.
IMatch does not send the full-res image or cache image.

Yes, I'm aware ... and those two tiny files are just unimportant anomalies that happened to be in my huge mass of files. There should be very few if any others, but I'll still run my pre-check until I get the collection done.

I'm using a 350 pixel size for thumbnails rather than the default 300 - not a big difference.

You have identified what's happening with my panoramas. When those thumbnails get generated, any exceptionally long ones end up with a height of less than 50 pixels. I should have realized that, but it makes perfect sense now.  I don't have a large number of those, so it's no big deal to handle them separately.

Do cloud services other than Azure have similar limitations to watch out for? Just curious.

Mario

Quote from: sybersitizen on July 26, 2023, 04:56:06 PM
Do cloud services other than Azure have similar limitations to watch out for? Just curious.

The documentation of the various vendors is a bit fuzzy in this regard, and may vary over time and between API versions.
In my experiments, a 300 longest edge thumbnail was always sufficient, sending larger images did not yield other or more keywords, just increased network traffic and runtime.
50 or 100 pixels is way too small for all services. Some may flag it as an error; others may just deliver no keywords or wrong ones.

If you have many images with an aspect ratio of 1:8 or similar, even thumbnails with 800 pixels will barely yield 100 pixel width/height. But such extreme aspect ratios are quite rare, or at least rare when it comes to face recognition or auto tagging.
As I said, there is a work-around to produce much larger thumbnails for selected images.

So far I see no need to extend the AutoTagger to use larger images than the standard thumbnails. This might change in the future, depending on how the vendor APIs and requirements change. IMWS has the capabilities, and adding a "min edge length" feature into AutoTagger would not be hard to do.
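A "min edge length" setting boils down to one calculation: the longest-edge size needed so that the short edge reaches a given minimum. A hypothetical Python sketch; the function name and the choice to return `None` when even the original image is too small are assumptions, not taken from AutoTagger or from release note #1986.

```python
from typing import Optional


def required_max_edge(width: int, height: int, min_edge: int) -> Optional[int]:
    """Longest-edge thumbnail size large enough that the short edge is >= min_edge.

    Returns None when the original image's short edge is already below
    min_edge, because scaling down can never fix that. Hypothetical helper.
    """
    longest, shortest = max(width, height), min(width, height)
    if shortest < min_edge:
        return None
    # Ceiling of min_edge * longest / shortest, in integer arithmetic.
    return -(-min_edge * longest // shortest)


print(required_max_edge(8000, 1000, 100))  # 800
print(required_max_edge(8000, 1000, 50))   # 400
```

For the 1:8 aspect ratio mentioned above, a 100-pixel minimum requires an 800-pixel thumbnail, matching Mario's estimate, while Azure's 50-pixel minimum requires 400 pixels.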

Mario

I've decided to refactor AutoTagger to support a minimum image size.
See release note #1986 for more information.

sybersitizen

Quote from: Mario on July 26, 2023, 07:20:50 PM
I've decided to refactor AutoTagger to support a minimum image size.
See release note #1986 for more information.

That's excellent, along with the other Auto Tagger changes!