Multi-hour processing & almost unuseable PC

Started by jeknepley, April 11, 2014, 04:30:54 PM

Previous topic - Next topic

jeknepley

The top folder in a large hierarchy contained two Excel files, one of which was no longer on line. Try as I might to delete it, it wouldn't go away even when I confirmed I wanted to delete (both in its original state and later in its rejected state; IM5 would mark it as rejected and offered to delete with a confirmation - but it was still there after I confirmed). I decided to re-scan the folder (Shift-F5 without the Ctrl key option). To my dismay that resulted in a complete re-scan of that folder and all sub-folders (hundreds of folders & over 80K files) which I should have anticipated I guess. That has basically locked up any processing with IM5. It definitely is processing as shown by my NAS I/O monitor, but the IM5 window itself is all white with a thin border (as it is on start up) with the standard windows hourglass showing. If I click, the Not Responding window appears. Task Manager shows 10 active exiftool.exe processes running which collectively are averaging 50% of the CPU; add to that IM5 itself using another 15-40% (varies) and things are pretty much unusable (not just IM5). I find this general state of affairs to be true almost every time that a large number of files are being ingested or scanned. It may just be my setup, but if it's not that uncommon it raises the question of user frustration for any but small DB's.

After nearly 30 minutes, I'm still looking at the solid white IM5 display. Chose the Not Responding Close Program option (I've had to do this many time each day for the past two weeks while I familiarize myself with IM5). All of the exiftool files (& IM5) closed. Restarted IM5 & chose to use the last used DB. After restart all is the same as when I closed (all if the exiftools & CPU use as before) except now I can see the IM5 window (no longer all white). As a side note, the I&A panel bug we discussed earlier today is still there - the panel was blank after the restart but IM5 allowed me to "fix" it.

I just noticed, when I accidentally left the Preference window open for several minutes that all processing appeared to stop during this time - my NAS resource monitor went from multi-MB per sec. I/O to zero. IM5 was reading metadata after the above forced folder re scan. This isn't my main concern - just curious as to whether it's intended and, if so, what other similar tasks do the same.

I guess this may be 0, 1, 2, 3, or 4 bugs (IM5 screen all white window; unable to delete xlsx file; PC CPU overload; Pref. window open stops processing; none of the above) depending on which problems are (or aren't) unique to my system. Since I wasn't sure, I decided to start here in General Discussion. The one ray of sunshine is that at this point in the re-scan (almost still 10 hours remaining to Read MD) the file I wanted to delete - thus starting this all day wait - appears to be gone.  :-\

[attachment deleted by admin]

Mario

1. off-line files cannot be deleted (but rejected). They are removed from the database automatically when you rescan the containing folder. There was a bug report about this and the next build of IMatch shows an extra message informing you about this. See the "solved bugs" board for details.

2. Scanning a folder hierarchy of 80K files should not take that long. I just did that on my 100,000 files sample database (local disk) and it was performed in about  minute (tool as long as writing this reply). You did not include a log file (see How to report bugs) so I cannot say anything definite.

My guess that IMatch is dramatically handicapped by your NAS box. NAS means a) file access over network, b) SAMBA running on Linux, c) SAMBA simulating a Windows file system. Most cheaper NAS boxes are targeted at streaming files (videos, images) and not for typical server operations like reading directory entries of 80,000 files and who knows how many folders.

3. When you rescan a folder, IMatch puts all folders into the background processing queue, which then checks each folder for new, updated and no longer existing files. The progress is displayed in the Info & Activity panel. IMatch does not block in this phase. I think IMatch is still counting the folders on your NAS or something. A task that is usually fast and does not require threading or progress dialogs. But it takes so long in your environment that you experience the effect. I cannot say without a log file.

QuoteI just noticed, when I accidentally left the Preference window open for several minutes that all processing appeared

This is intentional. All background processing is paused while the Preference dialog is open, and resumed when you close it. You could make changes which affect how the background processing has to be performed, among other things.

IMatch also pauses background processing, or lowers the resource usage, when the user interface is active (aka the user doing things), during several resource-intensive tasks (Batch Processing, calculation of categories and collections, etc.). IMatch monitors the overall system usage and controls the background processing based on that.

I would like to see the log file because it tells me what IMatch is doing, which operations take that long etc.

jeknepley

Sorry about the log file. Suspected that you'd want it, but with the state of my PC  (IM5 unusable with all white screen) I couldn't get at it. I should have realized once I restarted IM5 it was there.

Here it is. Never have sent a log file before - I'm sure you'll tell me if I didn't do it right. OOps, I can see a problem - size is 3231 KB. What is plan b? :P

jeknepley

Just sent the log file to your support email. Hope that gets it to you.
Ed

Mario

I receive about 20 log files each day via email or via the FTP server. Have you included the topic ID (see address bar above) or a link to this topic? Your email may not tell me anything.

jeknepley

Just sent the log file again.

Subject line has the topic name; body contains link to this thread; attachment box contains a 3.2MB txt log file.

Hopefully that's satisfactory.

Mario

IMatch took about 90 second to load your 4 GB database with 170,000 files. That's OK for a cold load.
Did you run a database compact and optimize (Database menu > Tools). This may speed up things considerably.

The performance on the NAS is quite bad. I cannot tell exactly because it seems that you did not enable DEBUG logging as requested in the Beta Tester Guide (in the help) so IMatch logs only minimal data. Go to Help > Support to toggle the logging mode to Debug.

Your log file is totally full of ExifTool warnings, like:

Warning: Invalid date/time (use YYYY:mm:dd HH:MM:SS[.ss][+/-HH:MM|Z]) in XMP-photoshop:DateCreated (PrintConvInv) - \\DISKSTATION\Photo Share\IMatch Images\DVD Copies\FILENAME.jpg

or

Warning: Bad PreviewIFD directory

Which indicates some severe problems with the files IMatch is processing.

Are you writing data back at the moment. This looks like, but the log is not in debug mode and incomplete (was taken while IMatch is still running).

If you have enabled background write-back under Edit > Preferences > Background Processing IMatch will write back data automatically when needed, e.g. when it produced new XMP data during ingesting files. Your log file looks like IMatch is trying to

a) Write back metadata for 170,000 files
b) Is badly handicapped by the slow NAS response times
c) ExifTool has to deal with many invalid files or invalid metadata.

Suggestions:

1. Turn on debug logging so I see what's happening
2. Check the background processing settings
3. You may clear the write-back queue under Database > Database Tools. IMatch then stops writing back data and considers all files as "current".
4. Check the metadata of one of the files reported as having bad date and time info. Such errors usually only happen when the user enters an invalid data and/or time and ExifTool returns a warning. I don't know how invalid dates can show up in your files.

Since your NAS is performing slow (which is not uncommon for NAS), follow the tips given in the help for NAS drives.
You may also want to limit the number of threads IMatch is using for processing, to reduce the load on the NAS. Maybe it performs better. Go to Edit > Preferences > Application: Process Control. Set both values to 2 to make IMatch use only two threads. This will lower the load on your NAS box and maybe this improves response times.








jeknepley

This is an interim reply until I follow through on your suggestions and have a chance to compare.

I wanted to mention that since I thought that whatever it is that's behind this slow performance extends beyond the NAS, last night I ran a small test.

1. I selected a relatively small data sample to add as a new folder to IM5 - 6 sub-folders and a total of 994 image files (2.9GB total size).
2. I placed one copy in the suspect NAS. Another I placed in the direct connected external hard drive that I use with IM3.
3. I dragged and dropped the folder from each location (not both at the same time; when IM5 queue were zero & the system was idle - and no other programs were running)
4. I monitored the I&A panel and recorded the time to finish reading MD and then to add the files.

In this test two difference DB's were used. The NAS folder was added to the DB we've been discussing here - well over 100K files. The HD folder was added to a new DB that, before the addition of the test folder, held less than 1,000 files.

As I suspected, the NAS wasn't terrible when compared to the HD. It took twice as long. The actual times were on the order of: HD ~ 6 minutes (2.5 for MD & 3.25 for the 2nd step) and twice that for the NAS. I'm reasonably satisfied with the NAS's performance in comparison to the HD, but feel that the base time  of 6 min. (HD) may be too long. It seems that my performance problem's chief source may lie elsewhere and not with the NAS (which I believe not to be the garden video|music streaming device as shown in these specs.

DS414
High Performance 4-bay NAS Server for SMB & SOHO

Synology® DiskStation DS414 is a feature-rich 4-bay NAS specifically designed for growing businesses to effectively manage, protect and share data. Equipped with comprehensive office applications, DS414 raises working efficiency while securing data with full backup solutions. - I use 4 quality 3-TB drives & RAID 0
Dual Core CPU with Floating-Point Unit
Over 207.07MB/s Reading, 135.63MB/s Writing1
Dual LAN with Failover and Link Aggregation Support
1GB RAM Boosting Multitasking Power
Features SuperSpeed USB3.0
Hot-swappable & Tool-Less Hard Drive Tray Design

Mario

Synology is one of the good vendors, no doubt.
But writing back data to 170,000 files will bring it down to a crawl.
Performance when ingesting is usually much faster than writing-back. Your log file was incomplete and so I can only guess that IMatch is writing back. And I have no performance info seen so far, just about 2 MB of ExifTool warnings and messages encountered while reading/writing your files. Which is rather uncommon and may be something you have too look into.


jeknepley

.... rather uncommon and may be something you have too look into.

I'd be glad to do that. Can you point me toward a starting point on what I should be looking for? Apart from knowing what exif is & what, in general, it contains I'm a novice - especially when it would come to tracking down exif anomalies.

A starting point - could just be as simple as a suggested Google search string or exif for Dummies - is all that I need. I prefer to do the digging on my own, but knowing where to start the hole - well that's something else.

jch2103

Quote from: jeknepley on April 12, 2014, 01:54:48 PM
.... rather uncommon and may be something you have too look into.

I'd be glad to do that. Can you point me toward a starting point on what I should be looking for? Apart from knowing what exif is & what, in general, it contains I'm a novice - especially when it would come to tracking down exif anomalies.

A starting point - could just be as simple as a suggested Google search string or exif for Dummies - is all that I need. I prefer to do the digging on my own, but knowing where to start the hole - well that's something else.

Anyone with a large collection of images collected over some period of time probably has potential issues with metadata corruption. These issues have usually remained hidden and may have been caused by software used years ago (even software from well-known vendors). As you've found, this can affect import into IMatch, among other things.

Search this forum for topics like 'corruption'. The issue's been discussed a number of times, but postings are scattered here and there. The ExifTool forum, especially the FAQ and in particular FAQ #20, has more information on this issue, e.g.: http://www.sno.phy.queensu.ca/~phil/exiftool/faq.html#Q20
Unfortunately, I don't think there's a quick or simple answer to cleaning up the metadata corruption problem, as some of the fixes require using tools like ExifTool to re-write the metadata.

This whole topic probably deserves its own spot in this Forum, but that would require someone pulling it together. Hopefully, cleaning up metadata will be a one-time issue for most users, as IMatch and ExifTool are pretty reliable these days.
John

jeknepley

Thank you very much for that. Exactly what I was looking for. I don't doubt that there are problems lurking in my data accumulated over the past dozen years or so - many files probably touched by a dozen different programs along the way. Assuming that I learn something in my digging, I'll post a suitable thread documenting it. Now where did I put my shovel?

Mario

#12
The log file you have sent to me is basically 2 megabytes of ExifTool error messages. To much to make sense of it.
When you look at the log file yourself (search for lines containing

Invalid date/time (use YYYY:mm:dd HH:MM:SS[.ss][+/-HH:MM|Z]) in XMP-photoshop:DateCreated

to find the corresponding rows) you will also see which file names are reported. You so far did not reply to my questions (e.g. if IMatch is writing back data, see above) so I cannot help you more. The log file is not useful and you have provide one created as I outlined above. And give me the info I requested.

If IMatch is writing data the source of the error is totally different from when IMatch is reading data. I don't even know this yet, unless you give me more info to work with.

Tip:

1. Make sure you enable debug logging
2. Create a small DB (170,000 files is useless for tracking down problems).
3. Add a folder of files for which ET reports the errors in your 2 MB log file, you can find the file names as described above.
4. Make sure you do not enable background write back under Edit > Preferences > Backtround Processing

This will produce a log file which will actually be useful and will tell us if ExifTool had problems reading the metadata from your files.
If this is OK, we write back some data to see if the errors show up in the log file.

jeknepley

Sorry for taking long - like you, lots going on and at age 77 not as fast as I used to be. ;)

I've done the search and identified good candidates for a smaller DB. Will get the requested data to you within 24 hours.

Mario

No need to hurry. But a smaller log file will help isolating if and which files are problematic, and if the problem happens while IMatch is reading data from the files, or when IMatch is writing data.

By default (check: Edit > Preferences > Background Processing) IMatch does not write back data to files unless you explicitly tell it to.

IMatch may "produce" new data when it imports files, e.g. when it fills in XMP metadata from existing IPTC, EXIF and GPS data in your images. The XMP record is then new or updated and IMatch will mark the file as having pending write-backs (yellow pen icon in the File Window). Unless you click the pen or do a Commands menu > Metadata Write-back) the modified data will be kept in the database only and the files are not touched.

But, if you have enabled background write-back, IMatch will read data from the file, and then immediately write-back data to the file to flush out the newly generated metadata, and then read in the metadata back again to get all the info ExifTool has produced in addition to the written data (e.g. IPTC digest info and suchlike). This will cause a big performance penalty when you ingest 100,000+ files because each file has to be read twice and written once. Quite a lot of stress for your NAS - or any system.

Usually this works fine, albeit of course much slower than just reading data. Background write-back is of by default for good reasons.

But since ExifTool complains about invalid dates, invalid metadata structures and the like, my guess is that you have configured IMatch to write-back data in the background and this is one while you ingest your files. Meaning: slow performance and also a reason for the massive amount of ExifTool warnings and error messages.


jeknepley

I've created the log file with debug enabled. This file is what resulted from enabling debug logging and at the end using the help>support>copy option. Correct? Here's the problem -it's 21MB. How would you like that sent? I could add it to my DropBox storage.

Here's what the log file documents-

  • I created a new data base and dragged one of the smaller folders whose images were generating exiftool warnings (Warning: Bad Preview) in the previous file I sent (770 files, 1.85GB). Hopefully this time the file has useful info, but not so much as to be overwhelming.

I copied the log (debug) after the files were completely ingested with a queue of zero and the DB idle. It took about 3 minutes to read metadata and about 9 minutes to add & update files. The PC was otherwise basically idle during this time. Time to ingest only 770 files seems excessive; if so hopefully the reason is in the log file.

Attached is a screen shot of the IM5 window. Note the preferences window shows no auto write-back. This is how it's always been set; I use manual. Also, cache is set to on-demand.

If this still isn't right, let me know and I'll have at it again.

[attachment deleted by admin]

jeknepley

Quote from: Mario on April 13, 2014, 09:58:51 AM
No need to hurry. But a smaller log file will help isolating if and which files are problematic, and if the problem happens while IMatch is reading data from the files, or when IMatch is writing data.

By default (check: Edit > Preferences > Background Processing) IMatch does not write back data to files unless you explicitly tell it to.

IMatch may "produce" new data when it imports files, e.g. when it fills in XMP metadata from existing IPTC, EXIF and GPS data in your images. The XMP record is then new or updated and IMatch will mark the file as having pending write-backs (yellow pen icon in the File Window). Unless you click the pen or do a Commands menu > Metadata Write-back) the modified data will be kept in the database only and the files are not touched.

But, if you have enabled background write-back, IMatch will read data from the file, and then immediately write-back data to the file to flush out the newly generated metadata, and then read in the metadata back again to get all the info ExifTool has produced in addition to the written data (e.g. IPTC digest info and suchlike). This will cause a big performance penalty when you ingest 100,000+ files because each file has to be read twice and written once. Quite a lot of stress for your NAS - or any system.

Usually this works fine, albeit of course much slower than just reading data. Background write-back is of by default for good reasons.

But since ExifTool complains about invalid dates, invalid metadata structures and the like, my guess is that you have configured IMatch to write-back data in the background and this is one while you ingest your files. Meaning: slow performance and also a reason for the massive amount of ExifTool warnings and error messages.

I replied to this yesterday and, as I said there, need a way to send the needed log file. I suggested my DropBox account but need info from you first.

herman

Quote from: jeknepley on April 13, 2014, 03:07:50 PM
I've created the log file with debug enabled. This file is what resulted from enabling debug logging and at the end using the help>support>copy option. Correct? Here's the problem -it's 21MB.

Well, I am not Mario, but maybe I have some useful suggestion.
Is the 21 MB the size of the logfile as produced by IMatch?
If so, that is a plain text file which compresses well.
Just right click on the log file, select Send to > compressed folder.
It will end up as a folder which is much smaller than the original file.
If that file is not too big to be attached to a post you can just attach it to a post here on the forum.
When it exceeds the maximum attachment size you can send it via e-mail to Mario's support mail address.

Hope this helps,
Enjoy!

Herman.

jeknepley


jeknepley

Here's the requested log file. If it's not right (or helpful) for whatever reason, let me know.

[attachment deleted by admin]

Mario

Thanks for the log file.

I see the following performance-related data:

IMatch processes your files in batches of 20 files.
ExifTool requires about 10 seconds to process a batch, with about 200 tags loaded from each file.
IMatch needs about 150 milliseconds to process the ~ 4000 tags loaded from each set of 20 files.
This time will get slowly slower as the database fills up, but an Compact/Optimize restores it to the initial performance (almost).

There are no errors in this log, and only warnings about some damaged/incompatible IFD segment in the EXIF record, reported by ExifTool:

Warning: Bad PreviewIFD directory

To extract the thumbnail/cache image, to calculate the CRC checksum, the visual query data etc. IMatch needs between 1.0 and 1.3 seconds per file.
The checksum-extraction requires to read the entire file and takes about 0.4 seconds. Loading the thumbnail takes about 0.6 to 0.8 seconds.

Test:

On my system (local older hard disk) loading thumbnails of 4000 x 6000 pixel JPEG files takes about 0.2 to 0.4 seconds.
CRC calculation is done in about 0.05s (50 ms).

It looks to me that the NAS box is the limiting factor because all the I/O-intensive processes (ExifTool, extracting the thumbnail and CRC calculation) are comparatively slow. The log file is not complete so I don't see the max/avg stats IMatch produces when it shuts down. This info tells me more, e.g. the average load times for files.

Memory load is low. IMatch used about 250 MB (!) maximum.
I see no warnings about exceeded CPU limits (IMatch logs when the CPU load is to high for longer times).

Looks like a pretty smooth run and IMatch should perform very well when the initial load from the NAS is completed.

There are no warnings about date and time problems and IMatch has only read data (no immediate write-back) which is th default behavior. Quite different from your initial log file.

jeknepley

Thank you very much for that analysis. I apologize for taking up so much of your time getting to this point.

The good news is that I think I understand the major issues now. After running a series of tests using various combinations of PC's, DB locations, image files locations - all the while performing the identical tasks - things  have become clearer. Further, I now have a better handle on creating and sending the data you need. I guess that I did miss one thing with that last log file though in that I copied it before closing the DB. I hope there's not a next time, but if so I'll try to remember.

To be clear - I am satisfied with speed & performance once the files have been ingested. It's getting them into the system that's trying. Fortunately that's usually a one-time thing.

Thanks again.

Mario

QuoteFortunately that's usually a one-time thing.
Precisely.