AGAIN - unnecessary rescan of all files and folders

Started by Carlo Didier, February 28, 2022, 12:53:51 PM


Carlo Didier

iMatch is doing it again. I had to replace my damaged dock, so now the SSD with part of my images is on another port, but it has the same ID and drive letter (!)
And yet, although only two folders have had changes, iMatch rescans ALL files on that disk for no reason at all!
Attached is a screenshot of what it says. It's still running, but when it finishes I'll attach the log file (fortunately, since this happened last time, I have been running iMatch with debug logging on all the time).

Mario

Then something has changed. When Windows sends a "folder changed" message to IMatch, it must rescan the folder to find new and updated files.
This is a very fast operation, because IMatch just compares the older of the "last modified/created" timestamps with the corresponding database timestamp.
IMatch can check tens of thousands of files per minute that way.
Only when the timestamp reported by the file system is newer than the timestamp in the database does IMatch enqueue the file for processing.
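A minimal sketch of the per-file check described above. The function names are hypothetical (this is not IMatch's source code), timestamps are plain numbers such as seconds since the epoch, and it follows the description of comparing the older of the "last modified/created" timestamps against the database.

```python
from typing import Optional


def pick_fs_timestamp(modified: float, created: float) -> float:
    """Hypothetical helper: the older of the two file system timestamps."""
    return min(modified, created)


def needs_processing(modified: float, created: float,
                     db_timestamp: Optional[float]) -> bool:
    """True if the file is new or its file system timestamp is newer than the DB."""
    if db_timestamp is None:
        return True  # no database record yet: a new file
    # Equal timestamps mean "file is current"; only a newer one is enqueued.
    return pick_fs_timestamp(modified, created) > db_timestamp
```

Because this is only a numeric comparison per file, it is cheap enough to run over tens of thousands of files per minute.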

You can see this in the log file. Search for AddOrUpdateFile. IMatch logs "file is current" if the timestamps match. Else it processes the file.
Why the file system timestamps could change when you plug your SSD into another port, I have no idea.
Changing the drive or media serial number would flag the folders as off-line, not rescan them.

Carlo Didier

NO. As you can see in the attached screenshot, only the last folder from 2022 has a recent last changed date. The database has been last used several days ago. There is absolutely no reason whatsoever for iMatch to rescan all those folders. NONE!
I'll let it finish the rescan (the poor laptop has been blasting its cooling fan at maximum for several hours) and send you the debug log. There must be a bug causing this.

Mario

#3
Quote: There is absolutely no reason whatsoever for iMatch to rescan all those folders. NONE!

When you open a database, IMatch checks each folder for its on-line status and also compares the folder timestamp stored in the database with the timestamp in the file system.
If these differ, IMatch enqueues the folder and checks it for new or missing files, updated files etc.
This is fully automatic and runs in the background.
When new files or files with modified timestamps are detected, IMatch will enqueue them for processing.
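The folder-level check on database open can be sketched the same way. This is a simplified illustration under stated assumptions: `FolderRecord` and `folders_to_rescan` are hypothetical names, "on-line" is approximated by whether the folder path exists (IMatch actually tracks drive/media serial numbers), and the folder's modification time stands in for the folder timestamp.

```python
import os
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FolderRecord:
    path: str
    db_timestamp: float  # folder timestamp as last stored in the database


def folders_to_rescan(records: List[FolderRecord]) -> Dict[str, List[str]]:
    offline: List[str] = []
    enqueued: List[str] = []
    for rec in records:
        if not os.path.isdir(rec.path):
            # Assumption: a missing path means off-line; it is flagged, not rescanned.
            offline.append(rec.path)
            continue
        if os.stat(rec.path).st_mtime != rec.db_timestamp:
            # Timestamps differ: check this folder for new, missing or updated files.
            enqueued.append(rec.path)
    return {"offline": offline, "enqueued": enqueued}
```

Only the enqueued folders go on to the per-file timestamp comparison; unchanged folders cost one `stat` call each.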

I cannot tell from some file system timestamps what the issue is on your computer or with your SSD.
This usually "just works", and when users reported an unintended scan in the past, the problem was usually the file system timestamp, not IMatch.

I work with a dozen IMatch databases every day. Some of them have been created with IMatch 3 and still exist.
I index files from local disks, USB drives, NAS systems and servers.
I run IMatch on hardware and in virtual environments, local and in the cloud.
I don't recall IMatch ever reporting files as changed when they have not been changed (i.e., the timestamp in the file system has not changed but IMatch for some reason thinks it has).
Plugging in a disk in another USB port does not cause any issues either AFAIK.
I regularly work with RAW processing software, image, video and audio editors which modify files indexed by my personal IMatch database.
IMatch always notices when and which files were changed and never rescans folders without any changes.

If this were a general issue, we would see a ton of reports here in the community.
You need to figure out which of the applications you run, or which step in your workflow, causes this. This may help to explain why IMatch may think that all file timestamps in your database are suddenly older than the timestamps reported by the file system. I cannot determine this remotely.

Where are your cache images stored?
Do you use automatic cache purging (Edit > Preferences > Cache)?
Which action in IMatch triggered this unconditional rescan?

Have you ever used the special tool to clear the processing queue?
I once had a user who did that, which stopped IMatch from indexing files it had noticed as changed. This caused the file timestamps in the database to remain unmodified, and thus every time a folder was rescanned, IMatch of course found the same files with a newer file system timestamp and added them for indexing again. The user had long forgotten about clearing the processing queue...
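A toy model of why clearing the processing queue causes endless rescans. The `Index` class is purely illustrative and hypothetical: the only point it shows is that the database timestamp is updated when a file is processed, so dropping the queued work leaves the old timestamp in place.

```python
from typing import Dict, List


class Index:
    """Toy model: database timestamps plus a processing queue."""

    def __init__(self) -> None:
        self.db: Dict[str, float] = {}
        self.queue: List[str] = []

    def rescan(self, fs: Dict[str, float]) -> None:
        # Enqueue every file whose file system timestamp is newer than the DB.
        for name, ts in fs.items():
            if name not in self.queue and ts > self.db.get(name, float("-inf")):
                self.queue.append(name)

    def process_queue(self, fs: Dict[str, float]) -> None:
        # Indexing a file stores its current timestamp in the database.
        for name in self.queue:
            self.db[name] = fs[name]
        self.queue.clear()

    def clear_queue(self) -> None:
        # Drops pending work; the DB timestamps are NOT updated.
        self.queue.clear()
```

After `clear_queue()`, the next `rescan()` finds exactly the same files again; only `process_queue()` breaks the cycle.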

ubacher

I had/have the same issues with unexplained rescans. It would help if the IMatch log listed the different timestamps, as this might give me a hint about who changed the file and when.

Carlo Didier

Aaaah, never thought the cache could have something to do with it. I don't think it's the size, because in the log file it says "Cache image outdated or missing" for 37200 files ... while the cache now, after the updating, has 28GB with 50GB set as maximum.

This is annoying, as there is no apparent reason when iMatch rescans because of outdated cache images. I'll submit a feature request for iMatch to issue a warning before it does this, so the user knows what will happen and why, and can maybe switch off the purge option before iMatch purges and has to rescan tens of thousands of images.

Carlo Didier

#6
Also, during this rescan, iMatch seems to have once again generated a lot of .tmp files. This has also happened before. Nothing else was running.
It must be iMatch generating them, because when I rename to .jpg, they are the images that are rescanned ...

Mario

If I recall correctly, when an image is rescanned and its cache image is missing or outdated (older than the file), the cache image is regenerated, unless cache images are set to on-demand.
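The cache rule above, expressed as a small decision function. The name and parameters are hypothetical; it only encodes the three cases mentioned (on-demand, missing, older than the file).

```python
from typing import Optional


def needs_cache_image(file_mtime: float, cache_mtime: Optional[float],
                      on_demand: bool) -> bool:
    """Hypothetical helper: should a cache image be (re)generated during a rescan?"""
    if on_demand:
        return False  # generated later, only when the image is actually needed
    if cache_mtime is None:
        return True  # cache image missing (e.g. removed by automatic purging)
    return cache_mtime < file_mtime  # outdated: older than the file
```

With automatic purging enabled, purged cache images fall into the "missing" case, which is consistent with the "Cache image outdated or missing" log entries reported in this thread.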

IMatch does not create temporary files with the .tmp file extension.
The only component which creates a .tmp file is IMWS. It creates one file in the cache folder to check that the folder is writable, and then deletes the file again.

IMatch creates temporary files in the TEMP folder while processing video and PDF files.
For video frames, the temporary files use the .jpg extension (this is controlled by the 3rd party tool), and for PDF files (and for many other things) IMatch creates temporary files starting with imt_ (which stands for IMatch Temp).
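A sketch of the "create with a prefix, always delete after use" pattern for temporary files, using Python's `tempfile` module. The `imt_` prefix mirrors the naming described above; the helper name is hypothetical and this is not IMatch's actual implementation.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def imt_temp_file(suffix: str = "") -> Iterator[str]:
    """Hypothetical helper: a temp file named imt_* in the TEMP folder, removed after use."""
    fd, path = tempfile.mkstemp(prefix="imt_", suffix=suffix)
    os.close(fd)
    try:
        yield path
    finally:
        # Runs even if the caller raises, so the file does not outlive its use.
        try:
            os.remove(path)
        except FileNotFoundError:
            pass
```

Usage: `with imt_temp_file(".png") as path: ...` gives a short-lived file that is deleted when the block exits, normally or via an exception (a hard crash would still leave it behind).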

The Cache Manager may also create temporary files, with the .jpg extension, when the user has disabled caching.

I cannot find any place in the IMatch source code where files with .tmp are created (except one file in IMWS).
The 3rd party web server component used by IMWS may also produce short-lived .tmp files. But these should be deleted when IMatch closes.

thrinn

We had these mysterious pliXXXX.tmp files a while ago (see this post) and never were able to identify what program created them.
Thorsten
Win 10 / 64, IMatch 2018, IMA

Mario

Quote from: thrinn on February 28, 2022, 05:18:12 PM
We had these mysterious pliXXXX.tmp files a while ago (see this post) and never were able to identify what program created them.

I remember that. But there is no trace of anything in IMatch that creates files starting with pli.

IMatch cleans up the temporary files it creates after use. These files are short-lived anyway.
It may leave some temporary files in the TEMP folder when it crashes. But that's very rare.

These temp files may be created by something outside the control of IMatch, by a WIC codec, Windows shell thumbnail handler, LibRaw perhaps?
No idea, really.

Carlo Didier

Quote from: Mario on February 28, 2022, 06:04:17 PM
Quote from: thrinn on February 28, 2022, 05:18:12 PM
We had these mysterious pliXXXX.tmp files a while ago (see this post) and never were able to identify what program created them.

I remember that. But there is no trace of anything in IMatch that creates files starting with pli.

IMatch cleans up the temporary files it creates after use. These files are short-lived anyway.
It may leave some temporary files in the TEMP folder when it crashes. But that's very rare.

These temp files may be created by something outside the control of IMatch, by a WIC codec, Windows shell thumbnail handler, LibRaw perhaps?
No idea, really.
And yet, it's pretty certain that those are temporary JPGs created during the rescan, or rather during the re-creation of the cache images. They were all created while iMatch was re-creating the cache images; the file creation timestamps confirm this. And any file I rename to .jpg is in fact a JPG of one of those images. They occupy 48GB on the disk (more than the actual iMatch cache ...).
Maybe a third party lib you use to create the cache images ran into some race condition, due to the huge number of files?

Mario

Which file types were you processing?
RAW files? Do you use WIC codecs or LibRaw?
Videos?
PDF files?
Office Documents? Affinity Files? ...



Carlo Didier

Only images and videos: RAF, DNG (converted from various cameras; NEF, ARW, ...), TIF, JPG, MOV, MP4, ...
iMatch should be using the WIC codecs. System is Windows 11 on an HP Spectre 360 with Intel Evo i7 CPU.

digedag

Quote from: thrinn on February 28, 2022, 05:18:12 PM
We had these mysterious pliXXXX.tmp files a while ago (see this post) and never were able to identify what program created them.

Same here.

I remember this because at that time I also found these mysterious files - but always only with the size "0 KB".
And they keep appearing again and again.

Quote from: Mario on November 01, 2020, 08:42:52 AM
But I clean the TEMP folder occasionally.
.. me too

Attached are some screenshots from just now ...


Bernhard

Mario


digedag

Quote from: Mario on February 28, 2022, 08:07:05 PM
IMatch does not create these intentionally.

I am pretty sure about that.

However, I can tell pretty well when I worked with IMatch by the date the files were created.
IMatch (or third party lib) must be involved somehow ... But how, that is the question.


Bernhard

Mario


digedag

Quote from: Mario on February 28, 2022, 08:33:12 PM
File format processed?

Nothing at all!
Only IMatch opened, then database loaded, then IMatch closed again. -- That's it.

See timestamps of the five screenshots above.


Bernhard

stzari

If I may make a suggestion ...
You could use procmon/64 (part of the Sysinternals tools) to see which application creates those files.
I'd suggest "Operation is CreateFile" as filter.

Be aware that the output may contain privacy-related data.

Mario

Quote from: digedag on February 28, 2022, 08:39:34 PM
Nothing at all!
Only IMatch opened, then database loaded, then IMatch closed again. -- That's it.
See timestamps of the five screenshots above.

Then I'm even more puzzled about where these files may come from.

I recommend you use the Process Monitor (from Microsoft: https://docs.microsoft.com/en-us/sysinternals/downloads/procmon) and make it monitor the IMatch2021x64.exe process for file system operations in the system TEMP folder. This will show if IMatch (or anything IMatch is using) causes this.

In the next step, we would need to find out under which conditions this is happening.
For example, when you create a new database and leave it empty, then close and reopen IMatch, does this still create these files?

Jingo

Long shot - but I do recall some software that uses PLI for 3D renders ... could these be temp files created from an API call to render thumbnails/previews for 3D CAD or drawing files?

Mario

I have no idea.
For all not directly supported non-image formats (except videos and PDF files), IMatch calls Windows functions (shell thumbnail handler) to produce a rendition of the file (aka a preview/thumbnail).
These functions return an "image" to IMatch if they are successful.

Carlo Didier

Quote from: Jingo on March 01, 2022, 01:21:14 PM
Long shot - but I do recall some software that uses PLI for 3D renders ... could these be temp files created from an API call to render thumbnails/previews for 3D CAD or drawing files?
No such files on my PC ... and yet, there they are, the PLI*.tmp files ... all 48GB of them. And as I said, when I rename any one of them to .jpg, I can view it as a JPG from one of my images.
My guess is that some library that iMatch is using creates them, but somehow, due to the high number, there is an overflow somewhere so that they don't get deleted.

digedag

#23
Quote from: Mario on March 01, 2022, 11:06:42 AM
I recommend you use the Process Monitor (from Microsoft: https://docs.microsoft.com/en-us/sysinternals/downloads/procmon) and make it monitor the IMatch2021x64.exe process for file system operations in the system TEMP folder.

I have already read a lot about it. Powerful tool. But too confusing for me at the moment.
The German magazine c't had a very good article about the Process Monitor, but of course it is also lengthy. I have to work through it first ...
Or do you have a quick guide for me?

Okay, in the meantime I first ran a new test without it. Since new pliXXXX.tmp files appeared when opening IMatch (with a database), which looks for new files or files to be updated, I added my large download folder to the database.
The pliXXXX.tmp files just came tumbling in ::) ??? ::) (and have remained).




While adding and updating files ("Hinzufügen und Aktualisieren von Dateien" in the German UI) - please note the time 15:36.

Maybe a new hint?


Bernhard

Mario

#24
Screen shots don't tell me much.
Better to always include the IMatch log file (see log file)

There are many good tutorials and tutorial videos available on the web for using the process monitor. Even on the Microsoft web site. I could not do that better.
This is a tool for IT pros, and so it's expected to be a little bit daunting at first.
Basically you want this (see attachment):

But that will only tell us whether these files are created by IMatch or any of its components (Windows routines, WIC codecs, external helper processes), or not.
Since I have never seen these files on my systems and I regularly re-create a 50,000 files test database with files of all kinds, I wonder what makes your system different.
IMatch does not create temp files with a "pli" prefix or a .tmp extension.


digedag

#25
Quote from: Mario on March 01, 2022, 06:09:45 PM
Screen shots don't tell me much.
Better to always include the IMatch log file (see log file)

OK, next time.

Quote: There are many good tutorials and tutorial videos available on the web for using the process monitor. Even on the Microsoft web site. I could not do that better.
This is a tool for IT pros, and so it's expected to be a little bit daunting at first.
Just now I am reading the articles c't 2017 ...

Quote: Basically you want this (see attachment)
I have already been to that point ...

Quote: I wonder what makes your system different.
If Carlo Didier and Thorsten (thrinn) hadn't mentioned the issue, I wouldn't have even brought it up again.
I've known about this since it first came up, and have been clearing the TEMP folder off and on ever since. DONE!

Actually, I only wanted to share a few of my own experiences to show that Carlo and Thorsten are not the only ones ...


Bernhard

digedag

#26
Quote from: Mario on March 01, 2022, 06:09:45 PM
Screen shots don't tell me much.

Sorry, Mario, to come back again with a screenshot.



Self-explanatory, I think.


Bernhard

Mario

#27
As expected, this only tells us that something used by IMatch on your system (or some other thing running on your system) is causing this.
Maybe it's your virus checker or something else... Let us know when you find out more.

If I interpret this thread correctly, three or maybe a handful of users see similar strange random temporary files being created.
Since IMatch does not use pli or .tmp for temporary files, I have no clue what may produce this.

Since this thread started with a user reporting IMatch unexpectedly rescanning files, which was then traced to cache settings and purging, posts about files being created in the Windows TEMP folder by something unknown are not necessarily related. Feel free to create your own thread about this.
I have scanned the IMatch source code and I'm 99.9% sure that IMatch does not create these files by itself, or by any action it performs.
Especially not creating 48 GB of random files outside the designated cache folder.
I'm sure other users would have noticed and reported this over the past years. Since this apparently affects only a few users, while all users most likely process a similar mix of files, I wonder if this is caused by something external like a virus checker or something.

As always, if this is something I can reproduce and fix, I'll be happy to do so.

hluxem

Quote: If I interpret this thread correctly, three or maybe a handful of users see similar strange random temporary files being created.

Most users, including me, never check their temp folders. I just did and I too have several pli files in my temp folder. As mentioned before, they are jpg files with a tmp extension. Looks like images I recently added to the database.

Heiner

sinus

Quote from: hluxem on March 01, 2022, 10:59:31 PM
Quote: If I interpret this thread correctly, three or maybe a handful of users see similar strange random temporary files being created.

Most users, including me, never check their temp folders. I just did and I too have several pli files in my temp folder. As mentioned before, they are jpg files with a tmp extension. Looks like images I recently added to the database.

Heiner

I also have such pli files (about 3000 right now). To be honest, they do not bother me; from time to time I delete them (and other temp files), and that's it for me.
Best wishes from Switzerland! :-)
Markus

jch2103

After seeing these posts, I checked my system for pli*.tmp files. I found 4, from last week. No indication of what created them, of course.
John

ubacher

I just checked my pli* files. Most of them are empty; the ones that are not are JPG files (with a .tmp extension) that have the dimensions of thumbnails.
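To check your own TEMP folder the same way, a small script can sort the pli*.tmp files into empty, JPEG and other. This is a sketch: the function name is hypothetical, and it relies on the fact that JPEG files start with the SOI marker FF D8 followed by another marker byte FF.

```python
import glob
import os
import tempfile
from typing import Dict, List, Optional

JPEG_MAGIC = b"\xff\xd8\xff"  # SOI marker plus the start of the next marker


def classify_pli_files(folder: Optional[str] = None) -> Dict[str, List[str]]:
    """Hypothetical helper: group pli*.tmp files in TEMP by content type."""
    folder = folder or tempfile.gettempdir()
    result: Dict[str, List[str]] = {"empty": [], "jpeg": [], "other": []}
    for path in sorted(glob.glob(os.path.join(folder, "pli*.tmp"))):
        if os.path.getsize(path) == 0:
            result["empty"].append(path)
            continue
        with open(path, "rb") as f:
            head = f.read(3)
        result["jpeg" if head == JPEG_MAGIC else "other"].append(path)
    return result
```

Calling `classify_pli_files()` without arguments inspects the current user's TEMP folder; it only reads files and never deletes anything.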


Carlo Didier

Quote from: hluxem on March 01, 2022, 10:59:31 PM
Quote: If I interpret this thread correctly, three or maybe a handful of users see similar strange random temporary files being created.

Most users, including me, never check their temp folders. I just did and I too have several pli files in my temp folder. As mentioned before, they are jpg files with a tmp extension. Looks like images I recently added to the database.

Heiner
Exactly. And apparently for most users the files are zero bytes, while on my PC they are far from empty, causing my C: drive to fill up. That makes the issue quite serious (in the past, it once happened that I had no space left on C: because of these files!).

Carlo Didier

Also note that I already used process monitor two years ago to identify iMatch as the creator of the files: https://www.photools.com/community/index.php?topic=10727.0

Attached the debug log from this rescan. Maybe Mario can find something there.

Mario

#34
I have set up my system so it cleans TEMP continuously and then, while keeping an eye on Windows Explorer, I force-rescanned several file types to see if I could make some pli* files appear and remain.
I've tried RAW files and video files and normal image files, PDF files etc. Nothing.
I could see temporary files being created, e.g. for video files. But these were also deleted again. All within a second or so.

Then I switched to LibRaw for RAW processing (Edit > Preferences > Prefer photools.com RAW processing).
And when I rescanned some RAW files, I could see one file starting with pli* being created in the Temp folder! And being deleted again. Within a second.
When I rescanned the same image, I could not see a pli file. Then, a few forced-rescan attempts later, a pli file was created and deleted again.

All very random.

I switched back from LibRaw to WIC and could still see, sometimes, a pli file being created and deleted again.

Then I did nothing, just resizing the IMatch window so I can see both Windows Explorer and IMatch side-by-side.
And, suddenly, I could see maybe 10 pli files in the TEMP folder, but all with a size of 0 bytes. And these files remained! What?

So, I loaded IMatch into my development environment and started process monitor with a filter for the IMatch2021 process and a path filter for file names containing pli.
I could now see that something creates pli files when I force rescanned an ARW file.  Good.

Now I ran IMatch step by step to see which of the thousands of lines of code caused the pli file to be created.
I could finally pin it down to a routine in the image library IMatch uses.
A routine called SaveToMemory, which is used to store a loaded image in a memory blob in a specific format (JPG or PNG in this case) so IMatch can store it in the database.
IMatch uses this e.g. to store thumbnails.

I don't have the source code for this 3rd party library, but I guess that SaveToMemory stores the image on disk and then loads it from there into the memory buffer provided by IMatch.
And, maybe, under high-load or some other circumstances, this routine fails to delete the temporary file afterwards.
Probably. Maybe.
What are the circumstances under which these files remain?
System under stress? Virus Checker?
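The suspected behavior can be illustrated with a sketch. Everything here is an assumption: the real library's code is not available, `encode` is a hypothetical stand-in for its image encoder, and the "pli" prefix only mimics the observed file names. The first function leaks its temp file whenever anything fails between creating and deleting it; the second guarantees deletion with `try`/`finally`.

```python
import os
import tempfile
from typing import Callable

Encoder = Callable[[str], None]  # hypothetical: writes the encoded image to a path


def save_to_memory_leaky(encode: Encoder) -> bytes:
    fd, path = tempfile.mkstemp(prefix="pli", suffix=".tmp")
    os.close(fd)
    encode(path)  # if this raises, the (possibly empty) file is never deleted
    with open(path, "rb") as f:
        data = f.read()
    os.remove(path)
    return data


def save_to_memory_safe(encode: Encoder) -> bytes:
    fd, path = tempfile.mkstemp(prefix="pli", suffix=".tmp")
    os.close(fd)
    try:
        encode(path)
        with open(path, "rb") as f:
            return f.read()
    finally:
        try:
            os.remove(path)
        except OSError:
            pass  # e.g. still locked by a scanner; a real fix might retry or log
```

Note that a file locked by another process (a virus scanner, for instance) can still survive the safe version, which is why avoiding the disk round-trip entirely is the more robust option.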
The Windows GetTempFileName() function is known to fail or to return the same temporary file name when called from multiple threads at the same time, which is why IMatch does not use it to create temporary file names.
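For reference, Windows' GetTempFileName builds names of the form `<prefix><hex number>.TMP` from up to three prefix characters, and its documentation notes that it can perform poorly with many same-prefix files and recommends GUID-based names instead. The observed pliXXXX.tmp names happen to fit that pattern, though that is only an observation. Below is a sketch of the GUID approach, with a hypothetical helper name and default prefix: `O_EXCL` makes creation fail if the name already exists, so two threads can never both claim the same file.

```python
import os
import tempfile
import uuid


def create_unique_temp_file(prefix: str = "imt_", suffix: str = ".tmp") -> str:
    """Hypothetical helper: atomically create a GUID-named file in TEMP."""
    folder = tempfile.gettempdir()
    while True:
        path = os.path.join(folder, f"{prefix}{uuid.uuid4().hex}{suffix}")
        try:
            # Fails with FileExistsError instead of silently reusing a name.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except FileExistsError:
            continue  # practically impossible with uuid4, but retry anyway
        os.close(fd)
        return path
```

The caller owns the returned file and is responsible for deleting it.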

I will try to come up with a solution for this. I have encapsulated this library in my own class so I can quickly change this behavior and implement my own solution.

thrinn

This sounds like a very time consuming analysis and not really fun work. But at least, it looks like you have now a hint what happens. Thumbs up!
Thorsten

Jingo

Quote: I will try to come up with a solution for this. I have encapsulated this library in my own class so I can quickly change this behavior and implement my own solution.

and.. once again.. THIS is the amazing support we get from Mario... relentless debugging and tracking to resolve an issue... thanks again!

sinus

Quote from: Jingo on March 02, 2022, 01:20:12 PM
Quote: I will try to come up with a solution for this. I have encapsulated this library in my own class so I can quickly change this behavior and implement my own solution.

and.. once again.. THIS is the amazing support we get from Mario... relentless debugging and tracking to resolve an issue... thanks again!

+1  :)
Markus

Mario

Quote from: Jingo on March 02, 2022, 01:20:12 PM
and.. once again.. THIS is the amazing support we get from Mario... relentless debugging and tracking to resolve an issue... thanks again!

Me no likey bugs. Especially not in my software.

digedag

Quote from: Mario on March 02, 2022, 02:16:48 PM
Me no likey bugs. Especially not in my software.

+1

Quote from: digedag on February 28, 2022, 08:31:29 PM
IMatch (or third party lib) must be involved somehow ... But how, that is the question.

First step towards answering this is made.

WELL DONE!


Bernhard

Mario

It's also fixed now. I have made a wrapper routine which does the same, but does not leave temporary files behind.

Carlo Didier

Quote from: Mario on March 02, 2022, 07:05:30 PM
It's also fixed now. I have made a wrapper routine which does the same, but does not leave temporary files behind.
Excellent! So my guess that a third party lib you were using was the culprit was right.

DigPeter

I do not think that I have this problem. Where would I find this tmp file?

Have been following this with interest.  I stand in awe of Mario's commitment and skill! ;D

Mario

Whether you will ever experience these leftover files depends on your workflow, computer performance, maybe even the installed anti-virus solution.
It's a bit random and seems to be caused by some 'race-condition' when IMatch processes multiple thumbnails in parallel to convert them into JPG/PNG memory blobs to store in the database.
It might not happen, or not often.

You can open the TEMP folder on your computer by typing

%TEMP%

into the Windows Explorer address bar.

Many users clear the TEMP folder frequently (especially on small SSDs) and the Windows Disk Cleanup wizard also cleans TEMP.
So, these users probably never notice this.
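If you prefer a targeted cleanup over clearing all of TEMP, a short script can remove only old pli*.tmp leftovers. This is a sketch with a hypothetical function name: `tempfile.gettempdir()` usually resolves to the same folder that `%TEMP%` opens in Explorer, and it defaults to a dry run that only lists what it would delete.

```python
import glob
import os
import tempfile
import time
from typing import List, Optional


def clean_leftover_pli_files(max_age_hours: float = 24.0,
                             folder: Optional[str] = None,
                             dry_run: bool = True) -> List[str]:
    """Hypothetical helper: delete (or just list) pli*.tmp files older than max_age_hours."""
    folder = folder or tempfile.gettempdir()
    cutoff = time.time() - max_age_hours * 3600
    matched: List[str] = []
    for path in glob.glob(os.path.join(folder, "pli*.tmp")):
        try:
            if os.path.getmtime(path) < cutoff:
                if not dry_run:
                    os.remove(path)
                matched.append(path)
        except OSError:
            pass  # file in use or already gone: skip it
    return matched
```

The age threshold leaves recently created files alone, so files still in use by a running program are less likely to be touched.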