No preview with (many) PDF's

Started by rienvanham, February 09, 2025, 01:09:35 PM


rienvanham

Hi Mario,

I have had an issue since v2025: with v2023 I had a "preview" (I don't know what the correct name is) for every PDF in the browser. I have around 20,000 PDFs. After upgrading to 2025 (and many rounds of "pending metadata writebacks, compacting db, importing new files") I have a huge number of PDFs with no "preview", just a small icon with "pdf" in it.

Am I doing something wrong? Is there a way to generate the "previews"?

Thanks in advance!

Rien

rienvanham

I think I found the solution (but it's a bit inconvenient): select the files and do a "force update".

Mario

IMatch uses the same (but updated) external component to extract previews from PDF files.
If creating the previews while adding the files failed, there are probably 500 potential reasons, probably an "overload" of sorts. Do you have many folders with many PDF files? What kind of computer? Did you check for any warnings in the log file after the thumbless files showed up in the database? If the log file is gone, there's no way to tell.

Maybe you can reproduce it by copying one of the folders with many PDF files to a location outside of IMatch, then adding this folder to the database. Switch to debug logging first (Help > Support log file). If this again fails to produce thumbnails for the PDF files, the log file should contain warnings with more information.

rienvanham

Hi Mario,

Thanks for your superfast answer. I have several hundreds of folders with PDF-files (e.g. \taxes\2000; \taxes\2001 etc)
and a machine with a 5950x-processor, 64 GB of RAM, nvidia 3090, M2-ssd etc.

I will take a look at your suggestion; not today because I have to leave now.

Thanks so far!

rienvanham

Hi Mario,

I see a lot of errors in the logfile, e.g.:
02.10 12:08:32+  344 [6150] 01  W> Failed to process PDF file E:\Bestanden\Mijn Documenten\Werk\Mari\2014\2014-08-25 - Salaris 2014-08.pdf. Unable to aquire a critical section for the external helper utility. Try to reduce the number of threads used for ingesting.  'V:\develop\IMatch5\src\ptpicore\PlugIns\ptpimm\PTPIMultiMedia.cpp(1313)'
02.10 12:08:32+    0 [6150] 10  M>  <  0 [63907ms #sl] PTPIMultiMedia::GetPDFThumbnail
02.10 12:08:32+    0 [6150] 01  W> Failed to produce a large thumbnail: 80004001 'Niet geïmplementeerd'  'V:\develop\IMatch5\src\ptpicore\PlugIns\ptpimm\PTPIMultiMedia.cpp(593)'
02.10 12:08:32+    0 [6150] 10  M> >  0 ShellExtractThumb  'V:\develop\IMatch5\src\ptpicore\PlugIns\ptpimm\PTPIMultiMedia.cpp(48)'
02.10 12:08:32+  187 [6150] 10  M>  <  0 [187ms] ShellExtractThumb
02.10 12:08:32+    0 [6150] 10  M>  <  0 [64094ms #sl] PTPIMultiMedia::LoadFile
02.10 12:08:32+    0 [6150] 10  I> EUQH::Load(1) of E:\Bestanden\Mijn Documenten\Werk\Mari\2014\2014-08-25 - Salaris 2014-08.pdf with 2000 x 2000 (O: 2000 x 2000) in 64094ms

Is this enough for you? Or do you need the whole logfile?

Thanks,

Rien.
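
To see how many files are affected and in which folders, the warnings can be pulled out of the log with a small script. This is only a sketch: the pattern is based solely on the "Failed to process PDF file" warning quoted above, the function names are made up for illustration, and other IMatch versions may word the message differently.

```python
import re
from collections import Counter
from pathlib import PureWindowsPath

# Pattern derived only from the warning line quoted in this thread (assumption:
# the path ends in ".pdf" and is followed by ". " and the rest of the message).
FAILED_PDF = re.compile(
    r"W> Failed to process PDF file (?P<path>.+?\.pdf)\. ", re.IGNORECASE
)


def failed_pdfs(log_lines):
    """Return the PDF paths named in 'Failed to process PDF file' warnings."""
    paths = []
    for line in log_lines:
        match = FAILED_PDF.search(line)
        if match:
            paths.append(match.group("path"))
    return paths


def failures_per_folder(paths):
    """Count failed files per parent folder (Windows-style paths)."""
    return Counter(str(PureWindowsPath(p).parent) for p in paths)
```

Feeding the lines of a saved log file into failed_pdfs and then into failures_per_folder gives a list of folders that are candidates for a forced update.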

Mario

Interesting.

The "Unable to aquire a critical section for the external helper utility" message tells me that a timeout has expired.

In community post 8610 we identified an issue where running multiple instances of the external helper utility pdftopng.exe sometimes causes pages to be rendered as blank in the resulting thumbnails.
To prevent this, IMatch limits the number of instances to one, causing other processing threads to wait until the current PDF is processed.

In your case, you are processing many PDF files in sequence, and processing one (or more) of the PDFs takes longer than the timeout is configured to wait. The timeout is a loop that tries to acquire the global lock for 250ms. If this does not work, it tries again, up to 240 times, so a thread waits up to a minute for a PDF to be processed. If this still does not work, it means that another thread needs longer than a minute to process a PDF, and the current thread gives up and logs the warning you see.
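
The retry loop described above can be sketched roughly like this. It is not IMatch's actual code: only the 250ms and 240 figures come from the description, while the names, the use of a Python lock, and the render/fallback callables are assumptions for illustration.

```python
import threading

# Figures from the description above; the constant names are hypothetical.
ATTEMPT_TIMEOUT_S = 0.25  # each attempt waits up to 250 ms for the lock
MAX_ATTEMPTS = 240        # 240 * 0.25 s = 60 s worst-case wait

# Stands in for the global "only one helper instance at a time" lock.
helper_lock = threading.Lock()


def try_acquire_helper(lock=helper_lock,
                       attempt_timeout=ATTEMPT_TIMEOUT_S,
                       max_attempts=MAX_ATTEMPTS):
    """Return True once the lock is acquired, False if all attempts time out."""
    for _ in range(max_attempts):
        if lock.acquire(timeout=attempt_timeout):
            return True
    return False


def make_thumbnail(path, render, fallback, lock=helper_lock,
                   attempt_timeout=ATTEMPT_TIMEOUT_S,
                   max_attempts=MAX_ATTEMPTS):
    """Render via the single-instance helper, or use the fallback on timeout."""
    if try_acquire_helper(lock, attempt_timeout, max_attempts):
        try:
            return render(path)   # e.g. run the external helper utility
        finally:
            lock.release()        # let the next waiting thread proceed
    # Every attempt timed out: another thread held the lock for the whole wait.
    return fallback(path)         # e.g. a shell thumbnail handler
```

With these numbers a waiting thread gives up after 240 x 250ms = 60 seconds, which is in line with the roughly 64 seconds reported for GetPDFThumbnail in the log.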

As a last resort, IMatch falls back to the shell thumbnail handler installed by applications like Adobe Acrobat. This seems to work, and a thumbnail / preview of your PDF was produced after about 64 seconds (64094ms in your log).

I have no idea why it takes so long to extract a preview from your PDF files. Or maybe your system is overloaded? PDF extracts usually take 1 to 5 seconds with my test set of over 1,000 PDF files.

rienvanham

Thanks Mario,

I have no idea what is going on (and no idea what to do now ;-)). Could it be my PDF app (which is Kofax Power PDF)? My workaround for now is: select a parent folder (with "show all levels"), select all files (e.g. a folder that contains a lot of subfolders and a total of 725 files), press CTRL-SHIFT-F5 and do a forced update. Now all previews appear within a few minutes.

Rien.

rienvanham

What might be interesting: after upgrading to v2025 I deleted the whole tree of PDF files from the database and re-inserted them. The first few thousand files were processed correctly: they all have previews. After a certain point no files got a preview anymore. It looks to me as if "something" became overloaded after a while.

What I can offer is to remove the tree again; put the log in debugmode and import the whole tree again.

Rien.

Mario

I would not do that.
Just process the PDF files in batches of a few hundred. If there is a "stress issue", this should avoid triggering it. IMatch dials down the number of threads after a few seconds of idle time, and when you then process a new set, things start fresh.

I have only ~1,000 PDF files from a wide range of sources in my test library, and I can process them in one go. No other user has reported this problem so far. IMatch 2025 is able to utilize a lot more resources than older versions, making things as fast as possible by using what's there.

You can also try to lower the performance profile to balanced or even low (Edit > Preferences > Application) to reduce the stress/load on your system while you index the PDF files.

rienvanham

You're right. It doesn't make much sense to do it all over again. I'll do a force update in chunks of a few hundred files. After these updates all seems fine.

Thanks for your help Mario. I'll take a look at your suggestion to reduce the stress.