Going Paperless with PDF: Use IMatch, a Separate DMS, or Just Windows?

Started by WebEngel, April 18, 2018, 08:21:34 PM


WebEngel

Hi all,

Having used IMatch for images for more than 10 years, I now want to start managing PDFs as well. The reason is that I want to go paperless.

This means: all incoming paper documents go into a scanner that generates a searchable PDF (i.e. one that contains two layers: the page layout as an image and all the recognized text as text). PDFs I receive digitally (e.g. digital invoices) fall into the same category.

The documents now contain readable text, but it is completely unstructured. At the same time, the filename will just be something like the scan timestamp, not something meaningful like 20180311_Invoice_medical_Dr_Smith (where the date refers to the date found in the PDF, NOT to the scan date). Now comes the critical part: searching for specific documents and maintaining a status:

Searching for a document is easy in Windows, thanks to Windows indexing. So I can search for a particular invoice by the company's name, or for a tax document in a similar way. Complex searches (e.g. invoices, but not from Amazon) are not possible, I assume.

Maintaining a status refers to things like invoices to be paid, or documents that need to be read or require some work. That is something where IMatch would excel.

However, I cannot combine the two aspects, searching and maintaining a status. In IMatch I do get previews (from PDF-XChange Viewer), but that is just a preview, without any text search.

Is there any way to import the full PDF text into the IMatch database so that I can search it the way I can in Windows?
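For illustration only, here is a minimal Python sketch of the search-plus-status combination asked about above. It assumes the OCR text layer of each PDF has already been exported to plain text (getting it out of the PDF is not shown) and that the status is a hypothetical free-text label. `Doc` and `search` are made-up names, not part of IMatch or Windows Search. It answers a query like "invoice but not from Amazon", optionally limited to one status:

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Doc:
    """One PDF: file name, OCR text layer and a status label (all hypothetical)."""
    name: str
    text: str
    status: str = ""


def words(text: str) -> set[str]:
    """Lower-case whole words of a text, e.g. 'Amazon EU' -> {'amazon', 'eu'}."""
    return set(re.findall(r"\w+", text.lower()))


def search(docs: list[Doc], include=(), exclude=(), status: str | None = None) -> list[str]:
    """Names of docs containing every `include` word and no `exclude` word,
    optionally limited to one status; returned sorted by name."""
    hits = []
    for doc in docs:
        if status is not None and doc.status != status:
            continue  # wrong status: skip without looking at the text
        found = words(doc.text)
        if all(w.lower() in found for w in include) and not any(
            w.lower() in found for w in exclude
        ):
            hits.append(doc.name)
    return sorted(hits)


if __name__ == "__main__":
    # Made-up example documents; the names mimic scan timestamps.
    docs = [
        Doc("scan_20180311_0815.pdf", "Invoice No. 4711 Dr. Smith medical services", "to pay"),
        Doc("scan_20180312_0902.pdf", "Invoice Amazon EU order 123", "to pay"),
        Doc("scan_20180313_1130.pdf", "Invoice Dr. Smith follow-up", "paid"),
        Doc("scan_20180314_1000.pdf", "Tax assessment 2017", "to read"),
    ]
    # "Invoice but not from Amazon"
    print(search(docs, include=["invoice"], exclude=["amazon"]))
    # -> ['scan_20180311_0815.pdf', 'scan_20180313_1130.pdf']
    # ... and only those still to be paid
    print(search(docs, include=["invoice"], exclude=["amazon"], status="to pay"))
    # -> ['scan_20180311_0815.pdf']
```

Matching is on whole words and ignores case, so "amazon" excludes "Amazon" but not, say, "Amazonas".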

Is anybody else doing document management with PDFs? What do you use? IMatch, or a completely separate document management system?

Thanks in advance for any reply.

Martin

Mario

IMatch does not index the text in PDF files. This is technically quite complicated.

Usually you put things like title, author, description etc. into the PDF metadata, which is automatically imported by IMatch and can be used for searching, sorting, categorization etc.
Don't your PDF files contain any usable metadata? Have you checked in the IMatch Metadata Panel using the PDF layout? PDF files can hold specific PDF metadata but also an embedded XMP record!
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com

Menace

@WebEngel:

How about something like X1 Search (or Copernic Search)? These are desktop search programs. I use IMatch and X1 Search (plus "Everything") side by side to find all kinds of files in seconds.

Everything: searches file and folder names (very fast): https://www.voidtools.com
X1 Search: searches all kinds of files, including their contents: https://www.x1.com/products/x1_search/

But most importantly, for my daily work with PDF, Word, Excel, Illustrator, Affinity, TXT, MP3, QuarkXPress products, PS documents, RAWs, video, etc., IMatch is the solution. X1 Search is just for research.

WebEngel

Quote from: Mario on April 18, 2018, 08:43:53 PM
Usually you put things like title, author, description etc. into the PDF metadata, which is automatically imported by IMatch and can be used for searching, sorting, categorization etc.
Don't your PDF files contain any usable metadata? Have you checked in the IMatch Metadata Panel using the PDF layout? PDF files can hold specific PDF metadata but also an embedded XMP record!

Sure, my digital PDF documents (the ones I received as files) do contain metadata, but it is mostly useless, nothing I would search for. For example, I just checked 10 user manuals I had downloaded (all kinds of things: household appliances, chargers, coffee grinders, a flash). Only one of them had useful metadata.

The scanned documents do not contain metadata at the time I scan them.

Sure, I could create a title and keywords for every document, but that would be a painful exercise I would not do reliably, so the result would be useless.

This is why I need to search the content.
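As a rough idea of how the manual titling could be avoided, a meaningful filename like 20180311_Invoice_medical_Dr_Smith could in principle be derived from the text layer itself. This Python sketch assumes the text is already available as a string, that dates appear as 11.03.2018 or 2018-03-11, and that the first valid date in the text is the document date (a due date printed earlier would be picked instead). The keyword-to-label rules and the function names are hypothetical; you would maintain such rules yourself:

```python
from __future__ import annotations

import re
from datetime import date

# Assumed date formats: day.month.year (11.03.2018) and ISO (2018-03-11).
DATE_PATTERNS = [
    (re.compile(r"\b(\d{2})\.(\d{2})\.(\d{4})\b"), ("d", "m", "y")),
    (re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b"), ("y", "m", "d")),
]


def document_date(text: str) -> date | None:
    """First valid date in the text (by position), or None if there is none."""
    found = []
    for pattern, order in DATE_PATTERNS:
        for m in pattern.finditer(text):
            parts = dict(zip(order, map(int, m.groups())))
            try:
                found.append((m.start(), date(parts["y"], parts["m"], parts["d"])))
            except ValueError:
                continue  # not a real date, e.g. 31.02.2018
    return min(found)[1] if found else None


def suggested_name(text: str, labels: dict[str, str]) -> str | None:
    """Build a name like 20180311_Invoice_medical_Dr_Smith from the text layer.

    `labels` maps a lower-case keyword to the label used in the name
    (hypothetical rules); labels appear in the order they are listed."""
    day = document_date(text)
    if day is None:
        return None  # no date found: leave this one for manual naming
    found = set(re.findall(r"\w+", text.lower()))
    parts = [label for keyword, label in labels.items() if keyword in found]
    return "_".join([day.strftime("%Y%m%d"), *parts])


if __name__ == "__main__":
    text = "Dr. Smith\nInvoice\nDate: 11.03.2018\nmedical treatment"
    rules = {"invoice": "Invoice", "medical": "medical", "smith": "Dr_Smith"}
    print(suggested_name(text, rules))  # -> 20180311_Invoice_medical_Dr_Smith
```

Anything the rules do not recognize simply gets no label, so the result is a suggestion to check rather than a final name.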

Martin

bonsai


Jingo

Just to add another option: give FileSeek a try. It doesn't require any indexing (I long ago turned off the Windows indexing service as yet another resource hog, along with the font service), and it finds files very, very fast with a plethora of options. I use it mainly to locate fonts across my vast collection of hard drives and network drives when I need to temporarily install a font for a client project and don't want to load a font manager like NexusFont (which tends to take a long time to scan and display the fonts, and also causes all the fonts to be loaded into memory, so Adobe InDesign and Photoshop become bloated and slow).

Good Luck!