Is IMatch no longer checking for duplicates for HEIC Files?

Started by Tveloso, April 15, 2025, 10:39:16 PM

Tveloso

Recently, when I indexed a small batch of files, I thought it very possible that there might be a few duplicate files among the group, and was surprised to find that IMatch didn't put up that notification.  I later found that two of the newly indexed files actually were duplicates.

As a test, I selected 12 files that I had recently finished processing (and were now archived on my NAS), and pressed Ctrl+C to copy them to the Clipboard, and then in Windows Explorer, pasted them into a (currently empty) local folder already indexed in IMatch.  When that folder was re-scanned, and the 12 Files added to the database, again no duplicate files notification was issued (and my Duplicates Category remained empty).

I have the "Visually Identical, Same File Format" option configured for assignment to the Duplicates category:

    Screenshot 2025-04-15 163304.png

I wonder if maybe change #02778 (Checking for Duplicates During Indexing):
Quote: "When the option to check for visually similar duplicates is enabled, IMatch now skips non-suitable formats like videos to reduce false detections."
...might perhaps be considering HEIC files as part of the Video File Group to skip?

But when I read that item in the Release Notes, I was a little unsure about it...wouldn't it make sense to still check for duplicates even with Videos?
--Tony

Mario

Provide 3 of the images you consider visually duplicate but which failed to be detected.
This gives us something to test with. When I index copies of HEIC/HEIF files in a test database, they are identified as dupes.

"Skips not suitable formats" means files which cannot produce an image (text, binary, ...)

Tveloso

Mario, I neglected to mention that IMatch does still consider the files as duplicates, when doing a search (Ctrl+M, I).  They just weren't added to the duplicates category when they were indexed.

The files were true duplicates (in fact, they were binary duplicates), since I just copied a few (previously indexed) files to another folder, and then indexed those copies.

I'll repeat the test again tomorrow, with Debug Logging on, in case it might show something (and can send the files themselves if needed).
--Tony

Mario

Ah, I see the problem. These stupid HEIC/HEIF files.
Since they can be images, videos, audio files and who knows what else, IMatch classifies them as multimedia.

And the dupe check during indexing considers files of class "image", "vector", "pdf", and "office". Not multimedia, which can be anything. It usually makes no sense to perform a visual dupe check for video files, because different frames could have been used to produce the visual fingerprint.

I shall add yet another special case for HEIC/HEIF/HIF/HEICS/HEIFS/AVCS/AVCI/... files.
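
To make the logic above easier to follow, here is a rough Python sketch of that kind of class filter. This is not IMatch's actual code: the names `DUPE_CHECK_CLASSES`, `FILE_CLASS_BY_EXT`, `HEIF_FAMILY`, `file_class` and `checked_for_dupes` are all hypothetical, and the extension-to-class table is an assumption made up for illustration. Only the four checked classes and the list of HEIF-type extensions come from the post.

```python
# Hypothetical sketch, NOT IMatch's real implementation.

# File classes the dupe check during indexing considers (from the post).
DUPE_CHECK_CLASSES = {"image", "vector", "pdf", "office"}

# Assumed mapping from extension to file class, for illustration only.
FILE_CLASS_BY_EXT = {
    ".jpg": "image",
    ".png": "image",
    ".svg": "vector",
    ".pdf": "pdf",
    ".docx": "office",
    ".mp4": "multimedia",
    ".heic": "multimedia",  # classified as multimedia, hence skipped
    ".heif": "multimedia",
}

# Extensions that would get the special case mentioned in the post.
HEIF_FAMILY = {".heic", ".heif", ".hif", ".heics", ".heifs", ".avcs", ".avci"}


def file_class(ext: str) -> str:
    """Return the assumed file class for an extension ('other' if unknown)."""
    return FILE_CLASS_BY_EXT.get(ext.lower(), "other")


def checked_for_dupes(ext: str, heif_special_case: bool = False) -> bool:
    """Decide whether a file with this extension enters the dupe check."""
    ext = ext.lower()
    # With the special case enabled, HEIF-type files are always checked.
    if heif_special_case and ext in HEIF_FAMILY:
        return True
    # Otherwise only the whitelisted file classes are checked.
    return file_class(ext) in DUPE_CHECK_CLASSES


if __name__ == "__main__":
    for ext in (".jpg", ".heic", ".mp4"):
        print(ext, checked_for_dupes(ext), checked_for_dupes(ext, True))
```

In this sketch a `.heic` file is skipped until the special case is switched on, while a `.mp4` stays excluded either way, which matches the behavior described in this thread.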

Tveloso

--Tony