Duplicates

Started by frlindla, March 31, 2020, 11:55:17 PM

frlindla

I have seen duplicates of some photos whose originals were taken in 2017 and 2018. For some reason they also show up in 2019. I tried selecting all photos for those years and then running the duplicate finder, but it didn't seem to find them.

Any tips and tricks for using the duplicate finder?

Mario

You've seen duplicates where?
The dupe finder can find binary duplicates - but if the images show up on a different date, the metadata is most likely different. Hence - no binary duplicates.
You can use the feature to find visually similar images, which uses the image content itself and is not sensitive to changes in metadata.
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

frlindla

Thank you Mario.

So when the same photo shows up on different dates, the reason could be that the metadata is different or has been changed in some way? Any idea why the same photo would show up on different dates, have different metadata, and not show up as a duplicate?

Just interested in some advice here if some of you have some experience with this.

Mario

Quote
When the same photo shows up on different dates, the reason could be that metadata is different

Yes.

The duplicate files feature compares files bit-by-bit to find identical files. Modified metadata means that the files are no longer binary identical.
Check your workflow and use the visually similar features in IMatch to detect images which look similar, independently of the metadata.
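
To illustrate why a metadata change defeats a binary comparison, here is a minimal Python sketch. It is not IMatch code: it stands in for a bit-by-bit comparison by hashing each file's complete byte content, and the names `file_digest` and `find_binary_duplicates` are made up for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of the file's complete byte content."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large image files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_binary_duplicates(paths):
    """Group files whose bytes are identical; return only groups of two or more."""
    groups = defaultdict(list)
    for path in paths:
        path = Path(path)
        groups[file_digest(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Because the digest covers the whole file, metadata included, two copies of the same picture with different embedded dates end up in different groups and are never reported as duplicates.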

fischmir

I totally agree with Mario.

Nevertheless, I have a comparable use case.

My wife and I take photos with our mobiles and send them to each other via WhatsApp/Telegram/... (e.g. when one of us is on a business trip).

When I import all photos from both mobiles, I have the same picture twice: once in good quality and once in lower quality. Technically they are not the same picture, but we all know they are.

On the other hand, we receive pictures from friends that we want to use, so we cannot simply ignore all pictures from WhatsApp/Telegram/...

So I'd like to find duplicates (not binary duplicates) and delete the one with lower quality/smaller file size.

Is there any solution that works for 10,000+ pictures?

Thanks,
Christian

Mario

You can select 10K files and let it fly. The result window can handle this.

Tips I would try:

I assume higher-quality files have a larger file size, so you can perhaps use the File Size filter in the Filter Panel to find the HQ files.
Or, the dimensions (W x H) are also a good candidate.
Or, is the metadata different? The original should have metadata, while the one transferred via the social app often has its metadata stripped away due to re-encoding.
Or, check the "software" tag. If the app re-encodes the file, it may place a different name there, so you can use that to find the files.

Look at a HQ and a LQ file in the Metadata Panel using the browser mode to spot the differences.


Tveloso

Mario, is there also the possibility of searching for binary duplicates using the pixel data only (in order to find files that used to be true binary duplicates but no longer are, due to metadata changes)?

Are there settings in the Visually Similar Search dialog that you would recommend that would come close to doing that?
--Tony

fischmir

Yes, same question here.

What I noticed is that at least WhatsApp removes almost all metadata, but I can use the dimensions. This means that the two files are not binary duplicates (even the metadata has changed). So in my scenario IMatch does not yet find the "duplicates" and reports that there are none. On a binary level this is correct, but... you know.

Any idea?

Mario

The binary duplicates filter does exactly what it should: Search for binary duplicate files. Identical files. Copies of files.
If you want to search for visually similar files, use one of the search features provided for that. They don't look at the metadata. And if the two files are identical except for the metadata, the matches should have a very high percentage (95% to 100%). And if you combine this with a filter that filters out files containing metadata in certain tags (which the duplicates don't have) you should find your dupes quite easily.
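
As a rough illustration of why a pixel-based comparison ignores metadata and can still match a downscaled copy, here is a small Python sketch of an "average hash". This is an assumption-laden stand-in, not the algorithm IMatch uses: the functions `average_hash` and `similarity_percent` are hypothetical, and the image is assumed to be already decoded into rows of grayscale values (decoding a real JPEG would need an imaging library).

```python
def average_hash(pixels, size=8):
    """Return size*size bits: 1 where a sampled cell is brighter than the mean.

    `pixels` is a grayscale image given as a list of rows of 0-255 values.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for cy in range(size):
        for cx in range(size):
            # Nearest-neighbour sample at the centre of each grid cell,
            # so the hash does not depend on the original resolution.
            y = (2 * cy + 1) * height // (2 * size)
            x = (2 * cx + 1) * width // (2 * size)
            cells.append(pixels[y][x])
    mean = sum(cells) / len(cells)
    return [1 if value > mean else 0 for value in cells]


def similarity_percent(hash_a, hash_b):
    """Percentage of matching bits between two equal-length hashes."""
    same = sum(1 for a, b in zip(hash_a, hash_b) if a == b)
    return 100.0 * same / len(hash_a)
```

Since only pixel values enter the hash, a copy with stripped or changed metadata scores the same as the original, and a resized copy of the same picture should still score close to 100%, while an unrelated picture scores much lower.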

Wolfgang

Hello Christian,

In 2014 we both took part in a similar discussion about visually similar images.
At the end of that discussion Mario changed the minimum number of results from 50 to 5.

Quote from: Mario on October 27, 2014, 11:58:00 AM
In version 5.2.8 and later you can choose a match count between 5 and 1,000.

Before entering a feature request: I wonder whether a minimum of 1 (instead of 5) would do the job.
I just tried with a lower-resolution picture: IMatch found 5 similar images, and the first one was the "duplicate" I was looking for.
With the current minimum of 5, there is a good chance that IMatch will also return images that are much less visually similar.
With a minimum of 1, we would force IMatch to show only the best visually similar image.
In the WhatsApp example, I would argue that in many situations there is only one "duplicate" we are looking for.

Another example: I often reduce the size of images before printing. I keep these resized images in a separate folder, for example with the name PrintingDateShop.
It happens that later I would like to find, for all the files in that PrintingDateShop folder (i.e. for all files I had printed), the original of each printed file.
If a single result were shown for each resized image, we could easily select all results and mark them as "Printed". With more than one possible result, I need to select the first result manually. For a few images that is no problem, but with 200 or 10,000 it is not feasible.


Wolfgang

Mario

I gave you several ways to limit the results to the best match using the Filter Panel. Why don't you use this? This should be easy and leave only one match?

Reducing a visual query search to 1 result would be good probably only for this very specific use case and not something many users would benefit from.

fischmir

Mario, I'll give it a try.

Can you let me know if/where I can see the match percentage (95% - 100%)?

Mario

Use a File Window layout that shows them. There is a "Result Window" layout by default.

fischmir

I display the attribute "Ähnlichkeit" (the German UI label for similarity). However, I cannot find it in the filter.

I tried the "Data-Filter" and hoped to find "Ähnlichkeit" there, but I did not. I want to show only results >= 95%.

Any idea? Many, many thanks in advance.

Mario

There is no such filter. And that's not what I've suggested.

I suggested you combine the visual search with a filter on metadata - because you said your low-res files have no metadata.
So a) Find dupes and b) without metadata. This should restrict it sufficiently.

Of course, if you have 10,000 duplicate files added to your collection, there will be some manual work to do...
Can't you just identify the dupes by their metadata (or lack thereof)?

fischmir

Correct.

At least 5 pictures are always shown, and some of the others also have no metadata but a similarity ("Ähnlichkeit") below 95%. I will do a deep dive; maybe other filters can help me.

In my case it would help to have a filter on similarity, or to reduce the minimum number of pictures shown to a value less than 5.

Mario

Neither is possible. There is no filter for similarity, nor can you force IMatch to return only 1 result.

I guess in all the time spent writing and discussing, I could have selected several thousand of the "dupe" matches in the result and marked them for deletion. Especially since this needs to be done only once? Or do you create duplicates with hi- and low-res all the time? In that case, a purpose-built app which knows how to find the dupes and then deletes them would probably be better.

Carlo Didier

Quote from: Wolfgang on April 02, 2020, 03:42:01 PM
Another example: I often reduce the size of images before printing. I keep these resized images in a separate folder, for example with the name PrintingDateShop.
It happens that later I would like to find, for all the files in that PrintingDateShop folder (i.e. for all files I had printed), the original of each printed file.

I avoid such issues with a naming convention that always includes the name of the original or master, so any derived files are automatically identified as versions. Even without IMatch or any other DAM, any derived file gives me the name of its original file. Couldn't make it simpler.

fischmir

Quote from: Mario on April 02, 2020, 08:26:58 PM
Or do you create duplicates with hi- and low-res all the time? In that case probably a purpose-built app which knows how to find the dupes and then deletes them would be better.

I do not create them intentionally, but they are created whenever my wife and I send each other pictures using a messenger. As I am on business trips on a regular, weekly basis and get "updated" every day, this happens all the time. It has been happening for years and will continue for years to come.

But I totally agree with you that this is a fairly specific scenario, and I'll try to find a "best guess" solution (even though I had hoped IMatch could solve this deterministically).

Anyway, thank you for your fast replies.

Mario

Quote from: Wolfgang on April 02, 2020, 03:42:01 PM
Another example: I often reduce the size of images before printing. I keep these resized images in a separate folder, for example with the name PrintingDateShop.
It happens that later I would like to find, for all the files in that PrintingDateShop folder (i.e. for all files I had printed), the original of each printed file.

You can easily find such files with the folder filter.
Or maybe do what most users do and assign these files to a special category. Or use a collection like flag, dot or pin.
This way you can easily identify these "print versions" later and do whatever you want with them.

Or use a formula-based category with a @Folder regexp. This category then automatically contains all your print files (assuming your folder naming scheme for print files is consistent).

Many easy and automatic ways to deal with such workflows.

fischmir

Mario, one last question:

Quote from: Mario on April 02, 2020, 08:26:58 PM
In that case probably a purpose-built app which knows how to find the dupes and then deletes them would be better.

What I noticed: "Date created" is not changed and is the same for the LQ and HQ files.

I have never written an app of my own, but can I call the "Similar images" function from a script? If so, I would call that function, which returns 5 images. Afterwards I would select all images with the same "Date created" (including the master file). From this result (all images with the same creation date/time), only the HQ picture would survive.

Is the similarity ("Ähnlichkeit") value available to scripts as well?

Mario

This visual query functionality is not available to scripts.

But if the HQ file and the LQ file have the same timestamp, they should be easy to find nevertheless. No visual query needed.
If an app searches the database for groups of files with the same date and time (unlikely that you have taken multiple files in the same second or even sub-second) and then knows a way to tell the LQ from the HQ (e.g. via another metadata tag) this should be doable.

Unfortunately I have no time to write this app for you.
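
For anyone who wants a starting point, the grouping idea described above could look roughly like this in Python. This is only a sketch under stated assumptions: it does not use any IMatch API, the `FileInfo` record and `split_hq_lq` function are hypothetical, the timestamp format is made up, and "larger file size means HQ" is the assumption from earlier in this thread. How the records get exported from the database is left open.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    path: str
    date_created: str  # hypothetical format, e.g. "2020-03-31T23:55:17"
    size_bytes: int


def split_hq_lq(files):
    """Group files by creation timestamp; in each group keep the largest file.

    Returns (keep, delete_candidates). A file with a unique timestamp is
    always kept. Assumes the largest file in a group is the HQ original.
    """
    by_timestamp = defaultdict(list)
    for info in files:
        by_timestamp[info.date_created].append(info)

    keep, delete_candidates = [], []
    for group in by_timestamp.values():
        ranked = sorted(group, key=lambda info: info.size_bytes, reverse=True)
        keep.append(ranked[0])
        delete_candidates.extend(ranked[1:])
    return keep, delete_candidates
```

The second list holds candidates only. Burst shots or edited versions can share a timestamp with their original, so it is worth reviewing the candidates (or adding a second check, such as missing metadata) before deleting anything.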

fischmir

Quote from: Mario on April 03, 2020, 09:07:59 AM
This visual query functionality is not available to scripts.

What a pity.

I'll try my best, thanks.