Filter for duplicates

Started by ben, September 26, 2020, 11:21:52 PM

ben

Hello everyone,

I am trying to find duplicates and am looking for good ideas on how to do it.
My wife regularly uploads images into our storage.
I have already added some (many) of these images to IMatch before (with new filenames and metadata).

My idea is to mark the images of my wife's last upload (e.g. with a blue label).
I would then need to filter for all days that contain at least one image with a blue label.
If I then show all images of these days, I can scroll through them easily.
I would see duplicates side by side.

This is quite specific, but maybe there is a way with IMatch.

Any ideas how?

Ben


ben

Quote from: jch2103 on September 26, 2020, 11:36:09 PM
See the Help: https://www.photools.com/help/imatch/#search_basics.htm
Thanks, I had never tried it. I read through the help and did some tests.

The IMatch Search->Duplicates function seems to respect the metadata as well.
In my case, the images are identical, but have different filenames and metadata.
So, the duplicates are not shown.

Mario

The duplicates search does what it says on the lid: It searches for binary identical files.
If a file has different metadata, the files are no longer binary identical.
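A minimal sketch of that distinction, assuming a toy "file" that is just a metadata block followed by pixel data. Real image formats are more involved, and all names here are hypothetical, not part of IMatch:

```javascript
// Returns true when two byte sequences have the same length and content.
function bytesEqual(a, b) {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Hypothetical toy layout: metadata bytes followed by pixel bytes.
function toBytes(file) {
  return Uint8Array.from([...file.metadata, ...file.pixels]);
}

// Binary comparison: every byte of the file counts, metadata included.
function isBinaryDuplicate(fileA, fileB) {
  return bytesEqual(toBytes(fileA), toBytes(fileB));
}

// Pixel-only comparison: metadata edits are ignored.
function hasSamePixels(fileA, fileB) {
  return bytesEqual(fileA.pixels, fileB.pixels);
}

const upload = { metadata: [1, 2, 3], pixels: [10, 20, 30, 40] };
const catalogued = { metadata: [7, 7, 7, 7], pixels: [10, 20, 30, 40] };

console.log(isBinaryDuplicate(upload, catalogued)); // false
console.log(hasSamePixels(upload, catalogued));     // true
```

The renamed and re-tagged copy fails the binary test even though the picture itself is unchanged.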

To find files which look similar (or identical) use the corresponding search tools: Finding Visually Similar Images...

HaWo

Sometimes XnViewMP is better (sorry Mario).
Hans-Wolfgang

Mario

Quote from: HaWo on September 27, 2020, 10:15:26 AM
Sometimes XnViewMP is better (sorry Mario).

There will always be an application which does something better.
It would be more useful for me if you would explain exactly what XnView does better. I'm always happy to improve IMatch.

Does XnView have a feature where it computes a checksum over the image data (or the preview / full RAW / thumbnail for RAW files) and considers this when matching files, ignoring metadata and other data contained in the image?
I have never used XnView so I cannot tell. But I don't recall a feature like this ever having been requested, so...

IMatch allows you to find binary duplicates (dupes) very easily.
It also allows you to find identical images (even with modified metadata) just as easily, using the "visually similar" search command.
What's missing?

HaWo

XnView is not comparable to IMatch at all; it is only a simple image viewer. I do all my basic work in IMatch and am very happy with it for my needs.

But when I am looking for a stray similar file (e.g. edited and filed in the wrong place, etc.), I have so far found it faster with XnView, without being shown a number of unwanted images as in IMatch.

If I select a file in IMatch and search for a visually similar one, I get the most varied interpretations, which often have nothing to do with the reference. If the reference is in color, black-and-white images are shown as well. XnView, however, does not have this search function. I would like it to be a bit more precise in IMatch, though. I have tried quite a few settings, but unfortunately I am not entirely happy with it.

When I search for similar files in XnView, it goes through the whole directory, for example, and the similar files are then displayed in no time. That would be just the case for Ben, hence my tip.

If you could take the time to look at XnView, you would see right away what I mean. XnView does not match the feature set of IMatch, but I prefer it for what I described, and the result is what counts.

(XnViewMP is faster at searching than XnView.)

Hans-Wolfgang

Mario

For the results of the duplicate search, always use the "Default" sort profile. Only then are the best results shown first.
This way I find identical images with changed metadata immediately, because they are naturally shown first in the result window. Database-wide.

HaWo

Hans-Wolfgang

Tveloso

Mario, would it be worthwhile to extend the Add Files to categories option in Edit->Preferences->Indexing (in the All duplicate files section) with a "Potential Duplicates" checkbox? It would run the Visually Similar Search and add just the first (most similar) file to the selected Category, with some type of Similarity Threshold to prevent every newly indexed file from being added to the Category.

Like Ben, I periodically index images that include duplicates (from mobile phones).  IMatch does catch the dupes and adds them to my configured "Duplicates" category, but of course, only if I have not yet updated the previously indexed copy of the duplicate set (so the files are still Binary Duplicates).  But if the previously indexed files have had updates, then new copies of the same image are no longer detected as duplicates at Indexing.

If IMatch allowed us to also add files to this (or a separate "Potential Duplicates") Category, while Indexing, that would allow us to work with the Potential Duplicates, and easily determine whether or not the newly Indexed files actually are duplicates.
--Tony

Mario

The binary duplicate check is blazing fast because it compares the checksums IMatch maintains for each file to spot duplicate files.
Very fast to run for each incoming file when it has been processed.
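To illustrate why a checksum-based check is cheap, here is a rough sketch that groups file records by checksum in a single pass. The record shape and the `checksum` field name are assumptions made for this example; the thread does not say how IMatch stores or exposes its checksums:

```javascript
// Groups file records by checksum. Any group with more than one entry is a
// set of binary duplicates. `checksum` is a hypothetical field name.
function findBinaryDuplicates(files) {
  const byChecksum = new Map();
  for (const file of files) {
    const group = byChecksum.get(file.checksum) || [];
    group.push(file.id);
    byChecksum.set(file.checksum, group);
  }
  // Keep only checksums that occur more than once.
  return [...byChecksum.values()].filter(ids => ids.length > 1);
}

const indexedFiles = [
  { id: 1, checksum: 'a3f1' },
  { id: 2, checksum: '9c07' },
  { id: 3, checksum: 'a3f1' },
];

console.log(findBinaryDuplicates(indexedFiles)); // [ [ 1, 3 ] ]
```

Each incoming file needs only one lookup against the stored checksums, rather than a comparison against every other file.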

The visual query feature is thousands of times slower. And the query index is invalidated when new files come into the database - and rebuilt when the user runs a visual query the next time (or never).
Doing a visual similarity scan during indexing (for each new/updated file) would be impossible for this reason.

Doing a visual similarity scan after the indexing has completed would be doable but could slow down IMatch considerably.
Also to consider is that IMatch may run multiple indexing operations in steps, e.g. after loading a database, when it checks all the folders in the database for updates and finds several updated folders. Each folder may add new or updated files to the background processing, which then processes them in batches as soon as there is time and the user is not doing things which cause the queue to pause...

Probably doable but complex.
This requires at minimum a feature request and a substantial number of users who need this.

ben

For my needs, I am already writing a short script to show duplicates in a result window.

Is there an endpoint to:
  - pass a list of dates (date created) and
  - retrieve a list of file IDs (images taken on one of the days included in the list)

Or can I combine different endpoints?

Regarding the built-in IMatch functions: the solution with the binary search not considering metadata sounds really good. It depends on how complex it is to design and on how many people will use it, of course.

I will try the visual search again with my ~1000 duplicates. Will see if that's a good option for me. I will have duplicates in the future again, I know my wife.  ;D

Mario

Quote
The solution with the binary search not considering metadata

We may need to consider metadata still - what about crop records, different EXIF orientations, ...

Selecting 1000 files and running a visual query with a minimum number of matches is easy to do. Make sure you use the "Default" sort preset in the result window.

Tveloso

I thought it might be difficult to add something like this at indexing time.

Still, it would be nice to have the ability to "automatically" identify duplicate images for files that are no longer binary duplicates...

Quote from: ben on September 27, 2020, 11:18:31 PM
I will have duplicates in the future again, I know my wife.  ;D
Exactly the same here!

I'll go ahead and post a Feature Request in case there are others that would benefit from something like this.
--Tony

ben

Can someone help me with my endpoint question?

Quote
Is there an endpoint to:
  - pass a list of dates (date created) and
  - retrieve a list of file IDs (images taken on one of the days included in the list)

Or can I combine different endpoints?

sinus

Best wishes from Switzerland! :-)
Markus

David_H

Quote from: ben on September 28, 2020, 08:13:49 AM
Can someone help me with my endpoint question?

Quote
Is there an endpoint to:
  - pass a list of dates (date created) and
  - retrieve a list of file IDs (images taken on one of the days included in the list)

Or can I combine different endpoints?

I would imagine the /v1/files/query endpoint would do what is needed.
However, it takes a query parameter that doesn't appear to be documented anywhere (it says to see the tutorial links, but doesn't say where the example is!).

One of the undocumented events endpoints will also return files within a certain date range (because that's how the 'create event for files in this date range' currently works).

Mario

Just fetch all files with the v1/files endpoint and compare the date in your app as needed?


IMWS.get('v1/files',{
  fields: 'id,datetime'
}).then(response => {
  // Use file data
});


You can fetch all file data at once or, if your database has hundreds of thousands of files, you can fetch the data in blocks and process each block.
See the documentation of the endpoint for more information.

This returns an array with all files in your database, with a JSON object for each file that looks like

    {
      "id": 1234,
      "dateTime": "2018-05-11T11:30:57"
    },


which makes it easy to compare the data with whatever you want to compare it with.
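A minimal sketch of that comparison, assuming the response has already been unpacked into an array of objects shaped like the sample above and that `dateTime` is an ISO-style string whose first ten characters are the day. The function name is made up for this example, and no time zone handling is attempted:

```javascript
// Returns the IDs of all files whose dateTime falls on one of the given days.
// `days` is a list of 'YYYY-MM-DD' strings.
function fileIdsOnDays(files, days) {
  const wanted = new Set(days);
  return files
    .filter(f => typeof f.dateTime === 'string' && wanted.has(f.dateTime.slice(0, 10)))
    .map(f => f.id);
}

const sampleFiles = [
  { id: 1234, dateTime: '2018-05-11T11:30:57' },
  { id: 1235, dateTime: '2018-05-12T08:02:10' },
  { id: 1236, dateTime: '2018-05-11T19:45:00' },
];

console.log(fileIdsOnDays(sampleFiles, ['2018-05-11'])); // [ 1234, 1236 ]
```

Using a Set for the days keeps the check to one lookup per file, which matters if the database holds hundreds of thousands of files.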

thrinn

Maybe you could also use the v1/files/groups endpoint, provided that you are happy with the date IMatch sees as representative for a given image.

For example, I use something like this in one of my scripts to let IMatch group the files by day.
  let vSel = await IMWS.get('v1/files/groups', {
    idlist: pIdlist,
    groupby: 'day',
    fields: 'id',
  });

pIdlist could be e.g. the current selection, but also the All Files IDList.
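If the grouping is done in the script instead, the workflow from the first post (find every day that contains at least one marked file, then list all files taken on those days) could be sketched roughly as below. This assumes file records with `id` and an ISO-style `dateTime` string; the function names and sample data are hypothetical:

```javascript
// Day key ('YYYY-MM-DD') taken from an ISO-style timestamp string.
const dayOf = file => file.dateTime.slice(0, 10);

// Returns a Map from day to the IDs of all files on that day, restricted to
// days on which at least one marked file (e.g. the latest upload) was taken.
function filesOnMarkedDays(allFiles, markedIds) {
  const marked = new Set(markedIds);
  const markedDays = new Set(allFiles.filter(f => marked.has(f.id)).map(dayOf));
  const result = new Map();
  for (const file of allFiles) {
    const day = dayOf(file);
    if (!markedDays.has(day)) continue;
    if (!result.has(day)) result.set(day, []);
    result.get(day).push(file.id);
  }
  return result;
}

const allFiles = [
  { id: 1, dateTime: '2020-09-20T10:00:00' }, // already in the database
  { id: 2, dateTime: '2020-09-20T10:00:00' }, // new upload, possible duplicate of 1
  { id: 3, dateTime: '2020-09-21T09:30:00' }, // different day, not shown
];

console.log(filesOnMarkedDays(allFiles, [2])); // Map(1) { '2020-09-20' => [ 1, 2 ] }
```

The resulting ID lists could then be shown in a result window, so that the new upload and the existing copy appear side by side.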

Thorsten
Win 10 / 64, IMatch 2018, IMA