How to Find AND Delete Duplicates

Started by GKent, April 08, 2015, 01:27:48 PM


GKent

I'm new to IMatch. My wife used IMatch 3 a long time ago. Now we have upgraded to IMatch 5.

Mario has been helping me with setup issues, but now that I'm starting to deal with use issues, I don't want to keep bothering him.

We have a huge duplicate issue. Folders and folders of duplicates. Let's narrow this to binary duplicates.

I'm using Windows 8.1, so in File Explorer, by clicking around a little and doing some searches, it is not so difficult for me to find the folders of duplicates. But I don't know how to do this in IMatch 5. So for now I'm going to find my duplicate folders using File Explorer, but delete those folders in IMatch, so I keep my IMatch database in sync.

But I would think that there would be a way to do what I want in IMatch. I have read the Help section on finding duplicates, but what I'd like to find in Help is a section on "Finding and Deleting Duplicates," if this is a common issue, which I would think it would be, but maybe not.

Mario guided me to use the @All in my Categories, in order to get a search of all of my photos. So I did that, and then used a Search>Duplicates. But what do I do next?

I'm guessing that 80% of my duplicates are in whole folders that I can delete. And then I will be down to the 20% that are mixed in with other photos in folders, so that I won't want to delete the folder.

So, I think I want a procedure for finding and deleting duplicate folders, and then a procedure for finding and deleting individual duplicate photos.

I'm guessing that IMatch can help me with the second type of search and delete, for photos in folders where there are both photos I want to keep and photos that I don't want to keep. At this point I'll want to keep the photo that is where I want it to be and delete the others. So, is there a way for me to easily see the location of the original AND the duplicates?

And then, for a binary duplicate, is there any difference between an original and a duplicate? I'm thinking that I can keep whichever photo is where I want it to be and delete the others.

Thanks in advance for any tips.

Greg

Mario

Welcome to the community.

I will move your post into the General Discussion board. This is better for asking questions.

The board in which you posted (FAQ, Workflow, Tutorials, Tips & Tricks) is intended for giving other users tips or for posting FAQ articles.

Tip: Each board shows a description in the overview page. These help you to find the right board for your post.
-- Mario
IMatch Developer
Forum Administrator
http://www.photools.com  -  Contact & Support - Follow me on 𝕏 - Like photools.com on Facebook

ChrisMatch


Mario

The duplicate search in IMatch 5 scans your database for binary identical images for each of the original images you use in your search. In the result window that opens, each original image is displayed, and below that you'll see all duplicate files. IMatch automatically filters out original files which produce no match (files without duplicates).

To see the folder name of a match, point the mouse cursor at the thumbnail. The File Window tip pops up and shows the folder name.
To open the corresponding folder, select the duplicate file and then press <Ctrl>+<G> or use the "Goto Folder" command from the context menu of the thumbnail.

If you want to see the folder names of all matches etc., you can set up a tabular file window layout which displays the folder name in a column. Such a layout is rarely needed and thus IMatch does not contain one by default. But it is fairly easy to create your own file window layout which shows exactly the information you want to see for each file. See "File Window Layouts" in the IMatch help for details, screenshots and examples.

If you are looking for a "Find all folders in my database which contain exactly the same number of files, and each file must be binary identical to its counterpart in the other folder", you're out of luck. Although IMatch packs an astonishing number of features, such a feature is not available.

Tip: If you know that you have the same folder in multiple copies, just located in different parts of your folder hierarchy or on different drives, you can use the Find / Filter functions in the Media & Folder View (The Folder Filter panel below the folder tree) to quickly find folders with specific names or to restrict the folders displayed to folders matching a given search text.

ubacher

I had not used the search for duplicates before, so this discussion got me to try it out.
I knew I had duplicates which had been renamed, however: IMatch did not find these! Should it have?

To GKent:
If your duplicates have the same name: I have a script which finds duplicate file names in the whole database.
(I use it to make sure all file names in my database are unique.)
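
For duplicates that share a name, a similar check can also be done outside IMatch, directly on the folder tree. The following is only a minimal sketch and not the IMatch script mentioned above: the function name dup_names is made up for illustration, it assumes GNU find (as shipped with Cygwin and most Linux systems), and it compares names only, not contents.

```shell
#!/usr/bin/env bash
# List every file name that occurs more than once below a folder
# (default: the current folder). Only names are compared, not contents.
# Assumes GNU find (-printf) and file names without line breaks.
dup_names() {
        find "${1:-.}" -type f -printf '%f\n' | LC_ALL=C sort | uniq -d
}
```

Calling dup_names with the top folder of your photos prints each repeated name once. To see where the copies of one name live, run find on the same folder with -name and that name.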

Added later: I did some more tests and it does find duplicates that are renamed. But it does not find the one set I have.
How would I go about finding out why, i.e. what the difference between those files is?

Mario

If you use the Duplicate search, you will find only binary identical duplicates. See the help for details.
The file name does not matter, but if you have modified the file, changed metadata or whatever, it is no longer a binary duplicate.
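
To check outside IMatch whether two files are still binary duplicates, a byte-by-byte comparison with the standard cmp tool is enough. This is a minimal sketch, not how IMatch itself works; the function names is_binary_duplicate and first_difference are made up for illustration.

```shell
#!/usr/bin/env bash
# Succeeds (exit status 0) only if both files are byte-for-byte identical.
# File names and timestamps play no role in the comparison.
is_binary_duplicate() {
        cmp -s -- "$1" "$2"
}

# Report where the two files first differ, or say that they do not.
first_difference() {
        if cmp -- "$1" "$2" ; then
                echo "no difference"
        fi
}
```

If first_difference reports a differing byte for two files that look like the same picture, one of them has probably been modified after copying, for example by writing metadata into it.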

MyMatch

That's easy to find ...

I also tried to find a tool capable of such a search, but then created a small script.
It's "bash" code, so for Windows you need "cygwin" or something.


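#!/usr/bin/env bash
# Group directories whose file listings (size and name of every file) match.

set -e

DIRS=( "$@" )

# A single argument may be a list file, a directory to scan, or "-" for STDIN.
if [ "${#DIRS[@]}" -eq 1 ] ; then
        if [ -f "${DIRS[0]}" ] ; then
                echo "${DIRS[0]} is file, reading dirs from it"
                mapfile -t DIRS < "${DIRS[0]}"
        elif [ -d "${DIRS[0]}" ] ; then
                echo "${DIRS[0]} is directory, finding dirs from it"
                mapfile -t DIRS < <( find "${DIRS[0]}" -type d )
        elif [ "${DIRS[0]}" = "-" ] ; then
                echo "${DIRS[0]} is STDIN, reading dirs from it"
                mapfile -t DIRS
        fi
fi

# Start with an empty ~/CKSUMS directory.
if [ -d ~/CKSUMS ] ; then
        rm -f ~/CKSUMS/*
else
        mkdir ~/CKSUMS
fi

# Append each directory to a file named after the checksum of its sorted
# listing; directories with the same listing end up in the same file.
for line in "${DIRS[@]}" ; do
        echo "${line}" >> ~/CKSUMS/"$( cd "${line}" && find . -type f -printf "%s %f\n" | LC_ALL=C sort | cksum | awk '{print $1"_"$2}' )"
done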


Call it like this:

./find_double_dircontents.sh <FILE>
where <FILE> contains a list of folders to check, one per line.

or

./find_double_dircontents.sh <FOLDER>
to just search the given folder

or

find ... | ./find_double_dircontents.sh -
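to feed directories from a "find" command or something

It then empties the folder ~/CKSUMS in your $HOME (creating it if it does not exist) and creates one file per checksum there. The checksum is computed from the sizes and names of the files in a folder, so each file lists the folders whose contents match, all the way down the hierarchy.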

Example output:

$ cat  CKSUMS/1537272305_167
/cygdrive/i/ORIGINALS/CHECK_FOR_DOUBLES/ZIP/ARCHIVES/BILDER_SAMSUNG/Studio_Fotografie
/cygdrive/i/ORIGINALS/CHECK_FOR_DOUBLES/ZIP/ZIP/Studio_Fotografie


So, those two directories are exactly the same!
They have the same content.

You can now decide, which one to remove :D
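
Since the checksum is built from file sizes and names only, it may be worth confirming byte-for-byte that two folders really match before deleting one of them. A minimal sketch using the standard diff tool; the function name same_tree is made up for illustration.

```shell
#!/usr/bin/env bash
# Succeeds (exit status 0) only if both folders contain the same files
# with the same bytes, all the way down the hierarchy.
# Otherwise it prints which files differ or exist on one side only.
same_tree() {
        diff -rq -- "$1" "$2"
}
```

Run it with the two folders listed in one CKSUMS file as arguments. If it prints nothing and succeeds, either copy can be removed.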

MyMatch

You could restrict the search to folders that contain images ...
I use it for more generic work, regardless of the content.

ubacher

Quote: It's "bash" code, so for Windows you need "cygwin" or something

I just read somewhere that Windows 10, after the Anniversary Update, can handle "bash" code.

(Bash is a Unix shell and command language)

MyMatch

So, go and try :D

I am on Windows 7 and plan to stay there for the next 20 years :D