GNU/Linux Desktop Survival Guide
by Graham Williams

Duplicate Photos

20200411 As presented in Section 18.5, duplicate files (photos in our case) can readily be found. Duplicates easily accumulate as we copy photos between storage devices and attempt to manage large collections using different file naming schemes.

The fdupes package provides the fdupes command to find and remove duplicates. With no options, fdupes lists groups of duplicate files found in the specified directory:

$ fdupes .
./20180323_122434_02.jpg
./20180323_122434_01.jpg
./20180323_122434_00.jpg

./20030102_092312_03.jpg
./20031012_092312_00.jpg

./20200531_151245_01.jpg
./20200531_151245_00.jpg
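Note that the files within each group are byte-identical regardless of their names: fdupes compares files by size and checksum rather than by filename. The same content-based grouping can be sketched with standard tools; the sandbox directory and file contents below are invented for illustration:

```shell
#!/bin/sh
# Group byte-identical files by MD5 checksum, approximating how fdupes
# detects duplicates.  Directory and filenames are invented for illustration.
gdir=$(mktemp -d)
printf 'beach sunset' > "$gdir/20200531_151245_00.jpg"
printf 'beach sunset' > "$gdir/20200531_151245_01.jpg"   # same bytes: a duplicate
printf 'mountain'     > "$gdir/20200531_151301_00.jpg"   # unique content

# A checksum that occurs more than once marks a group of duplicates.
dupsums=$(md5sum "$gdir"/*.jpg | awk '{print $1}' | sort | uniq -d)
for sum in $dupsums; do
    md5sum "$gdir"/*.jpg | awk -v s="$sum" '$1 == s {print $2}'
done
```

Here the first two files form one duplicate group and the third is left alone, mirroring the grouped output above.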

Use the --recurse or -r option to recurse into subdirectories.

A summary of duplicates is obtained using the --summarize or -m option:

$ fdupes --summarize .
13567 duplicate files (in 6407 sets), occupying 16996.0 megabytes
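The summary counts every redundant copy beyond the first in each group, together with the space those copies occupy. The same figures can be reproduced for a small sandbox with standard tools (the directory and file contents here are invented for illustration; stat -c is the GNU coreutils form):

```shell
#!/bin/sh
# Reproduce the gist of 'fdupes --summarize': count the redundant copies
# beyond the first in each group and sum the bytes they occupy.
sdir=$(mktemp -d)
printf 'aaaa' > "$sdir/a0.jpg"
printf 'aaaa' > "$sdir/a1.jpg"   # duplicate of a0
printf 'aaaa' > "$sdir/a2.jpg"   # duplicate of a0
printf 'bb'   > "$sdir/b0.jpg"   # unique

# seen[$1]++ is true from the second occurrence of a checksum onwards,
# so only the redundant copies are passed to stat(1) for their sizes.
summary=$(md5sum "$sdir"/*.jpg | sort | awk 'seen[$1]++ {print $2}' |
    while read -r f; do stat -c %s "$f"; done |
    awk '{n++; bytes+=$1} END {printf "%d duplicate files, occupying %d bytes\n", n, bytes}')
echo "$summary"
```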

When deleting duplicates fdupes retains the first file listed in each group, so it is sometimes useful to control the listing with --order and --reverse:

$ fdupes --order='name' --reverse .
./20180323_122434_00.jpg
./20180323_122434_01.jpg
./20180323_122434_02.jpg

./20031012_092312_00.jpg
./20030102_092312_03.jpg

./20200531_151245_00.jpg
./20200531_151245_01.jpg

The following command will then delete duplicates, keeping the first file listed in each group:

$ fdupes --delete --noprompt --order='name' --reverse .
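The keep-the-first semantics can be demonstrated without risking real photos. A sketch with standard tools that mimics the effect of the command above on a sandbox (directory and file contents are invented for illustration):

```shell
#!/bin/sh
# Mimic 'fdupes --delete --noprompt' keep-first semantics: within each
# group of identical files, keep the first by name and remove the rest.
ddir=$(mktemp -d)
printf 'holiday' > "$ddir/20180323_122434_00.jpg"
printf 'holiday' > "$ddir/20180323_122434_01.jpg"   # duplicate, to be removed
printf 'holiday' > "$ddir/20180323_122434_02.jpg"   # duplicate, to be removed
printf 'picnic'  > "$ddir/20200411_101500_00.jpg"   # unique, kept

# Sort by filename so the first file of each checksum group survives;
# awk emits every later file with the same checksum for deletion.
md5sum "$ddir"/*.jpg | sort -k2 | awk 'seen[$1]++ {print $2}' |
while read -r f; do rm -- "$f"; done

ls "$ddir"
```

After running, only the first-named file of the duplicate group and the unique file remain, which is what the fdupes command above achieves directly.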


Copyright © 1995-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.