
Andrew McGlashan wrote:
Hi James,
Okay, here is one method for bulk-comparing two directory trees that will give you a start, since you can then eliminate some files from further analysis.
If you run md5sum over all files in each tree, sending the output to a file for each tree, you can then sort those files. Once you have them, you can start eliminating files that share the same basename and md5 checksum.
You could use something other than md5sum, such as sha256sum, in the same manner.
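For example, a minimal sketch of that approach (good-copy/ and bad-copy/ are placeholder names for your two trees; this variant matches on the relative path rather than the bare basename, which is the simplest first pass, and you can swap sha256sum in for md5sum unchanged):

  $ (cd good-copy && find . -type f -exec md5sum {} +) | sort > good.sums
  $ (cd bad-copy  && find . -type f -exec md5sum {} +) | sort > bad.sums
  $ comm -12 good.sums bad.sums   # same relative path AND same checksum: eliminate
  $ comm -3  good.sums bad.sums   # unique to one tree or differing: needs a look

Running md5sum from inside each tree keeps the paths relative, so comm can line the two sorted lists up directly.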
finddup (from the perforate package) might likewise help:

  $ ls good-copy/ bad-copy/
  $ finddup -l        # links duplicate files
  $ find -links 1     # find files that have no duplicate

AFAIK it checks length and sum. You can use finddup -n -v instead to have it merely report duplicates, but IIUC you want to find the NON-duplicates. finddup can theoretically compare dirs in separate areas, but IME it only works properly on $PWD.
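Putting that together under the $PWD restriction, a rough sketch (again, good-copy/ and bad-copy/ are placeholders, and both trees are assumed to sit under the same parent directory on a single filesystem, since hard links cannot cross filesystems):

  $ cd /path/to/parent        # the directory containing both trees
  $ finddup -l                # hard-link each set of identical files together
  $ find . -type f -links 1   # regular files with no hard link: no duplicate

The -type f keeps directories out of the listing; any regular file still at link count 1 after finddup -l had no identical twin in either tree, which is exactly the non-duplicate set you are after.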