Organising 162,356 files using FDUPES and NameMangler
I’m a data hoarder; there, I said it.
I’ve been taking digital photos since around 2002; over the years, and across many computers and hard drives, I’ve moved, copied and added countless image and video files to a ‘temp folder’. In the back of my mind I’ve always said ‘I’ll sort these later’, only to find that weeks and months pass and the next batch of images has been added to yet another folder. Periodically I attempted to organise my photos and videos using Adobe Lightroom (and for a time it was really useful); the file renaming feature helped me organise them into year, month and day folders (using the EXIF data embedded in the images). Unfortunately I forgot to back up my Lightroom catalog from a Mac during a reformat (dumbass), and after spending so much time organising in Lightroom I felt somewhat defeated.
More months, and eventually years, passed; my once-organised directory of images had hopped around from external drives to cloud storage and back again, and as a result I had duplicated the main folder, adding new images to some copies but not to others. Disaster.
I finally decided that enough was enough, and began by collecting all my duplicate dirs into a single location. After many hours of copying and migrating images from various sources I had 162,356 image and video files, which totalled 408 gigabytes. The only way to tackle this first hurdle was to compare every file against the others, keep a single copy of each and purge the rest. A quick search pointed me in the direction of FDUPES (https://github.com/adrianlopezroche/fdupes), a program which does exactly what I needed. Being rather precious about some of my photos, I wasn’t about to unleash FDUPES on my main directory straight away, so I opted to create a ‘test’: a sample of duplicate images, nested in various folders and with differing filenames. After several trial runs, I felt comfortable running it against the core dir.
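The command I used was: fdupes -r -d -N /path/to/dir (-r recurses into subdirectories, -d deletes duplicates, and -N skips the prompt, keeping the first file in each set of duplicates and deleting the rest).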
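For anyone who wants to see what a duplicate finder is doing in principle, here is a minimal Python sketch. It is not FDUPES's own code, and the function names (file_digest, find_duplicates) are made up for illustration. It assumes that two files with the same size and the same SHA-256 digest are duplicates, and it only reports groups; it never deletes anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root that have identical contents.

    Hypothetical helper: returns a list of lists of paths, one inner
    list per set of duplicates. Nothing is deleted.
    """
    # Step 1: bucket by size, which is cheap and rules out most files.
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    # Step 2: hash only the files that share a size with another file.
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling find_duplicates("/path/to/dir") and printing each group gives a dry-run report that can be checked by eye before letting any tool delete files.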
FDUPES spent a long time building a file list and comparing each file, and after recursively checking through all the files, began the purge. This was a rather nervous moment: watching a terminal window delete thousands of files that the program had determined to be ‘dupes’. Once it had completed, I was left with a fragmented and somewhat sparse core directory, in which some folders were completely empty whilst others contained hundreds of photos. At this point I thought I’d made a massive mistake and had wiped out precious images that I could never recover. After spending some time moving and collating the remaining images from these fragmented directories, my fears were allayed; the dedupe had done its job, and I now had one copy of each image. Phew!
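The empty folders left behind after a purge like this can be listed with a few lines of Python. This is only a sketch: empty_directories is a hypothetical helper, and it treats any file at all (including hidden ones such as .DS_Store on a Mac) as making a folder non-empty.

```python
import os


def empty_directories(root):
    """Return directories under root that hold no files, directly or nested."""
    empty = set()
    # Walk bottom-up so every child is classified before its parent.
    for current, subdirs, files in os.walk(root, topdown=False):
        children_empty = all(os.path.join(current, s) in empty for s in subdirs)
        if not files and children_empty:
            empty.add(current)
    return sorted(empty)
```

The function only reports paths, so the list can be reviewed before anything is removed.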
I opted to use EXIF image data to rename the files, and another search found me EXIFRenamer (http://www.qdev.de/?location=mac/exifrenamer), which allowed me to drag and drop images to rename them based on a pattern of my choosing (I opted for ‘YYYY-MM-DD-HH-MM-SS’). This worked for the vast majority of my images. However, a few files seemed to have missing or incorrect EXIF data (don’t ask me how). On closer inspection I found that the files’ modification dates still showed the correct date for when each image was taken, but I didn’t have an easy way of using this to rename them. After some lengthy searching, I came across NameMangler (https://manytricks.com/namemangler/). This utility gave me the option to use the modification date as a parameter for renaming files (it’s limited to renaming five files at a time when unlicensed, but for my needs this was ideal, as only a handful of files needed adjustment).
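The same modification-date rename can also be sketched in Python. This is not how NameMangler or EXIFRenamer work internally; name_from_mtime and rename_by_mtime are hypothetical helpers that assume the file's modification time (in the computer's local time zone) is the moment the photo was taken.

```python
from datetime import datetime
from pathlib import Path


def name_from_mtime(path):
    """Build a 'YYYY-MM-DD-HH-MM-SS' filename from the modification time."""
    path = Path(path)
    stamp = datetime.fromtimestamp(path.stat().st_mtime)
    return stamp.strftime("%Y-%m-%d-%H-%M-%S") + path.suffix


def rename_by_mtime(path, dry_run=True):
    """Return the new path; only rename on disk when dry_run is False."""
    path = Path(path)
    new_name = name_from_mtime(path)
    stem, suffix = Path(new_name).stem, Path(new_name).suffix
    target = path.with_name(new_name)
    counter = 1
    # If two photos share a timestamp, append -1, -2, ... instead of overwriting.
    while target != path and target.exists():
        target = path.with_name(f"{stem}-{counter}{suffix}")
        counter += 1
    if not dry_run and target != path:
        path.rename(target)
    return target
```

With the default dry_run=True the function only returns the proposed name, which makes it easy to check a handful of files before renaming for real.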
Now I’m the proud owner of 47,235 images, totalling 136 GB, and I promise to keep the collection up to date!