Do duplicate tracks matter?

Christmas DNA

Consider all the albums, compilations, mixtapes, cover-mounted CDs and miscellaneous tracks you pick up. Over time it's easy to add the same track to your music library in multiple places. The ultra-efficient may want to remove these duplicates, but is it really required?

Removing duplicate files is one of the oldest ideas on the bliss ideas forum. This isn't too surprising; I've noticed a number of software tools for removing duplicate MP3s and other music files from your music collection. As you build a music library and add compilations to original albums, it's inevitable you'll get some kind of overlap.

How would it work?

There are a number of ways of making this work. The traditional way would be to attempt to parse file paths and music tags to identify duplicates. This would rely on a degree of lenience in the matching, but it could be made to work, so long as the file name and tagging metadata accurately described your music. It might not!
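To make the "degree of lenience" concrete, here is a minimal sketch of tag-based matching. It is not how bliss works; the function names, the dictionary layout for a track's tags and the 0.9 threshold are all assumptions made for illustration. It normalises the artist and title tags and compares them with Python's standard difflib.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case, strip accents, bracketed suffixes and punctuation."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[\(\[].*?[\)\]]", " ", text.lower())  # drop "(Remastered)" etc.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()


def similar(a, b, threshold=0.9):
    """True if two strings are close enough after normalisation.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio() >= threshold


def likely_duplicates(track_a, track_b):
    """Leniently compare the artist and title tags of two tracks.

    Each track is assumed to be a dict with "artist" and "title" keys.
    """
    return (similar(track_a["artist"], track_b["artist"])
            and similar(track_a["title"], track_b["title"]))
```

With this, a track tagged "Day Of The Lords (Remastered)" would match one tagged "Day of the Lords" by the same artist. The weakness is the one already noted: it is only as good as the tags it is given.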

New approaches such as acoustic fingerprinting may provide more accurate identification of duplicates. This approach would rely not on manually entered metadata but on an actual analysis of the audio data within a music file. While not 100% accurate, this would give better results, I think.
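As a rough illustration of how fingerprints might be compared, the sketch below assumes each fingerprint has already been computed elsewhere and is available as a list of 32-bit integers. That representation, the function names and the 0.9 threshold are assumptions for the example only. It measures the fraction of matching bits and ignores the harder problem of aligning two recordings that start at slightly different offsets.

```python
def fingerprint_similarity(fp_a, fp_b):
    """Fraction of matching bits between two fingerprints (0.0 to 1.0).

    Each fingerprint is assumed to be a list of 32-bit integers.
    Only the overlapping prefix of the two lists is compared.
    """
    length = min(len(fp_a), len(fp_b))
    if length == 0:
        return 0.0
    # XOR leaves a 1 wherever the two values disagree; count those bits.
    differing = sum(bin((a ^ b) & 0xFFFFFFFF).count("1")
                    for a, b in zip(fp_a, fp_b))
    return 1.0 - differing / (32 * length)


def same_recording(fp_a, fp_b, threshold=0.9):
    """Treat two fingerprints as the same audio above an assumed threshold."""
    return fingerprint_similarity(fp_a, fp_b) >= threshold
```

Because the result is a score rather than a yes or no, some threshold has to be chosen, which is exactly why this can never be 100% accurate.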

Once the duplicates are identified, bliss would mark the relevant albums as non-compliant and provide a fix to delete one of the tracks. As a sure-fire, 100% guaranteed way of identifying duplicates is unavailable, I doubt we'd want to automate this.

Problems, problems

Deleting one of the tracks poses the first problem. Which track is deleted? And what happens to the new 'hole' bored in the affected album? This album has now lost its canonical integrity, as it is missing a track. There are technical ways to get around this; for instance, we could create a file link, named after the deleted file, that points to the duplicate track that was retained.
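The file linking idea could be sketched as follows, using a symbolic link. The function name is hypothetical, and this assumes a filesystem and operating system where symbolic links are available to the user running it.

```python
import os


def replace_with_link(duplicate_path, retained_path):
    """Replace duplicate_path with a symbolic link to retained_path."""
    if os.path.samefile(duplicate_path, retained_path):
        return  # already the same file; nothing to do
    link_dir = os.path.dirname(os.path.abspath(duplicate_path))
    # A relative target keeps the link valid if the whole library is moved.
    target = os.path.relpath(retained_path, start=link_dir)
    temp_link = duplicate_path + ".linktmp"
    os.symlink(target, temp_link)          # create the link under a temporary name
    os.replace(temp_link, duplicate_path)  # then swap it in over the duplicate
```

Creating the link first and renaming it over the duplicate means the album never has a moment where the track is missing altogether.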

This doesn't help much, though, because of the metadata contained within music files. Pertinently, this metadata does not only identify the audio in isolation; it also identifies the context in which the audio exists. For instance, Day of the Lords may be the title of an audio recording, but the same track is also both the second track on an album called Unknown Pleasures and the fifth on an album called Permanent. If one copy were deleted and the affected album linked to the retained copy, the contextual information for that track would now be incorrect.
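A small model makes the problem visible. The file paths, the Tags class and the link table below are hypothetical; only the title, albums and track numbers come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tags:
    title: str
    album: str
    track_number: int


# Tags as stored inside each physical file (paths are made up).
files = {
    "Unknown Pleasures/02.mp3": Tags("Day of the Lords", "Unknown Pleasures", 2),
    "Permanent/05.mp3": Tags("Day of the Lords", "Permanent", 5),
}

# Hypothetical link table: path -> path of the file that now supplies its content.
links = {"Permanent/05.mp3": "Unknown Pleasures/02.mp3"}


def read_tags(path):
    """Return the tags a player would see, following a link if there is one."""
    return files[links.get(path, path)]


def context_mismatch(path, expected_album):
    """True if the tags read via path claim a different album than expected."""
    return read_tags(path).album != expected_album
```

Here `read_tags("Permanent/05.mp3")` reports the album as Unknown Pleasures and the track number as 2, because the tags travel with the retained file rather than with the album that links to it.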

So while file linking lets two albums share the same audio, it also drags in metadata that is invalid for one of them.

It's for this reason that I am not convinced duplicate track removal is a good idea. Frankly, these days storage is cheap and getting cheaper all the time, so I see little reason to worry too much about storage efficiency. Pick your battles.

One improvement on the idea

It did get me thinking, however...

Consider two music files whose audio has been flagged as duplicate. If the higher-quality file could be identified, then the other could be 'upgraded' by copying the better audio over it. It could even be a way of upgrading from lossy music files to lossless, so long as your music player copes with album tracks being in differing formats.
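Deciding which file is "higher quality" could start as simply as the sketch below. The AudioFile class, the list of lossless codecs and the ranking rule (lossless beats lossy, then higher bitrate wins) are all assumptions for illustration. Bitrate is only a crude proxy, particularly when comparing different lossy codecs.

```python
from dataclasses import dataclass

# Assumed, non-exhaustive set of codecs treated as lossless.
LOSSLESS_CODECS = {"flac", "alac", "wav", "aiff"}


@dataclass(frozen=True)
class AudioFile:
    path: str
    codec: str         # e.g. "mp3", "flac"
    bitrate_kbps: int


def quality_key(f):
    """Sort key: lossless beats lossy, then higher bitrate wins."""
    return (f.codec.lower() in LOSSLESS_CODECS, f.bitrate_kbps)


def plan_upgrade(a, b):
    """Return (source, target) for the audio copy, or None if neither is better."""
    if quality_key(a) == quality_key(b):
        return None
    better, worse = (a, b) if quality_key(a) > quality_key(b) else (b, a)
    return better, worse
```

Any real upgrade would need to copy only the audio and leave the target's own tags in place, otherwise the album context problem described earlier returns.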

Thanks to kevin dooley for the image above.

The Music Library Management blog

Dan Gravell

I'm Dan, the founder and programmer of bliss. I write bliss to solve my own problems with my digital music collection.