Errant artist names

Emily's meant to be doing her homework!

Diverging artist names in a digital music collection make the collection more difficult to browse. Imagine owning a number of albums by The Beatles. In a really untidy collection, some might be stored under The Beatles, some under simply Beatles, maybe some under Beetles and some even under Beatles, The. Each time you want to find Abbey Road, which artist do you navigate through?

Incorrect artist names are a common case in digital music libraries. Just the other day, in a really bad case, I noticed that one of my double CDs, Two Pages, has its artist set to be (correctly) 4hero for the first CD and (incorrectly) 4 Hero for the second. Annoyingly, both artist spellings show up as separate artists when I browse through my artists.

Errant artist names can be categorised in different ways, and fixing them requires a little manual research and some tagging. The good news is I'm adding an (initially) semi-automated rule to bliss to recognise these artists and suggest alternative spellings! But more on that later.

Sources of confusion

There are two main ways diverging artist names can be introduced into your collection. The first is if you added the names yourself. Over time you may have added multiple recordings by the same artist and, forgetting your previous additions, may have named the artist differently.

That said, it may not be you providing the names. The second way incorrect artist names can crop up is when you add music that is already tagged and its artist names are inconsistent with those in the rest of your collection. If you are adding ripped CD music, the information from FreeDB may be incorrect. If you are adding music you have purchased online, the online music store may follow different naming conventions.

What are the different ways in which artist names can vary? I can think of three:

  1. Pseudonyms and aliases
  2. Bad spelling
  3. Canonicalisation

I already dealt with pseudonyms and aliases, so the remainder of this post will consider the final two ways.

Bad spelling

This is normally pretty easy to spot. A glaring error in the spelling of an artist name normally stands out from the other artists.

To fix, first some research. The correct name can normally be found by searching for the artist name on Google. If it's a common misspelling then Google will suggest the correct spelling for you. For instance... The Beaples.

Once you've found the correct name, use a music tagger or bliss to change the name.
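
If you already have a list of names you trust, the "did you mean" step can be roughed out locally. The following is a minimal sketch using Python's standard difflib module; the KNOWN_ARTISTS list, the suggest_spelling function and the 0.8 cutoff are all assumptions made for illustration, not anything bliss or Google does.

```python
import difflib

# Hypothetical list of artist names you already trust as correctly spelt.
KNOWN_ARTISTS = ["The Beatles", "4hero", "Belle and Sebastian"]


def suggest_spelling(name, known=KNOWN_ARTISTS, cutoff=0.8):
    """Return the closest known artist name, or None if nothing is close enough."""
    # Compare case-insensitively, but return the name in its trusted form.
    lowered = {k.lower(): k for k in known}
    matches = difflib.get_close_matches(
        name.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


print(suggest_spelling("The Beaples"))
```

A similarity cutoff is a blunt instrument: set it too low and unrelated artists start matching, so any suggestion still wants a human eye before you retag.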

Canonicalisation

These types of error are more subtle than simple spelling errors. The 4hero example I gave above is one such case. The canonical name is '4hero', so '4 Hero' is incorrect; however, it is not misspelt as such.

Sometimes artists themselves make this difficult by changing their own name subtly. After the 4hero incident and researching the area I found this discussion on the Belle and Sebastian Wikipedia page. The band appear to have switched between using ampersands and 'and' as the separator in their name throughout their career. This doesn't help owners of their music, though, because it's not really practical to consider them as different artists when browsing your own music.
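
Variants like these can at least be detected mechanically, even if choosing the right form still needs research. Here is a rough sketch that reduces each name to a comparison key and groups names sharing a key. The rules in normalise_key (ignore case, spacing and punctuation, a leading or trailing "The", and "&" versus "and") are my own illustrative assumptions and will not suit every collection.

```python
import re
from collections import defaultdict


def normalise_key(name):
    """Reduce an artist name to a rough comparison key (illustrative rules only)."""
    key = name.casefold().strip()
    # "Beatles, The" -> "the beatles"
    match = re.fullmatch(r"(.+),\s*the", key)
    if match:
        key = "the " + match.group(1)
    # Treat "The Beatles" and "Beatles" alike.
    key = re.sub(r"^the\s+", "", key)
    # Treat "&" and "and" alike.
    key = re.sub(r"\s*&\s*", " and ", key)
    # Drop spaces and punctuation so "4 Hero" matches "4hero".
    return re.sub(r"[\W_]+", "", key)


def find_variants(names):
    """Group names that share a key; return only groups with more than one spelling."""
    groups = defaultdict(set)
    for name in names:
        groups[normalise_key(name)].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(find_variants(["4hero", "4 Hero", "Belle & Sebastian", "Belle and Sebastian"]))
```

Note that this only tells you that two spellings probably refer to the same artist. It says nothing about which spelling is the canonical one.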

Fortunately, sources exist which consider the canonicalisation of artist names fairly seriously. MusicBrainz understands the different artist name permutations that can crop up. By using sources like MusicBrainz, you can discover the canonical name for a given artist and use that when you tag your music files as described above.

Considering your collection

The spelling of artist names in your collection seems fairly clear-cut. Fix it and move on.

Canonicalisation is a different matter. While sources exist, that doesn't necessarily mean that you agree with them, that they cover all of the artists in your collection, or that they are correct in 100% of cases. Furthermore, there may be technical reasons that you require an artist name to be in a particular form.

This means you might have to operate what I think of as a 'music data pipeline'. In some cases this may mean accepting the data from MusicBrainz et al, but then customising it in a deterministic way to end up with the data you require. In other cases it may mean certain artists are set to a specific name.
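
To make the idea concrete, here is a small sketch of such a pipeline. Everything in it is hypothetical: the CANONICAL table stands in for data you might fetch from a source like MusicBrainz, OVERRIDES holds artists you have chosen to pin to a specific name, and make_path_safe is just one example of a deterministic customisation (here, replacing a character that is awkward in file paths).

```python
# Hypothetical canonical names, standing in for data from a source such as MusicBrainz.
CANONICAL = {"4 Hero": "4hero", "Belle & Sebastian": "Belle and Sebastian"}

# Hypothetical per-artist overrides: names you have decided to pin yourself.
OVERRIDES = {"Belle and Sebastian": "Belle & Sebastian"}


def make_path_safe(name):
    """Example deterministic customisation: avoid '/' in names used as folder names."""
    return name.replace("/", "-")


def resolve_artist(name):
    """Run one artist name through the pipeline: canonicalise, override, customise."""
    canonical = CANONICAL.get(name, name)
    if canonical in OVERRIDES:
        # A pinned name wins over both the source and the customisation step.
        return OVERRIDES[canonical]
    return make_path_safe(canonical)


for artist in ["4 Hero", "AC/DC", "Belle and Sebastian"]:
    print(artist, "->", resolve_artist(artist))
```

The ordering is a design choice: here overrides are applied last so that a name you have set by hand is never rewritten by a later step.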

Artist standardisation in bliss

Following requests from some CD ripping companies I've spent a few days on a new rule to identify and suggest fixes for unrecognised artists.

This will take the form of a simple checkbox to enable/disable artist standardisation. When enabled, bliss checks each artist in your collection for whether the artist name is a known alias on MusicBrainz. If it is an alias, and it is not the accepted canonical name, bliss suggests the canonical name as a fix. Here's an example:

This worked with 4hero quite nicely. There are some further improvements to make but I think this will be a useful rule. I'm hoping I'll release this in a week's time, so stay tuned!
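
For the curious, the logic of the rule as described above can be sketched in a few lines. This is not bliss's actual implementation: the ALIASES table is a hypothetical, hand-written stand-in for alias data from MusicBrainz, and suggest_fixes is a name invented for this example.

```python
# Hypothetical alias data: canonical name -> known aliases.
# In practice this would come from a source such as MusicBrainz.
ALIASES = {
    "4hero": {"4 Hero"},
    "The Beatles": {"Beatles", "Beatles, The"},
}


def suggest_fixes(collection_artists, aliases=ALIASES):
    """Map each artist that is a known, non-canonical alias to its canonical name."""
    alias_to_canonical = {
        alias: canonical
        for canonical, names in aliases.items()
        for alias in names
    }
    fixes = {}
    for artist in collection_artists:
        canonical = alias_to_canonical.get(artist)
        # Only suggest a fix when the name is an alias and differs from the canonical form.
        if canonical is not None and canonical != artist:
            fixes[artist] = canonical
    return fixes


print(suggest_fixes(["4hero", "4 Hero", "Beatles", "Unknown Artist"]))
```

Artists that are not in the alias data at all are left alone, which matches the semi-automated spirit of the rule: it suggests, you decide.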

Thanks to squarepants2004j/auntyhuia for the image above.
tags: tagging artists

The Music Library Management blog

Dan Gravell

I'm Dan, the founder and programmer of bliss. I write bliss to solve my own problems with my digital music collection.