The 3 Cs: Correctness

This is the second in a series of blog posts about the 3 Cs: consistency, correctness and completeness.

If there's one thing worse than inconsistent or incomplete metadata, it's incorrect metadata. Why? Well, apart from your own annoyance at seeing incorrect information in your lovingly curated music library, it makes your music harder to navigate and can cause deeper problems. If an album's release year is tagged incorrectly, say, the album won't be displayed when you browse for the year it was actually released.

Worse, if you ask software to perform tasks based on metadata, it has no way of knowing that the metadata is incorrect and will make its decisions regardless. Play all albums with a YEAR tag of 2000? Fine, but if a release has been incorrectly tagged with that year, you'll be listening to it too!
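To make that concrete, here's a minimal Python sketch. It assumes a library held as a list of dictionaries with hypothetical "album" and "year" keys; the album names are placeholders, and this is not how bliss or any particular player stores its data.

```python
# Hypothetical in-memory library; names and keys are illustrative only.
library = [
    {"album": "Album A", "year": "2000"},
    # Assume this release really came out in 1999 but was mistagged.
    {"album": "Album B", "year": "2000"},
    {"album": "Album C", "year": "1999"},
]


def albums_for_year(albums, year):
    """Return the names of albums whose year tag equals the requested year."""
    return [a["album"] for a in albums if a["year"] == year]


# The software can only trust the tags it is given.
print(albums_for_year(library, "2000"))  # ['Album A', 'Album B']
print(albums_for_year(library, "1999"))  # ['Album C']
```

The mistagged album turns up in the 2000 list and is missing from the 1999 list, and nothing in the code can tell that anything is wrong.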

There's a fine line between incorrectness and inconsistency. If you enforce a set of rules that define the correct syntax for metadata to ensure consistency, you could say that music which does not pass these rules is "incorrect". That's fine, but it's not the meaning we are applying in this article, and it's covered by consistency anyway. In this article we are interested in objectively incorrect metadata.

That still leaves some questions unanswered as to what is covered. What about classification tags, such as genre, which are more subjective? What about canon, for example whether The Rolling Stones should include the "The" prefix? There's some leeway in our definition of correctness. It comes down to how you want your music library to appear.

However, there remains some metadata which can be called objectively incorrect, such as incorrect release years, incorrect track and album names (notwithstanding rules on formatting which you may apply), incorrect track numbering and more.
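Some of these errors can be spotted without any outside reference. As a rough sketch, assuming an album's tracks are meant to be numbered 1 to N, the hypothetical helper below reports gaps and duplicates in a list of track numbers. It only checks that the numbering is plausible; it can't tell you that a title or year is wrong, or that tracks are missing from the end of an album.

```python
from collections import Counter


def track_number_problems(track_numbers):
    """Return (missing, duplicated) track numbers, assuming tracks run 1..N."""
    counts = Counter(track_numbers)
    highest = max(counts, default=0)
    # Any number below the highest one seen that never appears is a gap.
    missing = [n for n in range(1, highest + 1) if n not in counts]
    # Any number that appears more than once is a duplicate.
    duplicated = sorted(n for n, c in counts.items() if c > 1)
    return missing, duplicated


print(track_number_problems([1, 2, 2, 4]))  # ([3], [2])
```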

Solving the correctness dilemma

Great, so how can I tell if my music's metadata is incorrect? The obvious first way is to "eyeball" it yourself. This is pretty tedious and, because it's tedious, error-prone.

Most likely you will notice odd bits of incorrect music data as you scroll through your library. For fixing odd data you notice, an always-on music server organiser like bliss is ideal. If you notice a piece of metadata is incorrect, simply go to bliss, which is always running on your home music network, and edit it.

That leaves two outstanding questions: how do I know what data I should replace the current data with, and what if I want to be more proactive? What about my unknown unknowns?

You can find the metadata you require by looking online for it. The sources I use are:

  1. All Music
  2. MusicBrainz
  3. Wikipedia
  4. Discogs
  5. [Various online music stores]

I normally use them in that order; the quality of data can sometimes vary. MusicBrainz, Wikipedia and Discogs are all crowdsourced, community-maintained data sets, which makes them quite broad but not always totally accurate. A bit of interpretation can sometimes be required, but almost always the data is usable in some form.

From there, you need to use some sort of music tagger to take the metadata and insert it into your music files. As above, bliss can do this, although unless your files are untagged, you have to enter the tags manually. I'd love to add more automation for correcting incorrect tags, so please suggest it here.

One semi-automated approach to this is to use MusicBrainz Picard.

Getting ahead - using Picard to identify and fix incorrect metadata

MusicBrainz Picard is a free piece of software that incorporates audio fingerprinting to identify music. Once the music is identified, Picard suggests candidate metadata drawn from the set of releases that matched.

First, download and install Picard, then start it. You'll get an empty-looking window:

Picard's start screen

Either drag-and-drop a folder into the window, or click Add Folder or Add Files to populate Picard:

Picard after adding folders with lots of unmatched files

Here I have added a group of folders into Picard. They have all been added to the Unmatched Files node. I can then press Cluster to group these files by their current tags.

Clustering the albums

That looks a bit better - some of my albums have been clustered together. This is important to encourage Picard to look for metadata related to these releases. If I don't do this, Picard will look for metadata related to the individual tracks. As many tracks appear in many releases (for example, greatest hits and other compilations) this would offer too much metadata to comprehend.
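The idea of clustering can be sketched in a few lines of Python. This is not Picard's actual algorithm, which I'm not reproducing here; it's a simplified illustration that assumes each file is a dictionary with hypothetical "path", "artist" and "album" keys, and it groups files only when those tags match exactly.

```python
from collections import defaultdict


def cluster_by_album(files):
    """Group file paths by their current (artist, album) tags."""
    clusters = defaultdict(list)
    for f in files:
        # Files with missing tags fall into a cluster keyed by empty strings.
        key = (f.get("artist", ""), f.get("album", ""))
        clusters[key].append(f["path"])
    return dict(clusters)


# Placeholder files; paths and tag values are made up for illustration.
files = [
    {"path": "a/01.flac", "artist": "Artist X", "album": "Album A"},
    {"path": "a/02.flac", "artist": "Artist X", "album": "Album A"},
    {"path": "b/01.flac", "artist": "Artist Y", "album": "Album B"},
]

for (artist, album), paths in cluster_by_album(files).items():
    print(artist, "/", album, "->", paths)
```

Looking up one cluster at a time means asking "which release is this?" once per album, rather than once per track.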

If I click one of the albums then click Lookup, Picard looks up the metadata and identifies incorrect metadata:

Looked up the metadata

After looking up the metadata we can see where our metadata is incorrect. The Title list shows how accurate the fingerprinting process was, to give us a feeling for how accurate the metadata will be. Here it's green for each track, meaning an accurate metadata lookup.

In more detail, when clicking on each track, the comparison between the current metadata and the suggested metadata from MusicBrainz is displayed at the bottom of the window. The differences are in green. You can see in this example we have incorrect metadata - the year is tagged as 1998 when it should be 1999. Simply clicking the Save button saves these changes to the file.
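That comparison is essentially a diff between two sets of tags. Here's a minimal sketch of the idea, assuming both the current and the suggested metadata are plain dictionaries; the tag names and the tag_differences helper are hypothetical and have nothing to do with Picard's internals.

```python
def tag_differences(current, suggested):
    """Map each tag whose value would change to an (old, new) pair."""
    changes = {}
    for tag in sorted(set(current) | set(suggested)):
        old, new = current.get(tag), suggested.get(tag)
        if old != new:
            changes[tag] = (old, new)
    return changes


# Mirrors the example above: the year is tagged 1998 but should be 1999.
current = {"album": "Example Album", "date": "1998"}
suggested = {"album": "Example Album", "date": "1999"}

print(tag_differences(current, suggested))  # {'date': ('1998', '1999')}
```

Reviewing a list of changes like this before saving is the point: every entry is something that will be overwritten in your files.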

Here's a different example, The Boy With The Arab Strap:

Looked up the metadata for The Boy With The Arab Strap

You can see two tracks have not been identified. Clicking on each track that has been identified shows a potential problem: Picard is insisting on changing the album name to The Boy With the Arab Strap to fit in with MusicBrainz's capitalisation rules. If you click Save now, the album names will become inconsistent.

This is a drawback with Picard, and one you need to be mindful of. If your album names or other metadata become inconsistent it may break albums apart. It's not a tool to click merrily away on.
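One way to catch this kind of accidental inconsistency is to look for album names that differ only in capitalisation. The sketch below assumes you already have a list of the album names in your library; the case_variants helper is hypothetical and compares names case-insensitively.

```python
from collections import defaultdict


def case_variants(album_names):
    """Return groups of album names that differ only in capitalisation."""
    groups = defaultdict(set)
    for name in album_names:
        # casefold() gives a case-insensitive key for comparison.
        groups[name.casefold()].add(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


names = [
    "The Boy With The Arab Strap",
    "The Boy With the Arab Strap",
    "Some Other Album",
]
print(case_variants(names))
```

Any group it reports is an album that may show up as two separate entries in your library.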

So that's Picard: a powerful tool to identify incorrect metadata and correct it, when used judiciously.

Next in this series, completeness!

Thanks to The U.S. Army for the image above.
tags: music databases tagger picard fingerprinting

The Music Library Management blog

Dan Gravell

I'm Dan, the founder and programmer of bliss. I write bliss to solve my own problems with my digital music collection.