Computer audio files: structural imbalances
The blog is going techno-geek today. I wanted to write a post about computer audio file formats, because music files have more complexity than first meets the eye.
Frankly, there aren't a whole load of reasons why you should care about this, particularly because software and hardware makers generally hide this complexity quite well. I just thought it would be interesting.
You might think that there's not much more to music file formats than the audio stream itself, the internal tags I often talk about, and a file extension identifying the format. Well, there is more to it than that, because music files vary not just in how they are structured, but also in whether some formats can really be considered 'structured' at all.
Depending on the type of file, a music file may contain simply the audio itself, or be more rigorously demarcated into separate audio and information portions in a structure known as an audio container. What matters most to the user is that a music file can hold different types of data. So what are those types? Well, there's the audio itself, the metadata about the audio, and then, optionally, the container. I'll take them in turn and then discuss the enigmatic tendencies of the popular MP3 format.
The audio itself is the bit that matters, of course! It is stored, and optionally encoded, according to an algorithm. If it is encoded, the audio must be decoded to raw audio before it can be played back. This is done by a codec, which is just a piece of software (or hardware) that knows how to convert the encoded data back to raw audio.
The codec determines the way the audio is stored. At the simplest end are PCM raw audio streams, which are already decoded, as used on CDs and (usually) in the WAV format. More complex codecs impose further structure on how the audio data is laid out, and it is the codec that knows how to convert this structure back into raw audio data. FLAC is one such codec, as is MPEG-1/MPEG-2 Audio Layer III (otherwise known as MP3).
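If you want to see just how little structure PCM has, here's a small Python sketch using the standard library's `wave` module (which only handles uncompressed PCM). It wraps a short 440 Hz tone in an in-memory WAV file and reads the samples straight back out; there's no decoding step, just a 44-byte header in front of the raw numbers. The tone, the mono layout and the helper names are my own illustrative choices, not anything the format requires.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # CD sample rate (Hz)
CHANNELS = 1         # mono, to keep the sketch short
SAMPLE_WIDTH = 2     # 16-bit samples, as on CD

def make_tone(freq_hz: float, n_samples: int) -> list[int]:
    """Return signed 16-bit PCM samples of a sine wave at half amplitude."""
    return [
        int(16383 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n_samples)
    ]

def write_wav(samples: list[int]) -> bytes:
    """Wrap raw PCM samples in a WAV file held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        # Little-endian signed 16-bit integers: this *is* the audio data.
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()

def read_wav(data: bytes) -> list[int]:
    """Read the PCM samples back out; no decoding is needed."""
    with wave.open(io.BytesIO(data), "rb") as r:
        frames = r.readframes(r.getnframes())
    return list(struct.unpack(f"<{len(frames) // 2}h", frames))

samples = make_tone(440.0, 1000)
wav_bytes = write_wav(samples)
assert read_wav(wav_bytes) == samples
print(len(wav_bytes))  # 2044: a 44-byte header plus 2000 bytes of PCM
```

A codec like FLAC or MP3 would sit between `samples` and the bytes on disk; with PCM, the bytes on disk simply are the samples.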
Metadata identifies and provides information about the co-located audio. This is what allows you to navigate your music collection by song, album or artist and lots more besides, as I've written about at length before!
Metadata is stored in a tag, which is yet more bytes in an audio file. How it's stored, however, depends on the file format. In the case of a formalised audio container, the tag has a defined location. However, in the more... idiosyncratic formats (as we will see later) the tag does not have a defined location and is squirrelled away in a 'well known' yet not formalised location.
As well as their location, tags vary in how the data itself is stored, because there are several tag formats. ID3 is probably the best known, being used in MP3s and other file formats, but others include Vorbis comments and APE. Generally, but not always, the choice of tag format follows the file format, though it is possible to mix and match: it's not rare to see APE tags in MP3s, for instance.
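To show how differently two tag formats can hold the same information, here's a rough Python sketch that builds an ID3v1 tag (a fixed 128-byte record of fixed-width Latin-1 fields) and a Vorbis comment block (length-prefixed UTF-8 `KEY=value` strings, laid out as found inside FLAC; Ogg Vorbis wraps the same structure in a packet header and adds a trailing framing bit). The helper names and the vendor string are hypothetical, and the ID3v1 comment and genre fields are simply left empty.

```python
import struct

def id3v1_tag(title: str, artist: str, album: str, year: str) -> bytes:
    """Build a 128-byte ID3v1 tag from fixed-width fields."""
    def field(text: str, width: int) -> bytes:
        # Latin-1 only; longer text is cut off, shorter text is NUL-padded.
        return text.encode("latin-1", "replace")[:width].ljust(width, b"\x00")
    return (b"TAG" + field(title, 30) + field(artist, 30) + field(album, 30)
            + field(year, 4) + field("", 30)  # empty comment
            + bytes([255]))                   # genre 255: commonly 'none'

def vorbis_comment(vendor: str, fields: dict[str, str]) -> bytes:
    """Build a Vorbis comment block as stored in FLAC (no framing bit)."""
    def length_prefixed(text: str) -> bytes:
        raw = text.encode("utf-8")
        return struct.pack("<I", len(raw)) + raw  # 32-bit little-endian length
    out = length_prefixed(vendor) + struct.pack("<I", len(fields))
    for key, value in fields.items():
        out += length_prefixed(f"{key.upper()}={value}")
    return out

v1 = id3v1_tag("Song", "Artist", "Album", "1999")
vc = vorbis_comment("example-encoder", {"title": "Song", "artist": "Artist"})
print(len(v1), len(vc))  # 128 54
```

The ID3v1 tag is always 128 bytes and silently truncates anything longer than its slots, whereas the Vorbis comment grows to fit whatever fields you give it.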
An audio container may contain multiple audio streams, as well as allotted locations for other types of data such as images and tags. Importantly, these locations are described in a specification, so software makers know what to expect when they encounter data in an audio container.
In the case of music files (remember music streams are another delivery method) this normally means one audio container per file. Examples of audio container formats are MP4 and FLAC (FLAC is both a codec and a container).
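Here's what those allotted locations look like in practice, as a minimal Python sketch that walks the metadata blocks at the start of a FLAC file. Per the FLAC format, the stream opens with the four bytes `fLaC`, followed by metadata blocks, each with a one-byte header (a 'last block' flag plus a 7-bit type) and a 24-bit big-endian length, so a reader can skip any block it doesn't care about. The helper names are my own, and the test data is synthetic: the STREAMINFO payload is just zeros, so it wouldn't decode as real audio.

```python
# FLAC metadata block types, as numbered in the FLAC format specification.
BLOCK_TYPES = {0: "STREAMINFO", 1: "PADDING", 2: "APPLICATION", 3: "SEEKTABLE",
               4: "VORBIS_COMMENT", 5: "CUESHEET", 6: "PICTURE"}

def list_flac_blocks(data: bytes) -> list[tuple[str, int]]:
    """Return (block type, payload length) for each FLAC metadata block."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    pos, blocks = 4, []
    while True:
        if pos + 4 > len(data):
            raise ValueError("metadata ends before the last-block flag")
        header = data[pos]
        is_last = bool(header & 0x80)   # top bit: final metadata block?
        block_type = header & 0x7F      # low 7 bits: block type
        length = int.from_bytes(data[pos + 1:pos + 4], "big")  # 24-bit length
        blocks.append((BLOCK_TYPES.get(block_type, f"OTHER({block_type})"), length))
        pos += 4 + length               # jump straight to the next header
        if is_last:
            return blocks

def make_block(block_type: int, payload: bytes, last: bool = False) -> bytes:
    """Build one metadata block (hypothetical helper for test data)."""
    header = (0x80 if last else 0x00) | block_type
    return bytes([header]) + len(payload).to_bytes(3, "big") + payload

# Synthetic stream: zeroed STREAMINFO, an empty Vorbis comment, some padding.
fake = (b"fLaC" + make_block(0, bytes(34)) + make_block(4, bytes(8))
        + make_block(1, bytes(100), last=True))
print(list_flac_blocks(fake))
# [('STREAMINFO', 34), ('VORBIS_COMMENT', 8), ('PADDING', 100)]
```

Because every block announces its own type and length, a tag editor can find the Vorbis comment without understanding anything about the audio that follows.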
MP3: messy solutions
MP3 is probably the most popular music file format of all, yet it does not benefit from a rigorously defined audio container. MP3 (full name, as mentioned above, MPEG-1 or MPEG-2 Audio Layer III) is not an audio container itself; it is simply a codec. This raises the question: how are tags stored?
The answer is: with a crowbar. A de facto approach grew up amongst developers whereby tags are bolted on outside the audio frames, at the start of the file (ID3v2) or at the end (ID3v1, and usually APE), and MP3 decoders simply skip over any data that doesn't look like audio. It's some testament to those developers that this complexity is hidden from users.
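The crowbar is easier to see in bytes. This simplified Python sketch, assuming the whole file is already in memory, measures a leading ID3v2 tag from its header, checks the last 128 bytes for an ID3v1 tag, and then scans for the first MPEG frame sync (eleven set bits), which is roughly how a player gets past the tags to the audio. It ignores details such as ID3v2 unsynchronisation and doesn't validate the rest of the frame header as a real decoder would; the helper names are my own and the 'file' is synthetic.

```python
def id3v2_size(data: bytes) -> int:
    """Bytes occupied by an ID3v2 tag at the start of the file (0 if none)."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    # Bytes 6-9 hold a 'syncsafe' size: only 7 bits per byte are used, so
    # none of these bytes can be 0xFF and mimic the start of an MPEG frame.
    size = 0
    for b in data[6:10]:
        size = (size << 7) | (b & 0x7F)
    has_footer = data[3] == 4 and bool(data[5] & 0x10)  # ID3v2.4 footer flag
    return 10 + size + (10 if has_footer else 0)

def has_id3v1(data: bytes) -> bool:
    """ID3v1 lives in the last 128 bytes and starts with the ASCII 'TAG'."""
    return len(data) >= 128 and data[-128:-125] == b"TAG"

def first_frame_offset(data: bytes) -> int:
    """Skip any leading ID3v2 tag, then look for an MPEG frame sync."""
    pos = id3v2_size(data)
    while pos + 1 < len(data):
        # Frame sync: 11 set bits, i.e. 0xFF then a byte whose top 3 bits are set.
        if data[pos] == 0xFF and (data[pos + 1] & 0xE0) == 0xE0:
            return pos
        pos += 1
    return -1  # no frame found

# Synthetic 'MP3': a 30-byte ID3v2.3 tag, one zero-filled frame, an ID3v1 tag.
id3v2 = b"ID3" + bytes([3, 0, 0]) + bytes([0, 0, 0, 20]) + bytes(20)
frame = bytes([0xFF, 0xFB, 0x90, 0x00]) + bytes(413)  # MPEG-1 Layer III header
id3v1 = b"TAG" + bytes(125)
mp3 = id3v2 + frame + id3v1
print(id3v2_size(mp3), first_frame_offset(mp3), has_id3v1(mp3))  # 30 30 True
```

Nothing in the MP3 format itself says the tags are there; the audio simply starts wherever the first frame sync turns up, and everything else is convention.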
Thanks to kevin dooley for the image above.