Writing a find-and-replace rule to fix HTML encoding

You can write a generic "find and replace" rule in bliss using custom rules.

In this example I'll apply this idea to removing HTML encoding from tags, and implement it as a custom rule. HTML encoding is a way of writing certain characters, such as < or &, so that they do not affect the rendering of Web pages. The encoded forms of those two examples are &lt; and &amp;.
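
To make the encoding concrete, here is a small Python sketch. Python is not part of bliss; it is used here only for illustration, and the tag value is made up. It uses the standard library's html module to encode and then decode the two characters mentioned above.

```python
import html

# Encode: characters that are special in HTML become entities.
encoded = html.escape("AC/DC & <Friends>")
print(encoded)  # AC/DC &amp; &lt;Friends&gt;

# Decode: the entities are turned back into the original characters.
decoded = html.unescape(encoded)
print(decoded)  # AC/DC & <Friends>
```

The encoded string is what ends up in a tag when text is copied out of an HTML context without being decoded first.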

Unfortunately this encoded text can sometimes be copied outside of an HTML context, where it reads as gibberish. In our case, it can sometimes make its way into our music tags (maybe caused by copy-and-paste gone bad, or software using data which has been HTML encoded). The same gibberish might then show up in your music player!

Custom regex rules are basically ways of finding and replacing text in your music tags. Think of them as "find and replace rules". Starting a find and replace rule can be a case of just copying and pasting the following template:

We need to replace everything in square brackets. But before you do that, I'll fill in some of the brackets...

For finding and replacing HTML encoding, we can make a good start:

# Find HTML encoding and replace with the decoded value
rule find_replace_html_encoding with label "Find and replace HTML encoding" has alternatives
    HTML_DECODED {
        find /[find regex, may include groups]/ replace with "[Substitution string, may include group references]"
    }
applies to album_artist, album_name, track_name, artist, comment

So that leaves the find line. What should we put here? It turns out there are a few lists of HTML special characters available on the web. Here's a good one!

Using those, we can add multiple find-and-replace lines:

find /&quot;/ replace with "\""
find /&#35;/ replace with "#"
find /&#36;/ replace with "$"

This doesn't scale particularly well; there are a lot of lines in the linked table! However we can pretty easily implement all the common ones.
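
The effect of a handful of such lines can be sketched outside bliss. The following Python is only an illustration under assumptions: `REPLACEMENTS` and `decode_entities` are hypothetical names, the entity list is a small hand-picked subset, and it assumes the pairs are applied one after another in order, which may not be how bliss evaluates a rule.

```python
import re

# Hypothetical subset of entities; each pair mirrors one
# "find ... replace with ..." line.
REPLACEMENTS = [
    (r"&quot;", '"'),
    (r"&#35;", "#"),
    (r"&#36;", "$"),
    (r"&lt;", "<"),
    (r"&gt;", ">"),
    # &amp; goes last so that "&amp;lt;" decodes to "&lt;" and not to "<".
    (r"&amp;", "&"),
]


def decode_entities(text: str) -> str:
    """Apply each find-and-replace pair in order."""
    for pattern, replacement in REPLACEMENTS:
        text = re.sub(pattern, replacement, text)
    return text


print(decode_entities("Rock &amp; Roll &quot;Live&quot; &#35;1"))
```

Every entity needs its own pair, which is the scaling problem in miniature. In Python, the standard library's html.unescape covers the full entity table in one call.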

Here's the completed rule:

Now, to use it:

  1. Download the above completed script and save it as find-and-replace_html-encoding.regexrule
  2. Move the file to the bliss settings folder, into the regex-rules folder. The bliss settings folder is in:
       Windows XP: C:\Documents and Settings\[username]\.bliss
       Windows Vista, 7, 8 and 10: C:\Users\[username]\.bliss
       Mac OS X (see below): /Users/[username]/Library/Preferences/bliss
       Linux: /home/[username]/.bliss
       VortexBox: /root/.bliss
       Synology: /var/packages/bliss/target/var/.bliss
       QNAP: `getcfg SHARE_DEF defVolMP -f /etc/config/def_share.info`/.qpkg/bliss/.bliss
  3. Restart bliss.
  4. Visit the settings page, and your rule should be there.

[Image: adding html rule to settings]
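
If you'd rather script the file move, the folder locations listed above can be captured in a small helper. This is a rough sketch, not part of bliss: `regex_rules_dir` and its `system` labels are hypothetical, and only three of the listed platforms are covered.

```python
from pathlib import PurePosixPath, PureWindowsPath


def regex_rules_dir(system: str, username: str):
    """Return the regex-rules folder for a few of the platforms listed above.

    `system` is a hypothetical label ("windows", "mac" or "linux"),
    not a bliss setting.
    """
    if system == "windows":  # Windows Vista, 7, 8 and 10
        base = PureWindowsPath("C:/Users") / username / ".bliss"
    elif system == "mac":
        base = PurePosixPath("/Users") / username / "Library/Preferences/bliss"
    elif system == "linux":
        base = PurePosixPath("/home") / username / ".bliss"
    else:
        raise ValueError(f"unsupported system: {system}")
    return base / "regex-rules"


print(regex_rules_dir("linux", "dan"))
```

The returned path is where find-and-replace_html-encoding.regexrule should end up before bliss is restarted.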

Once you've added the rule, click Apply rules and any special characters should be found:

[Image: adding html rule to settings]

Click the one-click fix for each item, or use the Inbox to apply the fixes en masse.

Let me know if you think of further good find-and-replace rules!

Thanks to rawpixel for the image above.

The Music Library Management blog

Dan Gravell

I'm Dan, the founder and programmer of bliss. I write bliss to solve my own problems with my digital music collection.