The Technical Gobbledygook and Why Emoji?
First, what is an emoji? In lay terms, emoji are emoticons on speed. Instead of typing :-), you can “type” 😀 via a special keyboard on your phone/mobile or via the Special Characters item in the Edit menu on OS X.
:-) is three distinct characters: a :, a -, and a ). 😀 is just one, the smiley itself. Until recently, most characters used online needed no more than three bytes each, because they all sit within “Plane 0” of the Unicode standard.
Okay, so what does that mean? In short, in UTF-8 it takes one byte to record the letter “A”, two bytes to record a copyright sign (©), and three bytes to record an em-dash (—).
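To see those byte counts for yourself, here is a minimal Python sketch. The helper name `utf8_byte_count` is made up for this post; it simply encodes a character as UTF-8 and counts the resulting bytes.

```python
def utf8_byte_count(ch: str) -> int:
    """Return how many bytes UTF-8 needs to store the given text."""
    return len(ch.encode("utf-8"))


# The three examples from the paragraph above.
for ch in ["A", "©", "—"]:
    print(f"U+{ord(ch):04X} {ch!r}: {utf8_byte_count(ch)} byte(s)")
```

Try it with 😀 to see why three bytes stop being enough.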
This is great. WordPress set the default encoding in MySQL, the database used, to UTF-8 and called it a day. The problem, though, is that what MySQL calls utf8 doesn’t actually support the entire UTF-8 standard. It is better described as utf8mb3: three-byte UTF-8 encoding.
MySQL eventually (in 5.5) added utf8mb4, four-byte UTF-8 encoding, which opens up a ton of doors: the fourth byte covers every character that lives outside Plane 0.
But Kraft, what about emoji?
I’m getting there. I love emoji, but this fourth byte is more important for other reasons. Musical notation, like 𝄞 or 𝄢, are four-byte characters. Dominoes, like 🁡 or 🁫, are four-byte characters. Hell, Mahjong tiles, 🀁 and 🀙, are four-byte characters. Can it get more important than this?
Yes. Many Han characters are four-byte characters. Han characters are the Chinese, Japanese, and Korean characters that have been standardized into a single character map. Common ones, 道 and 草, for example, fit in three bytes, but thousands of rarer ones sit outside Plane 0 and need the fourth. By enabling utf8mb4, WordPress can now natively handle these characters.
Oh, and emoji. Emoji are four-byte characters. 👍
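If you want to check whether a piece of text would trip over three-byte UTF-8, here is a rough Python sketch. The function name `needs_utf8mb4` is hypothetical, and it assumes the only thing that matters is whether a character sits above Plane 0 (code point greater than U+FFFF), which is exactly where UTF-8 needs four bytes.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside Plane 0 and so needs four UTF-8 bytes."""
    return any(ord(ch) > 0xFFFF for ch in text)


# Samples taken from the post: music, dominoes, Mahjong, emoji, plain ASCII.
for sample in ["𝄞", "🁡", "🀁", "👍", "WordPress"]:
    print(f"{sample!r}: needs utf8mb4? {needs_utf8mb4(sample)}")
```

This is only a check on the text itself; it says nothing about how any particular database column is configured.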
Awesome! I can use emoji once I’m using WordPress 4.2‽
Probably. WordPress supports MySQL 5 or higher, but utf8mb4 is only supported in MySQL 5.5 or higher. WordPress is smart enough not to upgrade your database to something it can’t handle. On older versions, WordPress will work around this by encoding emoji in a way the older character set will accept. If you’re in this group, you really should upgrade to a newer version of MySQL. Contact your host!
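One plausible shape for that kind of workaround is to swap each four-byte character for an HTML numeric character reference, which is pure ASCII and safe in any character set. The Python sketch below is an illustration of the idea only, not WordPress’ actual PHP code, and `encode_astral_as_entities` is a name invented for this example.

```python
import html


def encode_astral_as_entities(text: str) -> str:
    """Replace every character above U+FFFF with an HTML hex entity."""
    return "".join(
        f"&#x{ord(ch):x};" if ord(ch) > 0xFFFF else ch
        for ch in text
    )


stored = encode_astral_as_entities("Cheers 🍺")
print(stored)                 # ASCII-only text a three-byte column can hold
print(html.unescape(stored))  # what a browser would turn it back into
```

The trade-off is that the database now holds markup rather than the character itself, which is one reason upgrading MySQL is the better fix.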
Will emoji display everywhere?
Yes and no. Native emoji support is operating system- and browser-dependent. Gary Pendergast, known in many places as Pento, spearheaded adding universal emoji support to WordPress core, based on the work that Mo Jangda, Marcus Kazmierczak, and Joen Asmussen did to bring emoji to WordPress.com (which was skinning a different cat than core since WP.com uses the latin1 character set… that’s a whole different post).
In other words, WordPress lets your computer do its thing when it can and helps out when it can’t.
Yes! Add emoji to your posts. Make emoji categories! Do it all!
Word of warning: While WP will use emoji in slugs (the “ID” of a post or category used in pretty permalinks), you might see some odd behavior. Emoji within URLs are supported. Most top-level domains, like .com or .net, reject non-ASCII characters in the domain name (with notable exceptions), but emoji in the remainder of the URL are fine.
My new 🍺 tag URL works great: http://brandonkraft.com/b/t/🍺
If you copy and paste that into, say, Twitter, it’ll convert it to the non-emoji representation: https://kraft.blog/t/%F0%9F%8D%BA/ . The %F0%9F%8D%BA part is just the emoji’s four UTF-8 bytes, percent-encoded, so this is exactly the same as the original URL. If you visit that link in a browser/OS that supports emoji, it’ll display it with the 🍺 emoji. A++ 👍 Ship it! 🚢
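For the curious, that conversion is easy to reproduce. This short Python sketch uses the standard library’s `urllib.parse.quote` and `unquote` to go from the emoji slug to the percent-encoded form and back; the variable names are just for illustration.

```python
from urllib.parse import quote, unquote

slug = "🍺"

# quote() encodes the text as UTF-8, then writes each byte as %XX.
encoded = quote(slug)
print(encoded)

# Rebuild the tag URL from the post with the encoded slug.
url = "https://kraft.blog/t/" + encoded + "/"
print(url)

# unquote() reverses it, which is what an emoji-aware browser shows you.
print(unquote(encoded))
```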
If, though, you type http://brandonkraft.com/b/t/🍺 directly into Twitter, Twitter won’t see 🍺 as part of the URL, giving you an incomplete URL of http://brandonkraft.com/b/t/. 😕
Slack will convert 🍺 to :beer:, giving you a completely different URL of https://kraft.blog/t/:beer:/ . In my case, it still actually works since I tag anything with 🍺 with beer too, and WordPress’ permalink sniffing will connect the dots since it’s close enough.
Emoji are a joke in plenty of circles, but are a growing part of “Internet speech”. By supporting the hotness, we’ve also greatly improved the international community’s ability to use WordPress in their native language.
It took a great deal of work on levels that I don’t understand—changing the character set of the database without impacting anything else in a mixed environment of MySQL versions with varied support, while providing a consistent experience across all environments for emoji.
To read through how the sausage was made, #21212-core handled the database schema changes and #31242-core handled the emoji rendering support.