Image displaying various emoji.

Emoji, WordPress, and You

Starting in WordPress 4.2 (currently in beta, scheduled for release next month), native emoji support will be available to most WordPress installs. 

The Technical Gobbledygook and Why Emoji?

First, what is an emoji? In lay terms, they are emoticons on speed. Instead of typing :-), you can “type” 😀 via a special keyboard on your phone/mobile or via the Special Characters item in the Edit menu on OS X.

:-) is three distinct characters: a :, a -, and a ). 😀 is just one, the smiley itself. Previously, nearly every character used online took no more than three bytes, which is to say it sat within “Plane 0” of the Unicode standard (the Basic Multilingual Plane).

Okay, so what does that mean? In short, in UTF-8 it takes one byte to record the letter “A”, two bytes to record a copyright sign (©), and three bytes to record that an em-dash is an em-dash (—).

This is great. WordPress set the default encoding in MySQL, the database it uses, to UTF-8 and called it a day. The problem, though, is that UTF-8 in MySQL doesn’t actually support the entire UTF-8 standard. It is better described as utf8mb3: UTF-8 limited to three bytes per character.

MySQL eventually added utf8mb4 (in version 5.5.3), real four-byte UTF-8 encoding, which opens up a ton of doors. The fourth byte covers the sixteen Unicode planes beyond Plane 0, room for over a million more code points.
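Those byte counts, and the line between what utf8mb3 and utf8mb4 can store, are easy to check. Here is a small Python sketch (Python is just a convenient runnable illustration, not anything WordPress or MySQL uses) that reports how many UTF-8 bytes a character takes and which Unicode plane it belongs to; the helper name describe is made up for this example.

```python
from typing import Tuple


def describe(char: str) -> Tuple[int, int]:
    """Hypothetical helper: return (UTF-8 byte count, Unicode plane) for one character."""
    code_point = ord(char)
    utf8_bytes = len(char.encode("utf-8"))
    plane = code_point >> 16  # same as code_point // 0x10000
    return utf8_bytes, plane


# "A", copyright sign, em-dash, grinning face
for char in ["A", "\u00A9", "\u2014", "\U0001F600"]:
    size, plane = describe(char)
    print(f"U+{ord(char):04X}: {size} byte(s), plane {plane}")
```

Everything in Plane 0 comes back at three bytes or fewer, which is all utf8mb3 can hold; the emoji lands in Plane 1 and needs the fourth byte.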

But Kraft, what about emoji?

I’m getting there. I love emoji, but this fourth byte is more important for other reasons. Musical symbols, like 𝄞 or 𝄢, are four-byte characters. Dominoes, like 🁡 or 🁫, are four-byte characters. Hell, Mahjong tiles, 🀁 and 🀙, are four-byte characters. Can it get more important than this?

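Yes. Han characters, the Chinese, Japanese, and Korean characters that have been standardized into a single character map, include tens of thousands of four-byte characters. Common ones, like 道 and 草, fit in three bytes, but the rarer ones added in CJK Extension B and later, such as 𠀀, live outside Plane 0 and need four. By enabling utf8mb4, WordPress can now natively handle these characters.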

Oh, and emoji. Most emoji are four-byte characters. 👍 Emoji!
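To confirm that the symbols mentioned above really do need that fourth byte, a short Python check encodes each one and tests whether a string would fit in MySQL's three-byte utf8. needs_utf8mb4 is a hypothetical helper written for this sketch, not a WordPress or MySQL function.

```python
# Symbols mentioned above: musical symbols, dominoes, Mahjong tiles, and an emoji.
SYMBOLS = ["𝄞", "𝄢", "🁡", "🁫", "🀁", "🀙", "👍"]


def needs_utf8mb4(text: str) -> bool:
    """Hypothetical helper: True if any character takes 4 bytes in UTF-8,
    meaning the text cannot be stored in MySQL's three-byte utf8 (utf8mb3)."""
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


for symbol in SYMBOLS:
    print(f"U+{ord(symbol):X}: {len(symbol.encode('utf-8'))} bytes")

print(needs_utf8mb4("em-dash \u2014"))     # False: U+2014 takes 3 bytes
print(needs_utf8mb4("Emoji! \U0001F44D"))  # True: U+1F44D takes 4 bytes
```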

Awesome! I can use emoji once I’m using WordPress 4.2‽

Probably. WordPress supports MySQL 5.0 or higher, but utf8mb4 requires MySQL 5.5.3 or higher. WordPress is smart enough not to upgrade your database to something it can’t handle; instead, it works around the limitation by encoding emoji in a form the older character set will accept. If you’re in this group, you really should upgrade to a newer version of MySQL. Contact your host!
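The general idea behind that kind of workaround can be sketched in Python: rewrite every four-byte character as an HTML numeric character reference, which is plain ASCII and fits in a three-byte column, and let the browser turn it back into the emoji when the page is displayed. This is an illustration under that assumption, not WordPress's actual PHP code, and encode_four_byte_chars is a made-up name.

```python
import html
import re

# Any code point outside Plane 0 needs 4 bytes in UTF-8.
FOUR_BYTE = re.compile("[\U00010000-\U0010FFFF]")


def encode_four_byte_chars(text: str) -> str:
    """Hypothetical helper: replace each 4-byte character with an entity like &#x1f44d;."""
    return FOUR_BYTE.sub(lambda m: f"&#x{ord(m.group(0)):x};", text)


stored = encode_four_byte_chars("Emoji! \U0001F44D")
print(stored)                 # Emoji! &#x1f44d;  (ASCII only, safe for utf8mb3)
print(html.unescape(stored))  # what a browser shows: the thumbs-up emoji again
```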

Will emoji display everywhere?

Yes and no. Native emoji support is operating system- and browser-dependent. Gary Pendergast, known in many places as Pento, spearheaded adding universal emoji support to WordPress core, based on the work that Mo Jangda, Marcus Kazmierczak, and Joen Asmussen did to bring emoji to WordPress.com (which was skinning a different cat than core since WP.com uses the latin1 character set… that’s a whole different post).

Gary’s feature plugin, x1f4a9 (that’s the code point for 💩), added code that checks whether your browser and operating system support emoji natively. If not, it uses JavaScript magic and Twitter’s open-source emoji artwork (Twemoji) to replace the “broken” emoji with an image of the emoji. This works in /wp-admin/ (when writing a post, for example) as well as on the front end when a visitor reads your post.

In other words, WordPress lets your computer do its thing when it can and helps out when it can’t.
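The replacement half of that process can be sketched in a few lines of Python. This is a hedged illustration rather than the plugin's JavaScript: it skips the browser detection step entirely, treats every character outside Plane 0 as an emoji, and points at a hypothetical IMAGE_BASE URL where each image is assumed to be named after its code point in lowercase hex (1f4a9.png for 💩).

```python
import re

# Hypothetical location of the emoji artwork.
IMAGE_BASE = "https://example.com/emoji/72x72"

# Simplification: any character outside Plane 0 counts as an emoji.
ASTRAL = re.compile("[\U00010000-\U0010FFFF]")


def emoji_to_img(text: str) -> str:
    """Hypothetical helper: swap each 4-byte character for an <img> tag."""
    def to_img(match: "re.Match[str]") -> str:
        char = match.group(0)
        # Image file named after the code point in lowercase hex, e.g. 1f4a9.png
        return f'<img class="emoji" alt="{char}" src="{IMAGE_BASE}/{ord(char):x}.png">'
    return ASTRAL.sub(to_img, text)


print(emoji_to_img("x1f4a9 translates as \U0001F4A9"))
```

Real emoji handling also has to cope with multi-code-point sequences, such as skin-tone modifiers, and with emoji that live inside Plane 0; this sketch ignores both.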

EMOJI EVERYTHING!

Yes! Add emoji to your posts. Make emoji categories! Do it all!

Word of warning: WordPress will use emoji in slugs (the readable “ID” of a post or category used in pretty permalinks), but you might run into some odd behavior. Emoji in the path of a URL are supported. Most domain registries, like those for .com or .net, reject emoji in domain names (with notable exceptions), but emoji in the rest of the URL are fine.

My new 🍺 tag URL works great: http://brandonkraft.com/b/t/🍺

If you copy and paste that into, say, Twitter, it’ll convert it to the non-emoji representation: https://kraft.blog/t/%F0%9F%8D%BA/ . This is the same URL, just percent-encoded. If you visit that link in a browser/OS that supports emoji, it’ll display it with the 🍺 emoji. A++ 👍 Ship it! 🚢
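Percent-encoding is nothing more than the UTF-8 bytes of 🍺 (F0 9F 8D BA), each written as %XX, which is why both forms point to the same page. Python's standard urllib.parse module shows the round trip; the URL printed at the end reuses the tag URL from above.

```python
from urllib.parse import quote, unquote

tag = "\U0001F37A"  # BEER MUG, U+1F37A

# Percent-encode the tag's UTF-8 bytes for use in a URL path.
encoded = quote(tag)
print(encoded)  # %F0%9F%8D%BA
print(" ".join(f"{b:02X}" for b in tag.encode("utf-8")))  # F0 9F 8D BA

# Decoding gives back the emoji, so both spellings name the same path.
print(unquote(encoded) == tag)  # True
print("https://kraft.blog/t/" + encoded + "/")
```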

If, though, you type http://brandonkraft.com/b/t/🍺 directly into Twitter, Twitter won’t see 🍺 as part of the URL, giving you an incomplete URL of http://brandonkraft.com/b/t/. 😕
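One plausible explanation, offered as an assumption rather than a description of Twitter's real parser: a link detector that only accepts printable ASCII after the scheme stops at the first emoji, while the percent-encoded form, being pure ASCII, survives intact. A hypothetical detector in Python behaves exactly that way.

```python
import re

# Hypothetical ASCII-only link detector: scheme, then printable ASCII (0x21 to 0x7E) only.
ASCII_URL = re.compile(r"https?://[!-~]+")

typed = "http://brandonkraft.com/b/t/\U0001F37A"
pasted = "https://kraft.blog/t/%F0%9F%8D%BA/"

print(ASCII_URL.search(typed).group(0))   # http://brandonkraft.com/b/t/
print(ASCII_URL.search(pasted).group(0))  # https://kraft.blog/t/%F0%9F%8D%BA/
```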

Slack will convert 🍺 to :beer:, giving you a completely different URL of https://kraft.blog/t/:beer:/ . In my case, it still actually works since I tag anything with 🍺 with beer too, and WordPress’ permalink sniffing will connect the dots since it’s close enough.

In conclusion…

Emoji are a joke in plenty of circles, but are a growing part of “Internet speech”. By supporting the hotness, we’ve also greatly improved the international community’s ability to use WordPress in their native language.

It took a great deal of work on levels that I don’t understand—changing the character set of the database without impacting anything else in a mixed environment of MySQL versions with varied support, while providing a consistent experience across all environments for emoji.

To read through how the sausage was made, see #21212-core, which handled the database schema changes, and #31242-core for the emoji rendering support.

Comments

6 responses

  1. Lindsay

    I’m not quite sold on emoji yet. I don’t even use them for texting or social media, although I did love Salon’s “classic first lines in emoji” article.

    Your post was garbled almost beyond recognition as I was reading it in Feedly. The layouts don’t translate well, I guess. (Markdown footnotes don’t, either.) For me, the post still has those square boxes full of numbers for the music notes and such even as I leave this comment on brandonkraft.com. All the regular emoji show up here, though.

    I’m okay with getting on the bandwagon late. I was only recently sold on GTD!

    1. Brandon Kraft

      I think it is an operating system thing for the other four-byte encoded characters. On OS X, they appear fine for me and Vanessa. On my Android phone, they’re boxes. I’m going to guess they probably don’t display on Windows either.

      I suppose the next challenge is getting other operating systems to see the beauty of all four-byte characters and not only support emoji and Han 🙂

      1. marsjaninzmarsa

        Emoji needs at least Android 4.1 to display natively (and 4.4 to display in color). Support for monochrome emoji was added in Win 7 SP1, and for color in 8.1. Most Linux distros don’t have out-of-the-box emoji support; you need to install the Symbola font. Color emoji support is on the way: http://google-opensource.blogspot.com/2013/05/open-standard-color-font-fun-for.html

  2. […] adding a ton of new emoticons. However, the technical aspects involved are complex. Brandon Kraft has a great article that explains what emoji is, why it’s necessary, and why you should […]

  3. […] Andy Nacin gave this talk today at LoopConf in Vegas. In one sense, it is in the same vein as my Emoji, WordPress, and You post. […]

  4. tldr:

    Convert your DB columns to utf8mb4 to support emojis / special characters everywhere in WordPress

    tldr2: Or urlencode the value when saving, and urldecode it when getting the post meta.
    I recently did some client work that involved allowing users to generate their own image (canvas) that contained their name and a bunch of other decorative elements.
    We allowed the users to change their name to customize the dynamically populated canvas. The name change was saved every edit via ajax as post meta so that their edits would become permanent.
    Easy enough. I coded it, tested it and the new feature went live.
    But within literally five minutes of people using it, something I hadn’t anticipated showed up: people were decorating their names with emojis.
    Some looked like this:
    💛💛 Benjamin 💛💛
    It was fun, but a bug had emerged: names with emojis in them weren’t being saved by the ajax call.
    update_post_meta Was Failing
    After digging around, I found the code that wasn’t working as expected:
    $text_with_emoji = sanitize_text_field( 'happy 😀' );
    update_post_meta( 1, 'foo', $text_with_emoji );

    var_dump( get_post_meta( 1, 'foo', true ) );
    // Prints string(0) "": nothing was saved!
    The text was being sanitized properly, but update_post_meta wasn’t saving if it had an emoji in it!
    It didn’t remove the emoji, it didn’t sanitize the emoji for saving, it just... failed.
    WordPress 4.2 – Support for Emojis
    I distinctly remember the release of version 4.2, in which emoji support was added, so I just assumed it worked. Here’s a highlight from the release:

    Emoji are now available in WordPress! Get creative and decorate your content with 💙, 🐸, 🐒, 🍕, and all the many other emoji.

    If you put an emoji in the title or content, it works fine, even if you use wp_update_post.
    But if you add an emoji in a custom field, or add it via update_post_meta, it doesn’t.
    So what gives?
    Server-Side Fix: utf8mb4
    After some more research, I found out that emojis are 4-byte characters and that for WordPress to support emojis and other special characters such as Japanese everywhere, the database should also support 4-byte characters.
    So I checked my MySQL database columns and they were utf8_general_ci, which means they won’t support 4-byte characters.
    The quick fix is to enable utf8mb4 by converting your column to utf8mb4_general_ci.
    BAM! Emojis in post meta now work!
    But…
    However, that only works on my own site, where I control the MySQL database.
    But if I’m creating a plugin that a lot of people will use, it should work without users having to modify their database table structure and risk having their data turned into gibberish (phpMyAdmin warned me about exactly that when I converted the column format; scary).
    There are web hosts that just provide utf8 MySQL tables. Forcing everyone to convert their database tables is a hassle.
    Title & Content vs Custom Fields
    So now the question is: If all my database columns are utf8, why do emojis work in titles and post content, but not in custom fields / post meta?
    Well, the answer is that WordPress added provisions to safely save and read 4-byte characters in the title and content, and post meta doesn’t have those provisions.
    Workaround for utf8
    To make emojis work, we’ll have to add some provisions to make them work ourselves.
    Here’s the sanitized text; the emoji at the end is a single 4-byte character.
    var_dump( sanitize_text_field( 'happy 😀' ) );
    // string(10) "happy 😀": 6 bytes for "happy " plus 4 for the emoji
    Since WordPress fails to save this, we can encode it into a manageable, ASCII-only format:
    var_dump( urlencode( sanitize_text_field( 'happy 😀' ) ) );
    // string(18) "happy+%F0%9F%98%80": the 4 emoji bytes, percent-encoded
    We save that encoded string and just have to remember to decode it when reading it back:
    $text = urlencode( sanitize_text_field( 'happy 😀' ) );
    update_post_meta( 1, 'foo', $text );

    var_dump( urldecode( get_post_meta( 1, 'foo', true ) ) );
    // Gives "happy 😀"
    Now you can save emojis in areas of WordPress that don’t support them without having to convert database table formats.
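The failure and the workaround described in the comment above can be reproduced outside WordPress with a small Python simulation. Utf8mb3Column is a hypothetical stand-in for a MySQL utf8 (three-byte) column that rejects any 4-byte character, roughly how the failed update_post_meta call behaved; quote_plus and unquote_plus stand in for PHP's urlencode and urldecode, which produce the same string for this input (though not for every character).

```python
from urllib.parse import quote_plus, unquote_plus


class Utf8mb3Column:
    """Hypothetical stand-in for a MySQL utf8 (utf8mb3) column."""

    def __init__(self) -> None:
        self.rows: dict = {}

    def save(self, key: str, value: str) -> bool:
        # Refuse the whole write if any character needs 4 bytes in UTF-8.
        if any(len(ch.encode("utf-8")) > 3 for ch in value):
            return False
        self.rows[key] = value
        return True

    def load(self, key: str) -> str:
        # Missing values come back as an empty string.
        return self.rows.get(key, "")


meta = Utf8mb3Column()
text = "happy \U0001F600"

print(meta.save("foo", text))              # False: the raw emoji is rejected
print(repr(meta.load("foo")))              # ''

print(meta.save("foo", quote_plus(text)))  # True: the encoded form is ASCII
print(meta.load("foo"))                    # happy+%F0%9F%98%80
print(unquote_plus(meta.load("foo")) == text)  # True: decoding restores the emoji
```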

Leave a Reply