UTF8 in PHP and MySQL


Can your code handle that? I found a lot of great info on the PHP WACT site for making your code and database work well with complex character sets. This is important even if your intended audience is North American and English-speaking only. MS Word replaces straight quotes with smart quotes, which are non-ASCII characters and will wreak havoc on a site that does not handle UTF-8 consistently.

So what’s the tl;dr of the PHP WACT site?

  1. Set all MySQL tables and columns to a UTF-8 collation such as utf8_general_ci (or another collation of your choice).
  2. Ensure your connection to the database is UTF-8 with SET NAMES 'utf8';
  3. Send a header to declare that your page is UTF-8. Browsers will then normally submit POST content in UTF-8 as well, converting it for you. header('Content-Type: text/html; charset=utf-8');
  4. Use htmlspecialchars() to make user-submitted or untrusted text safe to display in HTML or XML. It escapes only the characters that are special in markup and nothing more. htmlspecialchars($utf8_string, ENT_COMPAT, 'UTF-8');
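
Steps 2 to 4 could be wired together roughly like this. It is a minimal sketch, not the WACT site's code: the helper names connect_utf8(), send_utf8_header() and escape_html() are made up for illustration, the DSN and credentials are placeholders you would supply, and it assumes the PDO MySQL driver is installed.

```php
<?php
// Hypothetical helper names; only the PHP built-ins they call are real.

// Step 2: open a connection and switch it to UTF-8.
// $dsn, $user and $password are placeholders supplied by the caller.
function connect_utf8(string $dsn, string $user, string $password): PDO
{
    $pdo = new PDO($dsn, $user, $password);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec("SET NAMES 'utf8'");
    return $pdo;
}

// Step 3: declare the page encoding before any output is sent.
function send_utf8_header(): void
{
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
}

// Step 4: escape untrusted text for display in HTML.
function escape_html(string $utf8_string): string
{
    return htmlspecialchars($utf8_string, ENT_COMPAT, 'UTF-8');
}

send_utf8_header();

// Smart quotes (written as Unicode escapes) pass through untouched;
// only &, <, > and the straight double quote are escaped.
echo escape_html("\u{201C}Smart quotes\u{201D} & <b>tags</b>"), "\n";
```

connect_utf8() is only defined here, not called, so the snippet runs without a database. You would call it once at startup and reuse the returned connection.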

Max Woolf has a GitHub repo of Naughty Strings you can use in testing your code to make sure it supports everything.
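
As a rough idea of what such a test might look like, the sketch below runs a few strings through htmlspecialchars() and checks that they come back intact. The sample strings are hand-picked stand-ins, not taken from the repo, survives_escaping() is a made-up helper name, and the check assumes the mbstring extension is available.

```php
<?php
// Hand-picked stand-ins for the kind of entries a naughty-strings list holds.
$samples = [
    '<script>alert(1)</script>',
    "\u{201C}smart quotes\u{201D} and \u{2018}apostrophes\u{2019}",
    "\u{00DC}n\u{00EF}c\u{00F6}d\u{00E9} & ampersands",
    "\u{65E5}\u{672C}\u{8A9E}",
];

// Hypothetical check: the escaped text must still be valid UTF-8 and must
// decode back to exactly the original input.
function survives_escaping(string $input): bool
{
    $escaped = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');
    return mb_check_encoding($escaped, 'UTF-8')
        && htmlspecialchars_decode($escaped, ENT_COMPAT) === $input;
}

foreach ($samples as $sample) {
    echo survives_escaping($sample) ? 'ok   ' : 'FAIL ', $sample, "\n";
}
```

An input that is not valid UTF-8 fails this check, because htmlspecialchars() returns an empty string for it unless you pass ENT_SUBSTITUTE or ENT_IGNORE.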

Bonus reading: Emoji and MySQL (use utf8mb4)
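
The reason is byte length: MySQL's utf8 character set stores at most three bytes per character, and emoji need four. Here is a small sketch to see this from PHP; fits_in_mysql_utf8() is a hypothetical helper name, and the check simply looks for code points above U+FFFF.

```php
<?php
// Hypothetical helper: true when every character fits in MySQL's 3-byte utf8,
// i.e. the text has no code points above U+FFFF.
function fits_in_mysql_utf8(string $text): bool
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 0;
}

// Characters written as Unicode escapes: A, e-acute, left smart quote, emoji.
$chars = ['A', "\u{00E9}", "\u{201C}", "\u{1F600}"];

foreach ($chars as $char) {
    // strlen() counts bytes, not characters.
    echo $char, ' => ', strlen($char), ' byte(s), fits in utf8: ',
        fits_in_mysql_utf8($char) ? 'yes' : 'no', "\n";
}
```

Only the emoji needs four bytes, so it is the one that requires utf8mb4 for the table, the column and the connection.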
