I ran into a weird problem while using HTML Purifier together with CKEditor on a project for one of my clients. Every time I saved a page in CKEditor, and HTML Purifier ran on save, each non-breaking space character (nbsp) on the rendered page was accompanied by a weird extra character (Â).
However, when I went back and edited the page with CKEditor in the management console, everything displayed properly, without the weird Â character. The database showed no sign of this character either when I inspected it with phpMyAdmin.
The cause and the fix are both relatively simple, even though it sounds like an annoying problem. The Unicode non-breaking space character (U+00A0) is encoded in UTF-8 as the two bytes 0xC2 0xA0. Read as ISO-8859-1, those same two bytes become (Â ): a weird foreign character (0xC2) followed by a non-breaking space (0xA0). So somewhere between the database and the browser, the UTF-8 encoded non-breaking space is being interpreted as ISO-8859-1 instead of the intended UTF-8, hence the weird character being displayed.
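The byte-level mismatch described above can be reproduced in a few lines. This is only an illustrative sketch in Python (the project itself was PHP), and the variable names are my own:

```python
# U+00A0 is the Unicode NO-BREAK SPACE character.
nbsp = "\u00a0"

# Encoded as UTF-8 it takes two bytes: 0xC2 0xA0.
utf8_bytes = nbsp.encode("utf-8")
print(utf8_bytes)  # b'\xc2\xa0'

# Misreading those two bytes as ISO-8859-1 yields two characters:
# U+00C2 (Â) followed by U+00A0 (a non-breaking space).
misread = utf8_bytes.decode("iso-8859-1")
print([hex(ord(ch)) for ch in misread])  # ['0xc2', '0xa0']

# Decoding with the correct charset gives back the single original character.
roundtrip = utf8_bytes.decode("utf-8")
print(roundtrip == nbsp)  # True
```

Nothing is wrong with the stored bytes; only the charset used to read them differs, which is consistent with the database looking clean.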
To fix it, make sure every layer uses UTF-8:
- Set your webpage encoding;
e.g.: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- Set the database connection encoding; e.g.: SET NAMES 'utf8' (MySQL spells the character set name without a hyphen, so 'utf-8' is not accepted)
- Specify a UTF-8 character set with the utf8_unicode_ci (or similar) collation on all tables and text columns in your database, so that MySQL stores and retrieves values as UTF-8.
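To check the first item quickly, one option is to look at which charset a page actually declares in its meta tags. The following is a minimal sketch using only Python's standard library; `CharsetFinder` and `declared_charset` are hypothetical names of my own, and it only inspects meta tags (an HTTP Content-Type header sent by the server can also set the charset and is not covered here):

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Records the first charset declared in a <meta> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        if attrs.get("charset"):
            # HTML5 style: <meta charset="utf-8">
            self.charset = attrs["charset"].strip().lower()
        elif (attrs.get("http-equiv") or "").lower() == "content-type":
            # Older style: <meta http-equiv="Content-Type" content="...; charset=...">
            content = (attrs.get("content") or "").lower()
            if "charset=" in content:
                value = content.split("charset=", 1)[1]
                self.charset = value.split(";")[0].strip()


def declared_charset(html):
    """Return the lowercased charset declared in the markup, or None."""
    finder = CharsetFinder()
    finder.feed(html)
    return finder.charset


page = '<head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"></head>'
print(declared_charset(page))  # iso-8859-1
```

Any result other than utf-8 on a page that serves UTF-8 content is a likely source of the stray Â.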
In my case, the problem occurred because my meta content-type tag was set to ISO-8859-1 instead of the intended UTF-8, so the browser was interpreting the bytes of the non-breaking space character as the weird foreign character above followed by a non-breaking space. I hope this post helps someone who has run into a similar problem!