Space (nbsp;) Mis-behaving with HTML Purifier

I encountered a weird problem while using HTML Purifier together with CKEditor on a project for one of my clients. Every time I saved a page in CKEditor, and HTML Purifier consequently ran on save, each non-breaking space (&nbsp;) would be displayed on the page with an extra, weird character (Â) in front of it.

However, when I went back and edited the page in CKEditor in the management console, everything displayed properly, without the weird Â character. The database showed no sign of this character either; I inspected it using phpMyAdmin.

The reason and the fix, however, are relatively simple, even though the problem is an annoying one. The Unicode non-breaking space character (U+00A0) is encoded in UTF-8 as the two bytes 0xC2 0xA0. ISO-8859-1 uses one byte per character, so those same two bytes read as ISO-8859-1 become two characters (Â ): a weird foreign character followed by a non-breaking space. Therefore, the cause of the problem is that somewhere along the chain from database to browser, the UTF-8-encoded non-breaking space is being interpreted as ISO-8859-1 instead of the intended UTF-8, hence the weird character being displayed.
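The byte-level mix-up is easy to reproduce. HTML Purifier itself is a PHP library, so the Python snippet below is only an illustration of the encoding arithmetic, not code from this project: it encodes a non-breaking space as UTF-8 and then decodes the same bytes as ISO-8859-1, the way a browser does when the page declares the wrong charset.

```python
# A non-breaking space, U+00A0.
nbsp = "\u00a0"

# UTF-8 needs two bytes for it: 0xC2 0xA0.
utf8_bytes = nbsp.encode("utf-8")
print(utf8_bytes)  # b'\xc2\xa0'

# ISO-8859-1 maps every byte to exactly one character:
# 0xC2 -> 'Â' (U+00C2) and 0xA0 -> non-breaking space (U+00A0).
misread = utf8_bytes.decode("iso-8859-1")
print([hex(ord(ch)) for ch in misread])  # ['0xc2', '0xa0']
print(misread == "\u00c2\u00a0")  # True
```

One UTF-8 character turning into two ISO-8859-1 characters is exactly the "Â plus space" seen on the page.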

Possible Solutions
  • Set your webpage encoding (and make sure any Content-Type HTTP header sent by the server declares the same charset), e.g.:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  • Set the character set of your database connection, e.g.: SET NAMES 'utf8' (MySQL names this encoding utf8 or utf8mb4, not 'utf-8')
  • Specify the utf8 character set with utf8_unicode_ci (or a similar) collation on all tables and text columns in your database, so that MySQL stores and retrieves the values in UTF-8.
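The page-encoding item above boils down to one rule: the bytes you send must be in the charset you declare. The Python WSGI app below is an illustrative assumption, not the PHP setup from this post. It encodes the body as UTF-8 and declares UTF-8 both in the Content-Type HTTP header and in the meta tag. Browsers give the HTTP header precedence over the meta tag, so the two should always agree.

```python
# Hypothetical page containing a non-breaking space between "Price:" and "100".
PAGE = (
    "<!DOCTYPE html>\n"
    "<html><head>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
    "</head><body><p>Price:\u00a0100</p></body></html>\n"
)


def app(environ, start_response):
    # Encode the body with the same charset that the header and meta tag declare.
    body = PAGE.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, import `make_server` from `wsgiref.simple_server`, run `make_server("", 8000, app).serve_forever()` and open http://localhost:8000/; the non-breaking space should render without a stray Â.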

In my case, the problem occurred because my meta content-type tag was set to ISO-8859-1 instead of the intended UTF-8, so the browser was interpreting the non-breaking space character as the weird foreign character above. I sincerely hope this post helps someone who has run into a similar problem!
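If the Â has already been written into the database (as in Marc's comment below), the stored text has usually been through one extra round trip: UTF-8 bytes decoded as ISO-8859-1 and then saved again as UTF-8. A common way to undo exactly one such round is to reverse it: encode the text back to ISO-8859-1 bytes and decode those bytes as UTF-8. The helper below is a hypothetical Python sketch (`fix_double_encoded` is not part of HTML Purifier, CKEditor or MySQL), and it assumes the damage was done with ISO-8859-1 rather than Windows-1252.

```python
def fix_double_encoded(text: str) -> str:
    """Undo one round of UTF-8 bytes being mis-read as ISO-8859-1.

    Assumption: the text was mangled exactly once, via ISO-8859-1.
    If the text cannot be reversed (it contains characters outside
    ISO-8859-1, or the resulting bytes are not valid UTF-8), it is
    returned unchanged.
    """
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# 'Hello' + 'Â' + non-breaking space + 'world', as it might appear in the database.
broken = "Hello\u00c2\u00a0world"
print(fix_double_encoded(broken) == "Hello\u00a0world")  # True
```

Falling back to the original text keeps strings that were never mangled from being damaged further, although text that happens to form valid UTF-8 when re-encoded can still be changed, so try it on a copy of the data first.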

5 Responses to Space (nbsp;) Mis-behaving with HTML Purifier
  • Marc

    I have the same issue here! But on my side the 'Â' character is stored in MySQL. The webpage encoding is already set to charset=utf-8.
    My situation:
    Win2003 Server with Apache 2.4
    PHP is served by mod_fcgid
    Using the Joomla CMS, if I save an article with &nbsp; characters, these characters are saved as Â in the database. The collation in the database is set to utf8_unicode_ci.
    The issue has something to do with mod_fcgid. Before mod_fcgid I used php-cgi with exactly the same setup and it worked without issues.
    Any ideas?

    Regards, Marc

  • Joe Wu

    Hi Marc

    In theory it shouldn’t have anything to do with PHP modules, unless a module is somehow interfering with the storage or display procedures of the website.
    Can you see this weird character in MySQL when going directly into something like PHPMyAdmin for example and having a look?

    Joe

  • Marc

    Hi Joe

    Yes I can see this weird character in PHPMyAdmin and also in db dump files. So it’s definitely written wrong to the database…

    Regards, Marc

  • Sasi kanth

    Thanks for the tips… very useful.

  • Kayser

    In my case the meta tag is already set to utf-8, but the JSP page directive is set to iso-8859-1. The error persists because the text passed in the request is being converted from nbsp to &#160;.
    When I change the @page directive to utf-8, I can no longer get the text back already converted; it comes back exactly as stored by CKEditor, full of its own characters. What can I do to solve the problem?
