UTF8 Sanitize WordPress Plugin Homepage

This plugin is the second in the series of correctional plugins, after the Broken Images WordPress Plugin. It is designed to “fight” the broken UTF8 characters on your WordPress-powered site. If you do not understand what I am talking about, here is an example. You were expecting to see something like this:

UTF8 Sanitize: Original UTF8 Characters

but after a while (usually after copying and pasting, or editing in general; not always, but sometimes) it ends up looking like this:

UTF8 Sanitize: Broken UTF8 Characters
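
This page doesn't explain where such characters come from. One common cause, assumed here rather than taken from the plugin, is UTF-8 text whose bytes get re-read in a single-byte encoding such as ISO-8859-1. Below is a minimal sketch in Python (the plugin itself is a PHP WordPress plugin; `break_utf8` and `repair_utf8` are hypothetical names used only for illustration):

```python
def break_utf8(text: str, wrong_encoding: str = "latin-1") -> str:
    """Simulate the damage: UTF-8 bytes decoded with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair_utf8(text: str, wrong_encoding: str = "latin-1") -> str:
    """Undo the damage, provided the text was mis-decoded exactly once."""
    return text.encode(wrong_encoding).decode("utf-8")


if __name__ == "__main__":
    original = "für"
    broken = break_utf8(original)
    print(broken)                           # fÃ¼r
    print(repair_utf8(broken) == original)  # True
```

The round trip only works when the text was mis-decoded exactly once and no bytes were lost along the way.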

I’ve seen a lot of these errors lately, including on blogs that I like to read. Sometimes I need to quote such passages in posts on my blog, and pasting the broken text seems unacceptable. That’s how this plugin was born ;)

Install the UTF8 Sanitize WordPress plugin and you will be able to fight those ugly characters. Additionally, you can choose how to handle them: either convert them to their original values once and for all by modifying your posts in the database, or leave all your posts intact and convert the ugly chars each time a post is loaded (both options are controlled from the WordPress admin panel).

Download

Installation

  1. Download the installation archive and uncompress it
  2. Upload the unarchived folder to your plugins directory (this is wp-content/plugins/)
  3. Activate the plugin in your WordPress administration panel (this is the Plugins page)
  4. Configure the plugin from the Options page in the WordPress administration panel
  5. That’s it ;)

Usage

Once you have the UTF8 Sanitize plugin installed, you have two options for fighting the broken UTF8 characters:

  • Correct the chars when outputting the posts:
    Cook the posts output to convert the broken UTF8 characters into their correct values.
    This option will not change the contents of your posts.

  • Correct the chars when editing/saving the posts:
    Cook the posts before saving them to the database by converting the broken UTF8 characters into their correct values.
    This option will change the contents of your posts.
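
The two options above can be sketched roughly as follows. This is an illustration in Python, not the plugin's actual PHP code: the `MUTATIONS` entries and all function names are assumptions, and only the `write` and `output` setting names come from the plugin's stored settings.

```python
# Hypothetical watch-list: broken sequence -> intended character.
# The plugin's real list is not shown on this page; these are sample entries.
MUTATIONS = {
    "â€™": "’",
    "â€œ": "“",
    "Ã¼": "ü",
    "Ã©": "é",
}


def sanitize(text: str) -> str:
    """Replace every known broken sequence with its intended character."""
    # Longest sequences first, so a short entry never splits a longer one.
    for broken in sorted(MUTATIONS, key=len, reverse=True):
        text = text.replace(broken, MUTATIONS[broken])
    return text


def render_post(stored: str, settings: dict) -> str:
    """Output mode: clean the text on display; the stored post is untouched."""
    return sanitize(stored) if settings.get("output") else stored


def save_post(submitted: str, settings: dict) -> str:
    """Write mode: clean the text before it is stored, changing the post."""
    return sanitize(submitted) if settings.get("write") else submitted


if __name__ == "__main__":
    defaults = {"write": 0, "output": 1}
    print(render_post("Itâ€™s fÃ¼r you", defaults))  # It’s für you
    print(save_post("Itâ€™s fÃ¼r you", defaults))    # unchanged: write is off
```

A fixed lookup table like this only repairs sequences that are on the list, which fits the release notes about adding more mutations to the watch-list over time.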

Coming Soon

Here’s the list of future features:

  • offer a way to add custom fixes, something like original: xxxx, mutation: xxxx
  • add optional fixes based on different char-sets like Cyrillic, Norwegian, etc.
  • add a cron-based feature to crawl all the blog’s posts and comments, and fix all broken UTF8 chars (this is going to be useful for large blogs with already a lot of messed up UTF8 chars)

Updates

  • (October 15, 2007) A new release – version 0.5 is out!

    This new release features several more mutations (broken characters) added to the watch-list of the plugin.

  • (October 03, 2007) A new release – version 0.4 is out!

    This release features several more mutations (broken characters) added to the watch-list of the plugin.

  • (September 28, 2007) A new release – version 0.3 is out!

    This new release features several more mutations (broken characters) added to the watch-list of the plugin.

  • (September 27, 2007) A new release – version 0.2 is out!

    This release features several more mutations (broken characters) added to the watch-list of the plugin.

  • (September 20, 2007) Bogomil’s idea – let’s give him a friendly push!

    One of the coolest guys in the Bulgarian Open-Source Community, Programming Community, Nerd Community, Geek Community, IT Community, etc. – Bogomil Shopov – decided to win an SEO contest and give the prize to charity. Let’s give him a hand by placing this little link here: Яицазимитпо лъгуг

  • (September 20, 2007) The page for the plugin is done!

    The wonderful page you are reading right now to help you
    deal with this plugin is out.

  • (September 18, 2007) The first release – version 0.1 is out!

    The first release of this plugin is out, hosted on Google Code.

Support

Submit all your comments and bitter remarks in the comments form below. Good luck with the UTF8 Sanitize WordPress Plugin! If you think you can handle Google Code, go and submit the bugs you’ve found (if any), and the ideas you have about improving this WordPress plugin at the Issue Tracker:

Or, use the feedback form for this plugin at WordPress.org:

70 comments on “UTF8 Sanitize WordPress Plugin Homepage”

  1. Wow, I run into this issue all the time when dealing with an install that’s had its database exported and imported. Thanks so much for writing a plugin to handle this! :D

  2. Hi, I am a newbie and want to make sure I am doing this right. A recent upgrade to 2.2.1 via Fantastico gave me bad characters throughout my archives. All old posts are affected. Should I choose option 1 or 2 to fix all the old posts?
    I think it is #2, but want to be sure.
    Thanks!!!
    nobugs

    1 Correct the chars when outputting the posts:
    Cook the posts output to convert the broken UTF8 characters into their correct values.
    This option will not change the contents of your posts.

    2 Correct the chars when editing/saving the posts:
    Cook the posts before saving them to the database by converting the broken UTF8 characters into their correct values.
    This option will change the contents of your posts.

  3. @nobugs: Use both. The first one will make it so that the broken chars are never shown to people visiting your blog, while the second one will help you not to enter new broken chars anymore.

  4. Thanks Kaloyan.
    Sorry I am not very experienced with these things, and I really appreciate your advice.
    I did it and I think it removed most of the bad characters.
    However, some of my posts, in Spanish, still show some broken characters. They’re linked from this page:
    http://bedbugger.com/espanol/
    I think I need to wait for your custom fixes, but is there anything else I can do, besides going through and deleting them?
    Thanks again!

  5. Very nice plugin, but could you please tell me how I can modify it so that when activated, both checkboxes are already checked? I want to use it in a WPMU environment, where it gets auto-activated, so I do not want the users to play with it; it should just be activated with both checkboxes checked.

  6. @ovidiu: Modify the settings for the plugin. They are stored in wp_options as `wp_utf8_sanitize_settings` and their default value is:

    a:2:{s:5:"write";i:0;s:6:"output";i:1;}

    This means that the „write“ is OFF, and the „output“ is ON. To make both ON use this value:

    a:2:{s:5:"write";i:1;s:6:"output";i:1;}

  7. Yes, I tested it on mine. It is UTF8. Specifically „B“ as the first capital letter of a sentence, if that matters at all.

  8. I have the plugin installed but it doesn’t touch the A with a caret above it, like the one that appears in this post.

    http://www.phillysonline.com/WordPress/2006/10/25/a-bulava-is-not-for-dunking-in-coffee/

    I have the plugin set to work on output and on saving. So I opened the post in the editor and saved it, but the post still has those crazy characters in it.

    To correct it I’d have to edit the post and manually delete the chars then save it. Is the plugin not working right or is it only written to eliminate that one example above?

  9. Hello there,

    I really wanted to express my gratitude. This plugin has been perhaps the most important one I have ever used. After a recent problem that occurred during a standard update of WordPress, all sorts of things started happening to my text. Thankfully this plugin sorted things out. Not only that, but it worked so quickly and with a single button press.

    Thank you for a wonderful plugin,
    Craig

  10. @Anders: are there any issues with the plugin and WordPress 2.3.3? I haven’t upgraded yet, but I read the post about the new 2.3.3 version, and it is a security fix release. The modifications are related to XML-RPC and some other minor stuff, as well as some issues w/ the WP-Forum plugin. There’s nothing about changes in the plugin „framework“ or the post-related plugin hooks.

  11. My entire blog has been written for 5 years in iso-8859-1, I wanted to know if this plugin can help me if I switch to UTF8 so I don’t have to start editing post-by-post all the errors. I know it might be a stupid question, but I wanted to be sure before I broke the entire site :p I’ll be waiting for your reply. Thanks in advance!

  12. @Fer: The best thing is to test it ;) Create a demo installation, dump the database and then import it into the demo. You’ll see what potential issues might come up.

  13. hi there.
    I have an issue and do not know if this plugin might help me or not.
    I am taking care of some friends’ blog. The charset is and always was utf8, but after the latest upgrade, where something did not go perfectly, I had to reimport the backed-up database, and now I see ugly characters. I do not know if it’s UTF8 related, but here is the problem: they write in German and all their umlauts (see here for details: http://en.wikipedia.org/wiki/Umlaut_%28diacritic%29 ) are broken, i.e. instead of für it is now für – could you modify your plugin to also take care of these kinds of problems?

  14. @ovidiu: You can try the plugin with the „Correct the chars when outputting the posts“ option and see what the result will be.

  15. Hello! I’m running WP 2.5.1, with charset ISO-8859-1. I’d like to run UTF-8 instead, but when I set WP to use UTF-8 all my Norwegian characters turn into the same strange character, looking like a „traffic sign“. Will your plugin be able to fix this, and does it work with my version of WP?

  16. Hello! Thanks very much for this plugin. I am having one small problem though. I keep getting a  at the start of my post’s title and the plugin seems to be ignoring it. Is there some code that I can put into the plugin to make the  character disappear please? It’s driving me NUTS…

    Thanks again… :)

  17. Anyone, please help me out. I’m working on a Hungarian-based blog, and it’s an autoblog.
    The plugin is not working for me; I do not even see the process posts tab in the plugin admin. Please help me out. This is the blog: http://hitel.name/

    Regards !
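
The settings value quoted in comment 6 appears to be in the format produced by PHP's `serialize()`: `a:2` is an array of two entries, `s:5:"write"` is a string key with its byte length, and `i:1` is an integer. As a rough illustration in Python (not the plugin's PHP code; `php_serialize_flags` is a hypothetical helper that handles only flat maps of string keys to integers), the same value can be built like this:

```python
def php_serialize_flags(flags: dict) -> str:
    """Render a flat {str: int} mapping in PHP's serialize() array format."""
    parts = []
    for key, value in flags.items():
        # PHP records the string length in bytes, not characters.
        key_bytes = len(key.encode("utf-8"))
        parts.append(f's:{key_bytes}:"{key}";i:{int(value)};')
    body = "".join(parts)
    return "a:%d:{%s}" % (len(flags), body)


if __name__ == "__main__":
    # Both options switched on, as suggested in comment 6.
    print(php_serialize_flags({"write": 1, "output": 1}))
    # a:2:{s:5:"write";i:1;s:6:"output";i:1;}
```

When editing such a value by hand, the length prefixes have to match the strings exactly, otherwise PHP will fail to unserialize it.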

Comments are closed.