html_entity_decode characters like &Yuml vs &yuml | ansaurus

Q:

I'm trying to translate a batch of HTML-encoded text into UTF-8 so I can put it into my database. A ton of characters get missed by both html_entity_decode and iconv with //TRANSLIT.

I've written up a long list of characters to strip out, but now I see that &Yuml is not translated while &yuml is.

I'm sure there are other similar symbols that are missed as well.

Any advice on how best to handle these inconsistencies and make sure each character gets translated correctly?
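One likely cause of the &Yuml/&yuml difference, offered as a sketch rather than a confirmed diagnosis: before PHP 5.4, html_entity_decode() decoded into ISO-8859-1 by default. ÿ (U+00FF) exists in ISO-8859-1, but Ÿ (U+0178) does not, so &Yuml; was left as-is. Passing 'UTF-8' explicitly as the charset argument avoids that. The sample string below is made up, and the "\u{...}" escapes assume PHP 7+.

```php
<?php
// Sketch (assumes PHP 7+): decode HTML entities directly into UTF-8.
$encoded = 'Ca&ntilde;on &Yuml; &yuml; &amp; &quot;quoted&quot;';

// The third argument selects the output charset. With ISO-8859-1 (the old
// default) there is no code point for U+0178, so &Yuml; would stay encoded
// while &yuml; (U+00FF) would be decoded.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n";
```

Note that html_entity_decode() only decodes references that end in a semicolon, so a bare "&Yuml" without one is still left untouched.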

A:

Anything of the form &blah; is an entity reference in (X)HTML. If you need to be sure you got them all, make sure none of your final UTF-8 output contains that pattern. You'll also find plenty of entities without the trailing semicolon, but searching for those turns up many false positives.
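The check described above could look roughly like this: a minimal sketch that scans already-decoded output for anything still shaped like an entity reference. Both function names are hypothetical helpers, not part of PHP. The strict pattern covers named (&Yuml;), decimal (&#376;) and hex (&#x178;) forms; the loose one looks for names with no semicolon and, as expected, also flags ordinary query-string text.

```php
<?php
// Hypothetical helper: list anything that still looks like an entity
// reference with a trailing semicolon (named, decimal or hex).
function find_leftover_entities(string $text): array
{
    preg_match_all('/&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);/', $text, $m);
    return array_values(array_unique($m[0]));
}

// Hypothetical helper: looser pass for names missing their semicolon.
// This produces false positives, e.g. "&b" in a URL query like "a=1&b=2".
function find_possible_unterminated_entities(string $text): array
{
    preg_match_all('/&[A-Za-z][A-Za-z0-9]*(?![A-Za-z0-9;])/', $text, $m);
    return array_values(array_unique($m[0]));
}

// Stand-in for decoded output that still contains problem spots.
$output = 'Cafe &bogus; a=1&b=2 &Yuml';
print_r(find_leftover_entities($output));              // strict matches
print_r(find_possible_unterminated_entities($output)); // loose matches
```

Running the strict check on every row before it is inserted gives a concrete list of entities to add to the cleanup code; the loose results need a manual look.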

Wikipedia, naturally, has a list of HTML/XHTML/XML entity codes. You can implement that (long) list, and see if you find any additional ones in the wild.
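Before hand-coding that whole list, it may be worth checking which entities PHP already knows. get_html_translation_table() returns PHP's built-in character-to-entity map, so diffing your own list against it shows the real gaps. This is a sketch under assumptions: $supplement is a hypothetical two-entry example standing in for the Wikipedia list, and decode_with_supplement() is a hypothetical helper. It applies the supplement before html_entity_decode() so that text like &amp;apos; is not decoded twice.

```php
<?php
// Built-in HTML 4.01 table, flipped to entity => UTF-8 character.
$builtin = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8'));

// Hypothetical supplementary list (stand-in for the long Wikipedia list).
$supplement = [
    '&apos;' => "'",        // XML/XHTML entity, not part of HTML 4.01
    '&Yuml;' => "\u{0178}", // already in PHP's HTML 4.01 table
];

// Entries from the supplementary list that PHP's table lacks.
$missing = array_diff_key($supplement, $builtin);
print_r(array_keys($missing));

// Hypothetical helper: replace the supplementary entities first, then let
// html_entity_decode() handle everything it knows about.
function decode_with_supplement(string $s, array $supplement): string
{
    $s = strtr($s, $supplement);
    return html_entity_decode($s, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

echo decode_with_supplement('&Yuml; &apos;', $missing), "\n";
```

Only the entries in $missing need custom handling; everything else can be left to the built-in decoder.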

derobert 2009-08-19 06:45:43

Thanks derobert. I was hoping there was a way to do this without going through such a long list (that something already existed). Looks like I'll be writing the cleanup function myself; I'll post it here for anyone who needs it in the future.

pedalpete 2009-08-19 18:25:07
