I am reading the documentation for creating a podcast feed suitable for iTunes, and the Common Mistakes section says:
Using HTML Named Character Entities.
<!-- illegal xml -->
<copyright>&copy; 2005 John Doe</copyright>
<!-- valid xml -->
<copyright>&#xA9; 2005 John Doe</copyright>
Unlike HTML, XML supports only five "named character entities":
character   name                xml
&           ampersand           &amp;
<           less-than sign      &lt;
>           greater-than sign   &gt;
'           apostrophe          &apos;
"           quotation           &quot;
The five characters above are the only characters that require escaping in XML. All other characters can be entered directly in an editor that supports UTF-8. You can also use numeric character references that specify the Unicode for the character, for example:
character   name                        xml
©           copyright sign              &#xA9;
℗           sound recording copyright   &#x2117;
™           trade mark sign             &#x2122;
For further reference see XML Character and Entity References.
Right now I'm using htmlentities() under PHP5, and the feed is validating and working. But from what I gather, some characters that could end up in the content would be turned into named entities that make the feed invalid. What's the best function to use to ensure I'm not passing along bad data? I'm paranoid that something will get entered, get entity-ized, and break the feed. Should I just use str_replace() to replace those five characters with their named entities and leave the rest alone? Or can I use htmlspecialchars() somehow?
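To make the worry concrete, here is a small sketch of what I think happens today. The sample string is made up, and I'm assuming the input is UTF-8:

```php
<?php
// Made-up sample value; "\xC2\xA9" is the UTF-8 byte sequence for the © sign.
$copyright = "\xC2\xA9 2005 John Doe & Sons";

// htmlentities() converts every character that has an HTML named entity,
// including ones (like &copy;) that XML does not define.
$asHtml = htmlentities($copyright, ENT_QUOTES, 'UTF-8');

echo $asHtml, "\n"; // &copy; 2005 John Doe &amp; Sons
```

The `&amp;` is fine in XML, but as far as I can tell from the docs quoted above, the `&copy;` is exactly the kind of named entity that is illegal in the feed.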
So in short, what's a drop-in replacement for htmlentities() that will make sure input is safe for descriptions, titles, etc. in a podcast RSS feed?
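For what it's worth, the candidate I have in mind looks roughly like the sketch below. The name `feed_escape` is just something I made up, and it assumes PHP 5.4 or later (for the `ENT_XML1` flag), UTF-8 input, and a feed that declares UTF-8 encoding. I'm not sure it covers every case:

```php
<?php
// Hypothetical helper: escape only the five XML-special characters and
// pass everything else through unchanged as UTF-8.
function feed_escape($text)
{
    // ENT_QUOTES escapes both " and '; ENT_XML1 makes ' come out as &apos;.
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// "\xC2\xA9" is the UTF-8 byte sequence for the © sign.
echo feed_escape("\xC2\xA9 2005 John Doe & Sons"), "\n";
echo feed_escape("Tom's <b>\"best of\"</b>"), "\n";
```

One thing I'm unsure about: by default htmlspecialchars() re-encodes existing entities, so input that already contains `&amp;` would come out as `&amp;amp;`.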