I'm trying to translate a lot of HTML-encoded text into UTF-8 so I can put it into my database. A ton of characters get missed by both html_entity_decode and iconv with //TRANSLIT.
I've written up a long list of characters to strip out, but now I see that &Yuml is not translated, while ÿ is.
I'm sure there are other similar symbols that are missed as well.
Any advice on how best to handle these inconsistencies and make sure each character is translated correctly?
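
For reference, here is a minimal sketch of the kind of mismatch described above. The sample string is made up, and the idea that the target charset is what makes the difference is only an assumption: U+0178 (the uppercase letter) has no code point in ISO-8859-1, while U+00FF (the lowercase one) does. The second call assumes PHP 5.4 or later, where the ENT_HTML5 flag is available.

```php
<?php
// Hypothetical sample input: the uppercase and lowercase entity side by side.
$html = '&Yuml; &yuml;';

// Decode into ISO-8859-1. An entity that has no representation in the
// target charset is expected to be left in the string undecoded.
$latin1 = html_entity_decode($html, ENT_QUOTES, 'ISO-8859-1');

// Decode into UTF-8 with the HTML5 entity table. Both entities are
// representable here, so both should come out as real characters.
$utf8 = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Compare raw bytes so the terminal's own encoding does not confuse things.
echo bin2hex($latin1), PHP_EOL;
echo bin2hex($utf8), PHP_EOL;
```

If the first line of output still contains the bytes of the literal entity text while the second does not, the charset argument is the likely culprit rather than a gap in the entity table.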