I really need to get WordPress to "sanitize" the characters Ș, ș, Ț, ț and replace them with S, s, T, t when it creates post slugs.

I know the code for this is the "remove_accents" function in wp-includes/formatting.php, but I can't for the life of me figure out how these letters decompose into chr(int).chr(int). I would really love to find out, since I'd like to have these diacritics included in that list.

+1  A: 

Assuming you are using utf-8:

Ș -> \xc8\x98 -> 200,152
ș -> \xc8\x99 -> 200,153
Ț -> \xc8\x9a -> 200,154
ț -> \xc8\x9b -> 200,155
fcurella
Thank you. Could you please tell me how you did that? ;)
intlect
I opened a Python terminal and pasted a glyph into a string, then echoed the string to see its hex values. Alternatively, you can get those by looking up the glyph in a UTF-8 character table such as http://www.utf8-chartable.de/unicode-utf8-table.pl?start=512. Then all you need to do is convert those values to base 10.
fcurella
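The procedure described in that comment can be sketched roughly as follows. This assumes Python 3, where a string has to be encoded explicitly to see its bytes; the helper names `utf8_bytes` and `php_chr_expr` are made up for illustration and are not part of WordPress or PHP.

```python
# Sketch: show the UTF-8 bytes of each glyph in hex and in decimal.
# Assumes Python 3 and that the glyphs are the precomposed characters.
GLYPHS = ["Ș", "ș", "Ț", "ț"]


def utf8_bytes(glyph):
    """Return the UTF-8 encoding of a glyph as a list of integers."""
    return list(glyph.encode("utf-8"))


def php_chr_expr(glyph):
    """Build the chr(int).chr(int) form asked about in the question."""
    return ".".join("chr({})".format(b) for b in utf8_bytes(glyph))


for glyph in GLYPHS:
    # Hex form, e.g. \xc8\x98, followed by the decimal chr() form.
    hex_form = "".join("\\x{:02x}".format(b) for b in utf8_bytes(glyph))
    print(glyph, "->", hex_form, "->", php_chr_expr(glyph))
```

Each printed line pairs the hex bytes with the decimal form, which is the shape of the keys used in the lookup table the question refers to.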
+5  A: 

I try to make it my business NOT to know or to care, and certainly not keep a table in my own code :)

echo iconv('utf-8','ascii//translit','Ș, ș, Ț, ț');
//Output: S, s, T, t
Wrikken
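For comparison, here is a minimal Python sketch of the same "no table in my own code" idea. It is not `iconv`: it relies on Unicode canonical decomposition instead, so it only helps for characters that decompose into a base letter plus combining marks (which these four do), and the function name `strip_diacritics` is hypothetical.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose each character (NFD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Ș, ș, Ț, ț"))  # S, s, T, t
```

Letters without a canonical decomposition, such as ø or ß, pass through unchanged, so this is narrower than a full transliteration.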
Thanks. I know this is how it's "supposed to be done", but that decomposition had been bugging me for a long time. Plus, for some strange reason I thought that WordPress had a reason(c) for keeping a table there, and that doing it the way they already did would help get this minor change accepted. Googling around, though, I've found that the reason seems to be a comment on the php.net page for some function (I don't know which), providing the very lines that appear to have become the remove_accents function.
intlect
Well, maybe historically; `iconv` just isn't available everywhere. To provide a package that works just about everywhere, a lot of CMSs and frameworks reimplement existing code. The more clever ones do feature detection before resorting to it, though.
Wrikken