Our iPhone app has a chatroom where users can post comments. Recently, the chatroom has been crashing the app because users are adding emojis to their comments. I changed my server-side PHP script to reject characters outside the A-Z, a-z, 0-9 range (I also allow around 30 punctuation characters), hoping that this would prevent the app/feed from crashing. However, emojis are still crashing the chatroom.

This is my regular expression filter in my server script that disallows comments with special characters:

$special = "/\W/";
$special2 = "/[\~\!\@\#\$\%\^\&\*\(\)\_\+\`\-\=\{\}\|\:\\\"\<\>\?\,\.\/\;\'\[\]]/";

if ((preg_match($special,$comment)) && (!preg_match($special2,$comment)))

The PHP statement above is meant to reject the comment if the script finds a character that is not in [A-Za-z0-9] and is not one of the punctuation marks listed.

The comment that broke the app recently is below:

<comment>Exciting times&icirc;€Žits all about the &icirc;&sect; go Team!!</comment>

Any suggestions on what to do to prevent the app from crashing?

A: 

If I had to hazard a guess, here is what I think is going on. Chances are your app does not handle Unicode properly. Any number of things could be wrong (assuming character counts equal byte counts, for example), but the upshot is that certain Unicode strings crash the app when they reach it.
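
To illustrate the character-count versus byte-count mismatch, here is a minimal sketch. It assumes PHP 7 or later (for the "\u{...}" escape) and the mbstring extension, and it uses U+E00E only as a sample private-use code point:

```php
<?php
// U+E00E is a sample code point from the private-use range; PHP stores it as UTF-8.
$emoji = "\u{E00E}";

$byteCount = strlen($emoji);              // strlen() counts bytes
$charCount = mb_strlen($emoji, 'UTF-8');  // mb_strlen() counts code points
$hexBytes  = bin2hex($emoji);             // the raw UTF-8 bytes as hex

echo "$byteCount bytes, $charCount character, hex $hexBytes\n";
// prints "3 bytes, 1 character, hex ee808e"
```

Read as Windows-1252 instead of UTF-8, the bytes EE 80 8E display as "î€Ž", which is the sequence that shows up in the comment that broke the app.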

iPhone emoji are implemented as Unicode characters (using part of the Private Use Area, at U+E001–U+E05A). The likely reason you cannot filter them correctly is that PHP's regular expression functions treat the pattern and the subject as raw bytes rather than UTF-8 unless you add the "u" modifier after the closing delimiter of the pattern:

$special2 = "/[\~\!\@\#\$\%\^\&\*\(\)\_\+\`\-\=\{\}\|\:\\\"\<\>\?\,\.\/\;\'\[\]]/u";

Doing that may have other unintended consequences depending on exactly how things are set up, and it would be much better in the long run to make sure you can handle arbitrary Unicode strings correctly.
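
If you do want to keep filtering on the server, one option is a single allow-list pattern anchored to the whole string, instead of combining two separate matches. This is only a sketch: the function name isAllowedComment is made up for illustration, the punctuation list is copied from the question, and allowing the space character is my assumption.

```php
<?php
// Hypothetical helper: true only if every character is on the allow-list.
function isAllowedComment(string $comment): bool
{
    // \A and \z anchor the whole string. The u modifier makes PCRE read the
    // pattern and subject as UTF-8, so an emoji is one character, not three bytes.
    // The space after 0-9 is an assumption; the rest mirrors the question's list.
    $pattern = '/\A[A-Za-z0-9 ~!@#$%^&*()_+`\-={}|:"<>?,.\/;\'\[\]]*\z/u';

    // preg_match() returns 1 on a match, 0 on no match, and false on error
    // (for example, when the subject is not valid UTF-8 under the u modifier).
    return preg_match($pattern, $comment) === 1;
}

var_dump(isAllowedComment("go Team!!"));               // bool(true)
var_dump(isAllowedComment("Exciting times\u{E00E}"));  // bool(false)
```

Comparing the result with === 1 means a comment that is not valid UTF-8 is rejected as well, since preg_match() returns false in that case.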

Louis Gerbarg
A: 

Here is how I solved the problem. The program now decodes/encodes the comment before inserting it into the database.

$comment = utf8_decode($comment); 
$comment = utf8_encode($comment);

I also added a UTF-8 header to the dynamic XML/PHP feed:

header('Content-type: text/html; charset=utf-8');

The emojis do not display, which is fine. But the feed is now valid and does not crash the app. Problem solved.
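
For anyone reading this later: the decode/encode round trip works because utf8_decode() converts to ISO-8859-1, which has no slot for the emoji code points, so they are replaced before the string is converted back. Both functions are deprecated as of PHP 8.2. A rough equivalent with the mbstring extension might look like the sketch below; the function name stripToLatin1 is made up for illustration.

```php
<?php
// Hypothetical helper: keeps only what ISO-8859-1 can represent, returns UTF-8.
function stripToLatin1(string $comment): string
{
    // Characters with no ISO-8859-1 equivalent are replaced by mbstring's
    // substitute character (configurable via mb_substitute_character()).
    $latin1 = mb_convert_encoding($comment, 'ISO-8859-1', 'UTF-8');

    // Convert back so the stored comment is UTF-8 again.
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

$clean = stripToLatin1("Exciting times\u{E00E}its all about the team");
```

Accented Latin-1 letters such as "é" survive the round trip; anything outside that range, including the emoji, does not.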

Miriam Raphael Roberts