I'm developing an iPhone application that can send mail from within the app. So far so good, but now I want to avoid HTML injection, as some parts of the mail are user-generated text.

Basically I'm looking for something like this:

// inits
NSString *sourceString = @"Hello world! Grüße dich Welt <-- This is in German.";

//                                          -----   THAT'S WHAT I'M LOOKING FOR
// pseudo-code                              |
//                                          V
NSString *htmlEncodedString = [sourceString htmlEncode];

// log
NSLog(@"source string: %@", sourceString);
NSLog(@"encoded string: %@", htmlEncodedString);

Expected output
source string: Hello world! Grüße dich Welt <-- This is in German.
encoded string: Hello world! Gr&#252;&#223;e dich Welt &lt;-- This is in German.

I already googled and looked through several of SO's questions and answers, but all of them seem to be related to URL encoding, and that's not what I need (I tried stringByAddingPercentEscapesUsingEncoding with no luck: it turns an 'ü' into %C3%BC when it should become &#252;).

A code sample would be really great (correcting mine?)...

--
Thanks in advance,
Markus

A: 

Assuming the character encoding of the email supports Unicode (say UTF-8), could you not just find and replace the occurrences of <, >, and & with &lt;, &gt;, and &amp;?

teabot
Thanks for your answer. Basically you are right, but as there is a function to encode URLs (stringByAddingPercentEscapesUsingEncoding) I wondered if there is no similar one for HTML character encoding. I'd find it strange if I was the first one with this kind of _problem_ and even stranger if there wasn't a "right" way of doing this that is other than reinventing the wheel.
Markus
No, there's no such built-in function. You could use NSScanner to do the replacement as suggested by teabot. Here's an approach that completely strips HTML tags, which you could modify: http://sugarmaplesoftware.com/25/strip-html-tags/
Pascal
Thanks a lot for the answer and the link!
Markus
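For reference, here is a minimal sketch of teabot's find-and-replace idea, extended so that it also produces the numeric character references (&#252;, &#223;) from the expected output in the question. HTMLEncode is a hypothetical helper name, not a Foundation API, and the sketch assumes the input is in precomposed form (so 'ü' is the single character U+00FC). If the mail is sent as UTF-8, the numeric-reference branch is optional.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of Foundation: escapes &, <, > and " and
// writes every non-ASCII character as a decimal numeric character reference.
static NSString *HTMLEncode(NSString *text) {
    NSUInteger length = [text length];
    NSMutableString *result = [NSMutableString stringWithCapacity:length];

    for (NSUInteger i = 0; i < length; i++) {
        unichar c = [text characterAtIndex:i];
        unsigned int codePoint = c;

        // NSString is UTF-16: combine a surrogate pair into one code point.
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < length) {
            unichar low = [text characterAtIndex:i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                codePoint = 0x10000 + ((unsigned int)(c - 0xD800) << 10) + (low - 0xDC00);
                i++; // the low surrogate has been consumed
            }
        }

        switch (codePoint) {
            case '&': [result appendString:@"&amp;"];  break;
            case '<': [result appendString:@"&lt;"];   break;
            case '>': [result appendString:@"&gt;"];   break;
            case '"': [result appendString:@"&quot;"]; break;
            default:
                if (codePoint < 128) {
                    // Plain ASCII is copied through unchanged.
                    [result appendFormat:@"%C", c];
                } else {
                    // Everything else becomes &#NNN; (an unpaired surrogate
                    // is emitted with its raw UTF-16 value).
                    [result appendFormat:@"&#%u;", codePoint];
                }
                break;
        }
    }
    return result;
}
```

Calling HTMLEncode(sourceString) on the string from the question should give "Hello world! Gr&#252;&#223;e dich Welt &lt;-- This is in German.". Because the string is walked once, character by character, the order-of-replacement problem of chained replacements cannot occur.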
+1  A: 

See this dupe: http://stackoverflow.com/questions/659602/objective-c-html-escape-unescape

Roger Nolan
Thanks! (Edit: removed the wrong comment - sorry)
Markus
+1  A: 

Thanks @all. I ended up using my own implementation:

//
// _________________________________________
//
// textToHtml
// _________________________________________
//
- (NSString*)textToHtml:(NSString*)htmlString {
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@"&"  withString:@"&amp;"];
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@"<"  withString:@"&lt;"];
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@">"  withString:@"&gt;"];
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@"\"" withString:@"&quot;"];
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@"'"  withString:@"&#039;"];
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@"\n" withString:@"<br>"];
    return htmlString;
}
Markus
Just a side note: since you are also replacing \n with <br>, you should name your function differently (textToHtml, for example). The name escapeHTML tells other developers that you are only escaping (which you are not), and that will eventually cause bugs if someone tries to reuse this function.
Nir Levy
Good point. Just updated the code snippet accordingly. Thanks!
Markus
Aren't you leaking a bunch of NSStrings there?
Rhythmic Fistman
Wait, no, they're all autoreleased: http://stackoverflow.com/questions/531550/string-manipulation-without-memory-leaks
Rhythmic Fistman
A: 

Markus,

In your implementation you should escape the & first.

Tome
Thanks! Just fixed it :)
Markus
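To illustrate Tome's point about ordering, here is a small sketch. Both function names are hypothetical and exist only for this comparison. If & is escaped last, the ampersands introduced by the earlier replacements get escaped a second time.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helpers for illustration only; neither name is from the thread.

// Wrong order: "<" becomes "&lt;", then the "&" pass turns that into "&amp;lt;".
static NSString *EscapeAmpersandLast(NSString *text) {
    text = [text stringByReplacingOccurrencesOfString:@"<" withString:@"&lt;"];
    text = [text stringByReplacingOccurrencesOfString:@"&" withString:@"&amp;"];
    return text;
}

// Right order: "&" is handled before any replacement introduces new ampersands.
static NSString *EscapeAmpersandFirst(NSString *text) {
    text = [text stringByReplacingOccurrencesOfString:@"&" withString:@"&amp;"];
    text = [text stringByReplacingOccurrencesOfString:@"<" withString:@"&lt;"];
    return text;
}
```

For the input "a < b & c", EscapeAmpersandLast returns "a &amp;lt; b &amp; c" (double-escaped), while EscapeAmpersandFirst returns "a &lt; b &amp; c".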
+1  A: 

Check out my NSString category for HTML. Here are the methods available:

- (NSString *)stringByConvertingHTMLToPlainText;
- (NSString *)stringByDecodingHTMLEntities;
- (NSString *)stringByEncodingHTMLEntities;
- (NSString *)stringWithNewLinesAsBRs;
- (NSString *)stringByRemovingNewLinesAndWhitespace;
Michael Waterfall