Wondering if there is an easy way to do a simple HTML escape/unescape in Objective-C. What I want is something like this pseudocode:

NSString *string = @"&lt;span&gt;Foo&lt;/span&gt;";
[string stringByUnescapingHTML];

Which returns

<span>Foo</span>

Hopefully unescaping all other HTML entities as well and even ASCII codes like &#1234; and the like.

Are there any methods in Cocoa Touch/UIKit to do this?

+9  A: 

This link contains the solution below. Cocoa CF has the CFXMLCreateStringByUnescapingEntities function but that's not available on the iPhone.

@interface MREntitiesConverter : NSObject {
    NSMutableString* resultString;
}
@property (nonatomic, retain) NSMutableString* resultString;
- (NSString*)convertEntiesInString:(NSString*)s;
@end


@implementation MREntitiesConverter
@synthesize resultString;
- (id)init
{
    // Assign the result of [super init] to self before using it.
    if ((self = [super init])) {
        resultString = [[NSMutableString alloc] init];
    }
    return self;
}
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)s {
    [self.resultString appendString:s];
}
- (NSString*)convertEntiesInString:(NSString*)s {
    if(s == nil) {
        NSLog(@"ERROR : Parameter string is nil");
        return nil;
    }
    // Clear any text left over from a previous call.
    [self.resultString setString:@""];
    // Wrap the input in a dummy element so the parser sees a single root.
    NSString* xmlStr = [NSString stringWithFormat:@"<d>%@</d>", s];
    NSData *data = [xmlStr dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
    NSXMLParser* xmlParse = [[[NSXMLParser alloc] initWithData:data] autorelease];
    [xmlParse setDelegate:self];
    [xmlParse parse];
    // Return an autoreleased copy so the caller does not own the result.
    return [NSString stringWithFormat:@"%@",resultString];
}
- (void)dealloc {
    [resultString release];
    [super dealloc];
}
@end
Andrew Grant
Wouldn't it be easier to implement this as an NSString category rather than an entirely separate object? Also, the return string is not autoreleased but the caller shouldn't own it because it was not explicitly allocated by the caller.
dreamlax
xmlParse also leaks btw, just add an autorelease to it and returnStr
Jarin Udom
If you make it an NSString category, you still need a delegate for the parser. So you will need a separate object anyway.
William Jockusch
+7  A: 

What about the NSString methods:

- (NSString *)stringByAddingPercentEscapesUsingEncoding:(NSStringEncoding)encoding
- (NSString *)stringByReplacingPercentEscapesUsingEncoding:(NSStringEncoding)encoding

Those do exist on the iPhone. I know you were asking more for the HTML variant though, which this is not...

Kendall Helmstetter Gelner
As-is in iOS, those only work for URL escaped entities, not HTML entities.
Matthew Frederick
A: 

MREntitiesConverter doesn't work for unescaping malformed XML. It will fail on a simple URL:

http://www.google.com/search?client=safari&rls=en&q=fail&ie=UTF-8&oe=UTF-8

richcollins
A: 

The MREntitiesConverter above is an HTML stripper, not an encoder.

If you need an encoder, go here: http://stackoverflow.com/questions/803676/encode-nsstring-for-xml-html

Brain2000
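For the encoding direction mentioned above, here is a minimal sketch in plain C (which compiles as-is inside an Objective-C file). The function name html_entity_encode is hypothetical and comes from nothing in this thread. It assumes UTF-8 input, escapes only the five characters that are significant in markup, and emits &#39; for the apostrophe because &apos; is not defined in HTML 4.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'd copy of src with
   & < > " ' replaced by entities, or NULL if allocation fails.
   The caller frees the result. */
char *html_entity_encode(const char *src) {
    size_t len, i, n = 0;
    char *out;

    assert(src != NULL);
    len = strlen(src);

    /* Worst case: every byte becomes a 6-character entity such as &quot;. */
    out = (char *)malloc(len * 6 + 1);
    if (out == NULL) {
        return NULL;
    }

    for (i = 0; i < len; i++) {
        const char *rep = NULL;
        switch (src[i]) {
            case '&':  rep = "&amp;";  break;
            case '<':  rep = "&lt;";   break;
            case '>':  rep = "&gt;";   break;
            case '"':  rep = "&quot;"; break;
            case '\'': rep = "&#39;";  break;
            default:   break;
        }
        if (rep != NULL) {
            size_t rlen = strlen(rep);
            memcpy(out + n, rep, rlen);
            n += rlen;
        } else {
            out[n++] = src[i];  /* ordinary byte: copy through unchanged */
        }
    }
    out[n] = '\0';
    return out;
}
```

From Objective-C this could be called with [string UTF8String] and the result wrapped with [NSString stringWithUTF8String:] before freeing the buffer. Because each input character is examined once, an ampersand that belongs to a freshly written entity is never escaped a second time.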
A: 

When I try the MREntitiesConverter, the XML parser gives an error to the delegate parser:parseErrorOccurred:

Error Domain=NSXMLParserErrorDomain Code=64 "Operation could not be completed. (NSXMLParserErrorDomain error 64.)

Looking in NSXML.h I find

error 64 = NSXMLParserMisplacedXMLDeclarationError

Does wrapping the XML data in another tag potentially produce incorrect XML? For example, if the XML data has an XML declaration/header at the beginning, e.g. "<?xml ...?>", putting this header inside a tag seems like incorrect XML.

I guess I need to strip this header? Any other ideas?

DJ Fitz
A: 

I should add that I was trying to HTML-decode an entire XML document that had HTML encoding inside. Why? Well, this is what I get back from Yahoo's SOAP API.

DJ Fitz
+6  A: 

Check out my NSString category for HTML. Here are the methods available:

- (NSString *)stringByConvertingHTMLToPlainText;
- (NSString *)stringByDecodingHTMLEntities;
- (NSString *)stringByEncodingHTMLEntities;
- (NSString *)stringWithNewLinesAsBRs;
- (NSString *)stringByRemovingNewLinesAndWhitespace;
Michael Waterfall
A: 

This is an easy to use NSString category implementation:

It is far from complete but you can add some missing entities from here: http://code.google.com/p/statz/source/browse/trunk/NSString%2BHTML.m

Usage:

#import "NSString+HTML.h"

NSString *raw = [NSString stringWithFormat:@"<div></div>"];
NSString *escaped = [raw htmlEscapedString];
Blago
A: 

The links Blago posted are helpful; I was able to add ( @"'", @"'" ). But line 51 of the NSString+HTML.m file needs to return htmlUnescapes; (not htmlEscapes)

LeeIII
A: 

This is an incredibly hacked together solution I did, but if you want to simply unescape a string without worrying about parsing, do this:

- (NSString *)htmlEntityDecode:(NSString *)string
{
    string = [string stringByReplacingOccurrencesOfString:@"&quot;" withString:@"\""];
    string = [string stringByReplacingOccurrencesOfString:@"&apos;" withString:@"'"];
    string = [string stringByReplacingOccurrencesOfString:@"&lt;" withString:@"<"];
    string = [string stringByReplacingOccurrencesOfString:@"&gt;" withString:@">"];
    // Decode &amp; last so that "&amp;lt;" becomes "&lt;" rather than "<".
    string = [string stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"];

    return string;
}

I know it's by no means elegant, but it gets the job done. You can then decode an element by calling:

string = [self htmlEntityDecode:string];

Like I said, it's hacky but it works. If you want to encode a string, just reverse the stringByReplacingOccurrencesOfString parameters (and replace & first, so the ampersands in the newly written entities are not escaped again).

Andrew Kozlik
And how about performance? You are going through the string 5 times. It doesn't seem very efficient ;)
HyLian
It's definitely not the most efficient solution, but it works. What would be a more efficient way to do this?
Andrew Kozlik
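One possible answer to the efficiency question is to walk the string once instead of five times. The sketch below is plain C (usable from Objective-C unchanged). The names html_entity_decode, utf8_encode and digit_value are hypothetical, not part of any framework mentioned in this thread. It assumes UTF-8 input, handles only the five predefined entities plus decimal and hexadecimal numeric references, and copies anything it does not recognise through untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of c as a digit in the given base (10 or 16), or -1 if it is not one. */
static int digit_value(char c, int base) {
    int v;
    if (c >= '0' && c <= '9') v = c - '0';
    else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
    else return -1;
    return v < base ? v : -1;
}

/* Write code point cp as UTF-8 into out; returns the byte count (1 to 4). */
static size_t utf8_encode(unsigned long cp, char *out) {
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Decode src into dst in a single pass. dst must hold strlen(src) + 1 bytes;
   that is enough because no decoded form is longer than its reference. */
void html_entity_decode(const char *src, char *dst) {
    static const struct { const char *name; size_t len; char ch; } named[] = {
        { "&quot;", 6, '"' }, { "&apos;", 6, '\'' }, { "&amp;", 5, '&' },
        { "&lt;", 4, '<' }, { "&gt;", 4, '>' }
    };
    assert(src != NULL && dst != NULL);

    while (*src != '\0') {
        if (*src == '&') {
            size_t i;
            int matched = 0;
            /* Try the named entities first. */
            for (i = 0; i < sizeof named / sizeof named[0]; i++) {
                if (strncmp(src, named[i].name, named[i].len) == 0) {
                    *dst++ = named[i].ch;
                    src += named[i].len;
                    matched = 1;
                    break;
                }
            }
            if (matched) continue;

            /* Then numeric references: &#1234; or &#x4D2; */
            if (src[1] == '#') {
                int base = (src[2] == 'x' || src[2] == 'X') ? 16 : 10;
                const char *digits = src + (base == 16 ? 3 : 2);
                const char *p = digits;
                unsigned long cp = 0;
                int v;
                while ((v = digit_value(*p, base)) >= 0 && cp <= 0x10FFFF) {
                    cp = cp * (unsigned long)base + (unsigned long)v;
                    p++;
                }
                /* Accept only terminated, in-range, non-surrogate code points. */
                if (p != digits && *p == ';' && cp > 0 && cp <= 0x10FFFF &&
                    !(cp >= 0xD800 && cp <= 0xDFFF)) {
                    dst += utf8_encode(cp, dst);
                    src = p + 1;
                    continue;
                }
            }
        }
        *dst++ = *src++;  /* not a recognised reference: copy the byte */
    }
    *dst = '\0';
}
```

From Objective-C the input could come from [string UTF8String] and the output buffer could be turned back into an object with [NSString stringWithUTF8String:]. Since every reference is consumed exactly once, "&amp;lt;" decodes to "&lt;" and is not decoded a second time, which chained replacements can get wrong depending on their order.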