ansaurus

Question

iPhone SDK - stringWithContentsOfUrl ASCII characters in HTML source

Answer 1

A:

Are you sure they are not originally in escaped form (for example `&#197;` rather than Å)? Try viewing the page source in a browser first.

KennyTM 2010-02-22 12:17:45

The web page looks fine, but I have to believe there is a better way than this: http://stackoverflow.com/questions/659602/objective-c-html-escape-unescape

2010-02-22 12:23:51

To clarify, the web page source displays escaped (entity-reference) characters, but I want their equivalents in the NSString (as displayed in a web browser).

2010-02-22 12:27:08

@user: If they are originally in escaped form such as `&#197;` and you want to convert them into `Å`, then no, there's nothing better than that.

KennyTM 2010-02-22 12:28:03

Answer 2

A:

That really, really sucks. I wanted to convert it directly and the above solution isn't really a good one, so I just wrote my own ascii-table converter (static) class. Works as it should have worked natively (though I have to fill in the ascii table myself...)

Ideas for optimization? ("ASCII" is a static NSDictionary)

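#import <Foundation/Foundation.h>

// Lookup table mapping each escaped sequence to its replacement character.
// Loaded lazily from a property list in the app bundle.
static NSDictionary *ASCII = nil;

@implementation InternetHelper

+(NSString *)HTMLSourceFromUrlWithString:(NSString *)str convertASCII:(BOOL)state
{
    NSURL *url = [NSURL URLWithString:str];
    NSString *source = [NSString stringWithContentsOfURL:url encoding:NSUTF8StringEncoding error:nil];

    // source is nil if the fetch or the decoding failed; skip the conversion in that case.
    if (state && source)
        source = [InternetHelper ConvertASCIICharactersInString:source];

    return source;
}

+(NSString *)ConvertASCIICharactersInString:(NSString *)str
{
    NSString *ret = str;

    if (!ASCII)
    {
        // kASCIICharacterTableFilename and kFileFormat are constants defined elsewhere in the project.
        NSString *path = [[NSBundle mainBundle] pathForResource:kASCIICharacterTableFilename ofType:kFileFormat];
        ASCII = [[NSDictionary alloc] initWithContentsOfFile:path];
    }

    // One full pass over the string per table entry.
    for (NSString *key in ASCII)
    {
        ret = [ret stringByReplacingOccurrencesOfString:key withString:[ASCII objectForKey:key]];
    }

    return ret;
}

@end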
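
On the optimization question: the loop above rescans the whole string once per table entry. If the escaped sequences are decimal character references of the form &#NNN; (an assumption, since the table's contents aren't shown), a single pass with NSScanner could decode them without any table. A rough sketch, with a made-up function name, assuming ARC and limited to code points up to U+FFFF:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of any SDK): decodes decimal character
// references such as &#197; in a single pass over the input.
static NSString *StringByDecodingDecimalReferences(NSString *input)
{
    NSMutableString *output = [NSMutableString stringWithCapacity:[input length]];
    NSScanner *scanner = [NSScanner scannerWithString:input];
    [scanner setCharactersToBeSkipped:nil];   // do not silently skip whitespace

    while (![scanner isAtEnd]) {
        NSString *chunk = nil;
        // Copy everything up to the next "&#" verbatim.
        if ([scanner scanUpToString:@"&#" intoString:&chunk])
            [output appendString:chunk];
        if ([scanner isAtEnd])
            break;

        NSUInteger start = [scanner scanLocation];
        int code = 0;
        BOOL wellFormed = [scanner scanString:@"&#" intoString:NULL]
                       && [scanner scanInt:&code]
                       && [scanner scanString:@";" intoString:NULL];
        // Accept only BMP code points that are not surrogate halves.
        BOOL inRange = code > 0 && code <= 0xFFFF
                    && !(code >= 0xD800 && code <= 0xDFFF);

        if (wellFormed && inRange) {
            [output appendFormat:@"%C", (unichar)code];
        } else {
            // Not a reference we handle: keep the "&#" literally and move on.
            [scanner setScanLocation:start + 2];
            [output appendString:@"&#"];
        }
    }
    return output;
}
```

Named references (those spelled with a name instead of a number) would still need a lookup table; this sketch leaves them untouched.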

2010-02-22 13:01:34

ASCII does not mean what you seem to think it means. It is an encoding (and a very small one at that); it has nothing to do with SGML or XML entity references. Moreover, there is a simpler, easier way to do this; see my answer.

Peter Hosey 2010-02-23 11:39:18

Answer 3

+1 A:

"I'm using +stringWithContentsOfURL:encoding:error: to fetch the source and have tried several different encodings such as NSUTF8StringEncoding and NSASCIIStringEncoding, but nothing seems to affect the end result string."

You're misunderstanding the purpose of that encoding: argument. The method needs to convert bytes into characters somehow; the encoding tells it what sequences of bytes describe which characters. You need to make sure the encoding matches that of the resource data.
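
A small sketch of why changing the encoding made no difference, assuming the page contains a numeric reference such as &#197; (the helper name DecodedString is made up for illustration, and ARC is assumed). The raw UTF-8 bytes for Å decode differently under different encodings, but the bytes of the reference itself are plain ASCII, so they decode to the same six characters either way:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: interprets raw bytes as characters under one encoding.
static NSString *DecodedString(const void *bytes, NSUInteger length, NSStringEncoding encoding)
{
    return [[NSString alloc] initWithBytes:bytes length:length encoding:encoding];
}

static void LogEncodingDemo(void)
{
    // The two bytes that encode Å in UTF-8.
    const unsigned char raw[] = { 0xC3, 0x85 };
    NSString *rawAsUTF8   = DecodedString(raw, sizeof raw, NSUTF8StringEncoding);
    NSString *rawAsLatin1 = DecodedString(raw, sizeof raw, NSISOLatin1StringEncoding);
    // One character under UTF-8, two unrelated characters under Latin-1.
    NSLog(@"raw bytes: UTF-8 length %lu, Latin-1 length %lu",
          (unsigned long)[rawAsUTF8 length], (unsigned long)[rawAsLatin1 length]);

    // The reference is six ASCII bytes; both encodings agree on them.
    const char *reference = "&#197;";
    NSString *refAsUTF8   = DecodedString(reference, 6, NSUTF8StringEncoding);
    NSString *refAsLatin1 = DecodedString(reference, 6, NSISOLatin1StringEncoding);
    NSLog(@"reference decodes identically: %d", [refAsUTF8 isEqualToString:refAsLatin1]);
}
```

In other words, the encoding argument only decides how bytes become characters; it never turns the characters of a reference into the character they refer to.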

The entity references are an SGML/XML thing. SGML and XML are not encodings; they are markup language syntaxes. stringWithContentsOfURL:encoding:error: and its cousins do not attempt to parse sequences of characters (syntax) in any way, which is what they would have to do to convert one sequence of characters (an entity reference) into a different one (the entity, in practice meaning single character, that is referenced).

You can convert the entity references to un-escaped characters using the CFXMLCreateStringByUnescapingEntities function. It takes a CFString, which an NSString is (toll-free bridging), and returns a CFString, which is an NSString.
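
A minimal sketch of that call, assuming ARC and an SDK that declares the function. It lives in CoreFoundation's CFXMLParser.h, which as far as I know is a Mac OS X header, so check that your iPhone SDK target actually provides it. The wrapper name UnescapedString is made up for illustration:

```objc
#import <Foundation/Foundation.h>
#import <CoreFoundation/CoreFoundation.h>

// Hypothetical wrapper (not an SDK function) around
// CFXMLCreateStringByUnescapingEntities.
static NSString *UnescapedString(NSString *escaped)
{
    if (escaped == nil)
        return nil;

    // Toll-free bridging: the NSString is passed as a CFStringRef.
    // A NULL entities dictionary means no custom named entities are supplied.
    CFStringRef result = CFXMLCreateStringByUnescapingEntities(kCFAllocatorDefault,
                                                               (__bridge CFStringRef)escaped,
                                                               NULL);

    // The function follows the Create rule, so ownership is handed to ARC here.
    return CFBridgingRelease(result);
}
```

It could then be applied to the fetched source, for example `NSString *text = UnescapedString(source);`. Without ARC, the equivalent would be a plain cast plus an autorelease of the returned string. Named entities beyond the standard XML ones would have to be supplied through the dictionary argument.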

Peter Hosey 2010-02-23 11:37:20

Thanks, I'll check that out.

2010-03-02 15:49:55
