Hi!

I've been struggling all day to find a class that converts/decodes ASCII character entities into readable text.

I found the method below here on Stack Overflow, and it converts many of the entities to readable text. But I'm still struggling with, for example:

&#44;

&#46;

&#58;

&#39;

...and so forth.

I'm reading my data from an XML file with TBXML, and the encoding of the XML is:

iso-8859-1

Does anybody have a method that converts/decodes all the ASCII character entities to readable text?

- (NSString *)stringByDecodingXMLEntities {
    NSUInteger myLength = [self length];
    NSUInteger ampIndex = [self rangeOfString:@"&" options:NSLiteralSearch].location;

    // Short-circuit if there are no ampersands.
    if (ampIndex == NSNotFound) {
        return self;
    }
    // Make result string with some extra capacity.
    NSMutableString *result = [NSMutableString stringWithCapacity:(myLength * 1.25)];

    // First iteration doesn't need to scan to & since we did that already, but for code simplicity's sake we'll do it again with the scanner.
    NSScanner *scanner = [NSScanner scannerWithString:self];

    [scanner setCharactersToBeSkipped:nil];

    NSCharacterSet *boundaryCharacterSet = [NSCharacterSet characterSetWithCharactersInString:@" \t\n\r;"];

    do {
        // Scan up to the next entity or the end of the string.
        NSString *nonEntityString;
        if ([scanner scanUpToString:@"&" intoString:&nonEntityString]) {
            [result appendString:nonEntityString];
        }
        if ([scanner isAtEnd]) {
            goto finish;
        }
        // Scan either a HTML or numeric character entity reference.
        if ([scanner scanString:@"&amp;" intoString:NULL])
            [result appendString:@"&"];
        else if ([scanner scanString:@"&apos;" intoString:NULL])
            [result appendString:@"'"];
        else if ([scanner scanString:@"&quot;" intoString:NULL])
            [result appendString:@"\""];
        else if ([scanner scanString:@"&lt;" intoString:NULL])
            [result appendString:@"<"];
        else if ([scanner scanString:@"&gt;" intoString:NULL])
            [result appendString:@">"];
        else if ([scanner scanString:@"&#" intoString:NULL]) {
            BOOL gotNumber;
            unsigned charCode;
            NSString *xForHex = @"";

            // Is it hex or decimal?
            if ([scanner scanString:@"x" intoString:&xForHex]) {
                gotNumber = [scanner scanHexInt:&charCode];
            }
            else {
                gotNumber = [scanner scanInt:(int*)&charCode];
            }

            if (gotNumber) {
                // %C expects a unichar, so this only covers code points up to U+FFFF.
                [result appendFormat:@"%C", (unichar)charCode];

                [scanner scanString:@";" intoString:NULL];
            }
            else {
                NSString *unknownEntity = @"";

                [scanner scanUpToCharactersFromSet:boundaryCharacterSet intoString:&unknownEntity];


                [result appendFormat:@"&#%@%@", xForHex, unknownEntity];

                //[scanner scanUpToString:@";" intoString:&unknownEntity];
                //[result appendFormat:@"&#%@%@;", xForHex, unknownEntity];
                NSLog(@"Expected numeric character entity but got &#%@%@;", xForHex, unknownEntity);

            }

        }
        else {
            NSString *amp = @"";

            [scanner scanString:@"&" intoString:&amp];      // an isolated & symbol
            [result appendString:amp];

            NSString *unknownEntity = @"";
            [scanner scanUpToString:@";" intoString:&unknownEntity];
            NSString *semicolon = @"";
            [scanner scanString:@";" intoString:&semicolon];
            [result appendFormat:@"%@%@", unknownEntity, semicolon];
            NSLog(@"Unsupported XML character entity %@%@", unknownEntity, semicolon);

        }

    }
    while (![scanner isAtEnd]);

finish:
    return result;
}
+2  A: 

Normally you would let NSXMLParser handle that job for you. You shouldn't need to do the conversion by hand.

If you search Google for NSXMLParser, you will find lots of examples.
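
For what it's worth, here is a minimal sketch of that approach, assuming ARC and a document small enough to hold in memory. `TextCollector` and `DecodedTextFromXMLData` are made-up names for illustration, not part of Foundation. NSXMLParser is event-driven: it never hands back a converted document. Instead it calls delegate methods as it reads, and the text passed to `parser:foundCharacters:` already has character references such as `&#44;` and the predefined entities resolved. It should also pick up the encoding from the XML declaration, so the original bytes can be passed in as they are.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate: collects all character data found in the document.
@interface TextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *text;
@end

@implementation TextCollector

- (instancetype)init {
    if ((self = [super init])) {
        _text = [NSMutableString string];
    }
    return self;
}

// The parser may deliver the text of one element in several chunks,
// so always append instead of assigning.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.text appendString:string];
}

@end

// Hypothetical convenience function: returns the decoded text of the whole
// document, or nil if the XML could not be parsed.
static NSString *DecodedTextFromXMLData(NSData *xmlData) {
    TextCollector *collector = [[TextCollector alloc] init];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xmlData];
    parser.delegate = collector;
    return [parser parse] ? [collector.text copy] : nil;
}
```

In a real reader you would normally also implement `parser:didStartElement:namespaceURI:qualifiedName:attributes:` and `parser:didEndElement:namespaceURI:qualifiedName:` to keep the text of each element separate.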

Anders K.
A: 

Hi, Anders!

I've tried:

NSData* xmlData = [contents dataUsingEncoding:NSISOLatin1StringEncoding];
NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xmlData];
[parser setDelegate:self];
[parser parse];

Shouldn't I specify the encoding of the XML document somewhere? And shouldn't I be able to "convert" the parser object back into NSData so I can pass it to my XML reader object?

Fernando Redondo
I noticed that NSXMLParser does not have a method that returns the parsed data as an NSString or NSData. Have I misunderstood what NSXMLParser does? All I want is to get rid of all the ASCII character entities so I can read the text without seeing these strange "#" sequences in the middle of a sentence.
Fernando Redondo
The solution was this: I didn't see that "
Fernando Redondo
A: 

I have now tried to use NSXMLParser to extract my element values.

But it's the same!

I still end up with &#44; etc.!

So my switch from TBXML to NSXMLParser didn't go very well.

Have I missed something?

Fernando Redondo
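
One possible cause, stated as an assumption since the feed itself isn't shown here: if the source escapes the text twice (`&amp;#44;` rather than `&#44;`), a single parsing pass correctly yields the literal text `&#44;`, and a second decoding pass is needed. Below is a sketch of such a pass. `DecodeNumericCharacterReferences` is a hypothetical helper, not a Foundation API. Unlike the `%C` approach in the question, it also handles code points above U+FFFF by emitting a surrogate pair.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: decodes decimal (&#44;) and hexadecimal (&#x2C;)
// numeric character references and leaves everything else untouched.
static NSString *DecodeNumericCharacterReferences(NSString *input) {
    NSMutableString *result = [NSMutableString stringWithCapacity:input.length];
    NSScanner *scanner = [NSScanner scannerWithString:input];
    scanner.charactersToBeSkipped = nil;   // whitespace is significant here

    while (!scanner.isAtEnd) {
        // Copy everything up to the next "&#" unchanged.
        NSString *plain = nil;
        if ([scanner scanUpToString:@"&#" intoString:&plain]) {
            [result appendString:plain];
        }
        if (scanner.isAtEnd) break;

        NSUInteger start = scanner.scanLocation;   // index of the "&#"
        [scanner scanString:@"&#" intoString:NULL];

        unsigned int code = 0;
        BOOL ok;
        if ([scanner scanString:@"x" intoString:NULL]) {
            ok = [scanner scanHexInt:&code];
        } else {
            int value = 0;
            ok = [scanner scanInt:&value] && value > 0;
            code = (unsigned int)value;
        }
        // A reference must end with ";" and name a valid, non-surrogate code point.
        ok = ok && [scanner scanString:@";" intoString:NULL]
                && code > 0 && code <= 0x10FFFF
                && !(code >= 0xD800 && code <= 0xDFFF);

        if (!ok) {
            // Not a well-formed reference: keep the consumed text as it was.
            NSRange consumed = NSMakeRange(start, scanner.scanLocation - start);
            [result appendString:[input substringWithRange:consumed]];
        } else if (code > 0xFFFF) {
            // Outside the BMP: encode as a UTF-16 surrogate pair.
            unsigned int v = code - 0x10000;
            unichar pair[2] = { (unichar)(0xD800 + (v >> 10)),
                                (unichar)(0xDC00 + (v & 0x3FF)) };
            [result appendString:[NSString stringWithCharacters:pair length:2]];
        } else {
            unichar c = (unichar)code;
            [result appendString:[NSString stringWithCharacters:&c length:1]];
        }
    }
    return result;
}
```

You would call it on each string you get back from the parser, for example `DecodeNumericCharacterReferences(elementText)`, where `elementText` stands for whatever value you extracted. If the feed turns out not to be double-escaped, this pass is unnecessary.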