I have the following text:

<select name="username"><option value="177"> Bob1
                </option><option value="221"> Bob2
                </option><option value="227"> Bob3
                </option><option value="164"> Bob4
                </option><option value="271"> Bob5
                </option><option value="137"> Bob6
                </option><option value="105"> Bob7
                </option><option value="285"> Bob8
                </option><option value="281"> Bob9
                </option><option value="265"> Bob10
                </option></select>

I am trying to use an NSScanner to get each option's value and the name between the option tags. So far I have the following code:

for (int y = 1; y < 16; y++) {
    NSScanner *scanner1 = [NSScanner scannerWithString:htmlsource];
    [scanner1 scanUpToString:[NSString stringWithFormat:@"<option value=\""] intoString:NULL];
    [scanner1 scanString:[NSString stringWithFormat:@"<option value=\""] intoString:NULL];
    [scanner1 scanUpToString:@"\"" intoString:&result];
    NSLog(@"%i",[scanner1 scanLocation]);
    NSLog(result);

    [scanner1 setScanLocation:([scanner1 scanLocation] - 18)];
    [scanner1 scanUpToString:[NSString stringWithFormat:@"<option value=\"%@\">",result] intoString:NULL];
    [scanner1 scanString:[NSString stringWithFormat:@"<option value=\"%@\">",result] intoString:NULL];
    [scanner1 scanUpToString:@"</option>" intoString:&result];
    //NSLog([NSString stringWithFormat:@"<option value=\"%@\">",result]);
    NSLog(@"%i",[scanner1 scanLocation]);
    NSLog(result);

    }

This works for the first entry only. Am I going about this the wrong way, or do I have to start each scan where the previous one left off? If so, how? The results so far are:

2009-07-31 08:15:53.859 App1[1000:20b] 683
2009-07-31 08:15:53.860 App1[1000:20b] 177
2009-07-31 08:15:53.860 App1[1000:20b] 712
2009-07-31 08:15:53.860 App1[1000:20b] Bob1

2009-07-31 08:15:53.861 App1[1000:20b] 683
2009-07-31 08:15:53.861 App1[1000:20b] 177
2009-07-31 08:15:53.862 App1[1000:20b] 712
2009-07-31 08:15:53.862 App1[1000:20b] Bob1
A: 

If it's well-formed XML then you're probably better off letting an XML parser from the NSXML family (NSXMLParser, for example) do the heavy lifting for you.
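As a rough sketch of the XML-parser route, assuming the markup really is well-formed XML and the code is compiled with ARC: the class name OptionCollector and the function OptionsFromXML below are made up for illustration, and only the NSXMLParser delegate callbacks are real API. Each result is a two-element array of value and trimmed name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that collects @[value, name] pairs from <option> elements.
@interface OptionCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *options;
@property (nonatomic, copy) NSString *currentValue;
@property (nonatomic, strong) NSMutableString *currentText;
@end

@implementation OptionCollector
- (instancetype)init {
    if ((self = [super init])) {
        _options = [NSMutableArray array];
    }
    return self;
}

// Remember the value attribute and start collecting the element's text.
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"option"]) {
        self.currentValue = attributeDict[@"value"];
        self.currentText = [NSMutableString string];
    }
}

// Text may arrive in several chunks, so append rather than assign.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.currentText appendString:string];
}

// Store the pair when the element closes; options without a value are skipped.
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"option"] && self.currentValue != nil) {
        NSString *name = [self.currentText stringByTrimmingCharactersInSet:
                          [NSCharacterSet whitespaceAndNewlineCharacterSet]];
        [self.options addObject:@[self.currentValue, name]];
        self.currentValue = nil;
        self.currentText = nil;
    }
}
@end

// Hypothetical entry point: returns nil if the input is not well-formed XML.
static NSArray *OptionsFromXML(NSString *xml) {
    OptionCollector *collector = [[OptionCollector alloc] init];
    NSXMLParser *parser =
        [[NSXMLParser alloc] initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    parser.delegate = collector;
    return [parser parse] ? collector.options : nil;
}
```

Passing just the select fragment from the question should work; a full HTML page often is not well-formed XML, in which case the parse fails and this sketch returns nil.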

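The other issue is that you're resetting the scanner back to the start of the option tag, so when you re-scan you cover the same text again instead of carrying on. Note too that the scanner is created inside the loop, so every iteration starts again from the beginning of the string, which is why you get Bob1 each time. Surely the point is to create it once and keep going from where it left off?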

[scanner1 setScanLocation:([scanner1 scanLocation] - 18)];

If you comment out that line, does it magically start working?
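To illustrate the "keep going" idea, here is a minimal sketch that creates the scanner once and lets its scan location carry over from one option to the next. It assumes ARC, and the function name ScanOptions is invented for the example; it also assumes each tag looks exactly like `<option value="...">` with no other attributes, as in the question's text.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns an array of @[value, name] pairs.
static NSArray *ScanOptions(NSString *htmlsource) {
    NSMutableArray *options = [NSMutableArray array];
    // Create the scanner once, outside the loop, so it never restarts.
    NSScanner *scanner = [NSScanner scannerWithString:htmlsource];
    while (![scanner isAtEnd]) {
        NSString *value = nil;
        NSString *name = nil;
        // Skip ahead to the next opening tag; stop when there is none left.
        [scanner scanUpToString:@"<option value=\"" intoString:NULL];
        if (![scanner scanString:@"<option value=\"" intoString:NULL]) break;
        // Read the attribute value (stays nil if the attribute is empty).
        [scanner scanUpToString:@"\"" intoString:&value];
        // Step over the closing quote and '>'; tags with extra attributes are skipped.
        if (![scanner scanString:@"\">" intoString:NULL]) continue;
        // Everything up to </option> is the name (stays nil if there is no text).
        [scanner scanUpToString:@"</option>" intoString:&name];
        name = [(name ?: @"") stringByTrimmingCharactersInSet:
                [NSCharacterSet whitespaceAndNewlineCharacterSet]];
        [options addObject:@[value ?: @"", name]];
    }
    return options;
}
```

There is no need to call setScanLocation: at all here, because every scan call leaves the scanner positioned just after what it consumed.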

AlBlue
+1  A: 

There's always RegexKitLite.

This version keeps the whitespace inside each <option>...</option>:

NSString *regex = @"(?si)<option\\s+value\\s*=\\s*\"([^\"]*)\"[^>]*>(.*?)</option>";
NSArray *htmlOptionsArray = [htmlsource arrayOfCaptureComponentsMatchedByRegex:regex];
for(NSArray *parsedOptionArray in htmlOptionsArray) {
  NSString *value = [parsedOptionArray objectAtIndex:1UL];
  NSString *text  = [parsedOptionArray objectAtIndex:2UL];
  NSLog(@"Value: '%@', text: '%@'", value, text);
}

Example output:

2009-07-31 04:20:23.692 so[35423:807] Value: '177', text: ' Bob1
                '
2009-07-31 04:20:23.699 so[35423:807] Value: '221', text: ' Bob2
                '
...
2009-07-31 04:20:23.725 so[35423:807] Value: '281', text: ' Bob9
                '
2009-07-31 04:20:23.726 so[35423:807] Value: '265', text: ' Bob10
                '

This version strips any extra whitespace around the option text:

NSString *regex = @"(?si)<option\\s+value\\s*=\\s*\"([^\"]*)\"[^>]*>\\s*(.*?)\\s*</option>";
NSArray *htmlOptionsArray = [htmlsource arrayOfCaptureComponentsMatchedByRegex:regex];
for(NSArray *parsedOptionArray in htmlOptionsArray) {
  NSString *value = [parsedOptionArray objectAtIndex:1UL];
  NSString *text  = [parsedOptionArray objectAtIndex:2UL];
  NSLog(@"Value: '%@', text: '%@'", value, text);
}

Example output:

2009-07-31 04:21:50.352 so[35436:807] Value: '177', text: 'Bob1'
2009-07-31 04:21:50.354 so[35436:807] Value: '221', text: 'Bob2'
...
2009-07-31 04:21:50.359 so[35436:807] Value: '281', text: 'Bob9'
2009-07-31 04:21:50.359 so[35436:807] Value: '265', text: 'Bob10'
johne