I am trying to extract all the CSS links from HTML such as this fragment:

<link href="http://media.ticketmaster.com/en-us/css/1c84b57773d8f594407f0b0b78d67aba/tm/default.css" rel="stylesheet" type="text/css" />
<link type="text/css" rel="stylesheet" href="http://media.ticketmaster.com/en-us/css/1c84b57773d8f594407f0b0b78d67aba/tm/datepicker.css"/>
<link href="http://media.ticketmaster.com/en-us/css/1c84b57773d8f594407f0b0b78d67aba/tm/carousel.css" rel="stylesheet" type="text/css" />
<link href="http://media.ticketmaster.com/en-us/css/1c84b57773d8f594407f0b0b78d67aba/tm/langoverlay_en-us.css" rel="stylesheet" type="text/css" />

Here is my code:

-(void)matchCSS:(NSString *)html{
    NSString *regexString = @"href=\".*\.css\"";
    NSArray *matchArray = NULL;
    matchArray = [html componentsMatchedByRegex:regexString];
    NSLog(@"matchArray: %@", matchArray);
}

However, what I got is a little bit crazy:

"href=\"http://media.ticketmaster.com/en-us/css/1c84b57773d8f594407f0b0b78d67aba/tm/default.css\" rel=\"stylesheet\" type=\"text/css\"",
"href=\"http://media.ticketmaster.com/en-us/css/1c84b57773d8f594407f0b0b78d67aba/tm/datepicker.css\"",
"href=\"http://media.ticketmaster.com/en-us/css/1c84b57773d8f594407f0b0b78d67aba/tm/carousel.css\" rel=\"stylesheet\" type=\"text/css\"",
"href=\"http://media.ticketmaster.com/en-us/css/1c84b57773d8f594407f0b0b78d67aba/tm/langoverlay_en-us.css\" rel=\"stylesheet\" type=\"text/css\""

These are not pure links: some of the matches also contain other attributes that I don't want. I don't see anything wrong with my regular expression. Any suggestions?

+1  A: 

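The problem is with the .*, which is too greedy: it consumes as much of the line as it can and gives characters back only until the rest of the pattern fits, so the match ends at the last possible closing quote instead of the first. You should match every character that is not the quote character. Note also that \. inside an Objective-C string literal does not reach the regex engine as an escaped dot; the compiler treats it as an unknown escape sequence and leaves a plain dot that matches any character, which is how type="text/css" was accepted as the end of a match. The backslash has to be doubled. I am not familiar with the regular expression syntax used by RegexKitLite, but I think the regular expression should be something like @"href=\"[^\"]*\\.css\"".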
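
To see the difference in isolation, here is a minimal sketch. It uses Foundation's `NSRegularExpression` instead of RegexKitLite, a swap made only so the snippet has no third-party dependency (both sit on top of ICU, so the pattern syntax should behave the same). The helper name `MatchesForPattern` and the shortened sample HTML are made up for illustration.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every substring of `text` matched by `pattern`.
static NSArray<NSString *> *MatchesForPattern(NSString *pattern, NSString *text) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }
    NSMutableArray<NSString *> *found = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, text.length);
    for (NSTextCheckingResult *match in [regex matchesInString:text options:0 range:whole]) {
        [found addObject:[text substringWithRange:match.range]];
    }
    return found;
}

int main(void) {
    @autoreleasepool {
        // Two links on one line (illustrative sample, not the original page).
        NSString *html = @"<link href=\"a.css\" type=\"text/css\"/>"
                         @"<link href=\"b.css\" type=\"text/css\"/>";

        // What the question's literal effectively compiles to: greedy .* and an
        // unescaped dot. One long match that runs into the second type="text/css".
        NSLog(@"%@", MatchesForPattern(@"href=\".*.css\"", html));

        // Escaped dot but still greedy: one match spanning from a.css to b.css.
        NSLog(@"%@", MatchesForPattern(@"href=\".*\\.css\"", html));

        // Negated class: [^\"]* cannot cross a quote, so each href is matched alone.
        NSLog(@"%@", MatchesForPattern(@"href=\"[^\"]*\\.css\"", html));
    }
    return 0;
}
```

Only the last pattern yields two separate matches, one per `href` attribute.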

You should probably also use a capture group; that way you can retrieve only the characters inside the group, rather than everything the regular expression matched (including the href=" prefix and the closing quote). If I am not wrong, the regular expression would then be something like @"href=\"([^\"]*\\.css)\"".
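
As a rough illustration of what the group buys you, the sketch below pulls out only the text of group 1. It again uses Foundation's `NSRegularExpression` rather than RegexKitLite, purely to stay dependency-free; the function name `CSSLinksInHTML` and the `example.com` URLs are placeholders, not part of the original code.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text of capture group 1 for every match,
// i.e. the stylesheet URL without the surrounding href="...".
static NSArray<NSString *> *CSSLinksInHTML(NSString *html) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"href=\"([^\"]*\\.css)\""
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }
    NSMutableArray<NSString *> *links = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, html.length);
    for (NSTextCheckingResult *match in [regex matchesInString:html options:0 range:whole]) {
        // rangeAtIndex:0 is the whole match; rangeAtIndex:1 is the first group.
        [links addObject:[html substringWithRange:[match rangeAtIndex:1]]];
    }
    return links;
}

int main(void) {
    @autoreleasepool {
        // Placeholder HTML with the href in different attribute positions.
        NSString *html =
            @"<link href=\"http://example.com/css/default.css\" rel=\"stylesheet\" type=\"text/css\" />\n"
            @"<link type=\"text/css\" rel=\"stylesheet\" href=\"http://example.com/css/datepicker.css\"/>";
        NSLog(@"%@", CSSLinksInHTML(html));
    }
    return 0;
}
```

The returned array holds the bare URLs, so no further trimming of `href="` or the closing quote is needed.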

kiamlaluno
If you want to get only the characters matching the group, you should use the method `arrayOfCaptureComponentsMatchedByRegex:` rather than `componentsMatchedByRegex:`.
kiamlaluno