
Here is a sample of the XML I get back:

<rsp stat="ok">
  <site>
    <id>1234</id>
    <name>testAddress</name>
    <hostname>anotherName</hostname>
    ...

  </site>
  <site>
    <id>56789</id>
    <name>ba</name>
    <hostname>alphatest</hostname>
    ...
  </site>
</rsp>

I want to extract the text inside <name></name>, without the tags themselves, and only for the first instance (or select which item based on some other test).

Is this possible with regex?

+2  A: 

Without knowing your language or environment, here are some Perl expressions. Hopefully they will give you the right idea for your application.

Your regular expression to capture the text content of a tag would look something like this:

m/>([^<]*)</

This will capture the text content of each tag. You will have to loop over the matches to extract all of it. Note that this does not account for self-closing tags such as <br/>: text that follows one is captured as if it were that tag's content. You would need a regex engine with negative lookbehind to handle that, and without knowing your environment, it's hard to say whether that is supported.
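A rough Python translation of the Perl pattern, run against the sample response from the question (the elided `...` fields are simply left out). Python is used only because it is easy to try out; `re.findall` returns every capture, which stands in for the Perl match loop. The made-up `<a/>tail<b>x</b>` snippet at the end shows the self-closing-tag problem and the negative-lookbehind fix.

```python
import re

# The sample response from the question, with the elided "..." fields left out.
XML = """<rsp stat="ok">
  <site>
    <id>1234</id>
    <name>testAddress</name>
    <hostname>anotherName</hostname>
  </site>
  <site>
    <id>56789</id>
    <name>ba</name>
    <hostname>alphatest</hostname>
  </site>
</rsp>"""

# Same pattern as the Perl m/>([^<]*)</ : capture the text between a '>' and
# the next '<'. findall returns every capture, replacing the Perl loop.
runs = re.findall(r">([^<]*)<", XML)

# The pattern also captures the indentation between tags, so drop blank runs.
contents = [run for run in runs if run.strip()]
print(contents)  # ['1234', 'testAddress', 'anotherName', '56789', 'ba', 'alphatest']

# Self-closing tags: the text after <a/> is captured as if it were <a/>'s content.
snippet = "<a/>tail<b>x</b>"
naive = re.findall(r">([^<]*)<", snippet)  # ['tail', 'x']

# A negative lookbehind (?<!/) skips any '>' that closes a self-closing tag.
with_lookbehind = re.findall(r"(?<!/)>([^<]*)<", snippet)  # ['x']
print(naive, with_lookbehind)
```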

You could also just strip all tags from your source using something like:

s/<[^>]*>//g
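In Python the same substitution is a single `re.sub` call; here it runs on a fragment of the sample response:

```python
import re

# A fragment of the sample response from the question.
XML = "<site><id>1234</id><name>testAddress</name></site>"

# Python equivalent of the Perl s/<[^>]*>//g : delete everything from a '<'
# to the next '>', i.e. every tag, keeping only the text between them.
text = re.sub(r"<[^>]*>", "", XML)
print(text)  # 1234testAddress
```

The values run together once the tags are gone, so you can no longer tell which text came from which element.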

Also, depending on your environment, if you can use an XML-parsing library, it will make your life much easier. After all, by taking the regex approach, you lose everything that XML really offers you (structured data, context awareness, etc.).
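As a sketch of the library route, here is Python's standard `xml.etree.ElementTree` (just an example parser, not something the asker's Objective-C setup provides) picking out the first `<name>` and then all of them from the sample response:

```python
import xml.etree.ElementTree as ET

# The sample response from the question, with the elided "..." fields left out.
XML = """<rsp stat="ok">
  <site><id>1234</id><name>testAddress</name><hostname>anotherName</hostname></site>
  <site><id>56789</id><name>ba</name><hostname>alphatest</hostname></site>
</rsp>"""

# Parse once into a tree; the root element is <rsp>.
root = ET.fromstring(XML)

# find() returns the first matching element, so this is the first <name> only.
first_name = root.find("site/name").text

# findall() returns every match, in document order.
all_names = [el.text for el in root.findall("site/name")]

print(first_name)  # testAddress
print(all_names)   # ['testAddress', 'ba']
```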

jheddings
Yeah, I am trying to use Objective-C. I didn't want to add any extra libraries or files; I thought there might be a simple way to handle the XML string I get back.
Doron Katz
+1 for the excellent advice on using an XML parser.
TrueWill
+3  A: 

<disclaimer>I don't use Objective-C</disclaimer>

You should be using an XML parser, not regexes. XML is not a regular language, so it cannot be reliably parsed by a regular expression. Don't do it.

Never use regular expressions or basic string parsing to process XML. Every language in common usage right now has perfectly good XML support. XML is a deceptively complex standard, and it's unlikely your code will be correct in the sense of properly parsing all well-formed XML input. Even if it does, you're wasting your time, because (as just mentioned) every language in common usage already has XML support. It is unprofessional to use regular expressions to parse XML.

You could use Expat, which has Objective-C bindings (and, being a C library, can also be called directly from Objective-C).
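Expat is a stream (callback) parser: you register handlers for start tags, end tags and character data, and it calls them as it reads. Python's standard library wraps the same Expat C library as `xml.parsers.expat`, so the sketch below shows the style in Python; in C or Objective-C the corresponding calls are `XML_ParserCreate`, `XML_SetElementHandler`, `XML_SetCharacterDataHandler` and `XML_Parse`. The `first_name` helper is hypothetical, written only for this illustration.

```python
import xml.parsers.expat

# Part of the sample response from the question.
XML = """<rsp stat="ok">
  <site><id>1234</id><name>testAddress</name></site>
  <site><id>56789</id><name>ba</name></site>
</rsp>"""


def first_name(xml_text):
    """Return the text of the first <name> element, using Expat callbacks."""
    state = {"in_name": False, "done": False, "chunks": []}

    def start_element(tag, attrs):
        if tag == "name" and not state["done"]:
            state["in_name"] = True

    def end_element(tag):
        if tag == "name" and state["in_name"]:
            state["in_name"] = False
            state["done"] = True  # ignore every later <name>

    def char_data(data):
        # Expat may deliver one text node in several pieces, so collect them.
        if state["in_name"]:
            state["chunks"].append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(xml_text, True)  # True: this is the final (only) chunk
    return "".join(state["chunks"])


print(first_name(XML))  # testAddress
```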

Apple's options are:

  1. The Core Foundation XML parser (CFXMLParser)
  2. The tree-based Cocoa parser, NSXMLDocument (Mac OS X 10.4 and later)
voyager
A: 

As others say, you should really be using NSXMLParser for this sort of thing.

HOWEVER, if you only need to extract the stuff in the name tags, then RegexKitLite can do it quite easily:

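#import <Foundation/Foundation.h>
#import "RegexKitLite.h"  // RegexKitLite also requires linking against libicucore

NSString * xmlString = ...;
// Each capture array holds the whole match at index 0 and the first (...) group at index 1.
// The non-greedy (.*?) stops at the first </name>, so each <name> element gives one match.
NSArray * captures = [xmlString arrayOfCaptureComponentsMatchedByRegex:@"<name>(.*?)</name>"];
for (NSArray * captureGroup in captures) {
  NSLog(@"Name: %@", [captureGroup objectAtIndex:1]);
}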
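For comparison, here is the same non-greedy pattern with Python's `re` module instead of RegexKitLite. It also shows taking only the first match, which is what the question asks for, and what the greedy `(.*)` would do instead:

```python
import re

# Part of the sample response from the question, one <site> per line.
XML = """<rsp stat="ok">
  <site><id>1234</id><name>testAddress</name></site>
  <site><id>56789</id><name>ba</name></site>
</rsp>"""

# Non-greedy (.*?) stops at the first </name>, so each element gives one capture.
lazy = re.findall(r"<name>(.*?)</name>", XML)

# Only the first instance.
first = re.search(r"<name>(.*?)</name>", XML).group(1)

# Greedy (.*) runs on to the LAST </name> it can reach. '.' does not match a
# newline by default, so join the lines first to make the difference visible.
one_line = XML.replace("\n", "")
greedy = re.findall(r"<name>(.*)</name>", one_line)

print(lazy)    # ['testAddress', 'ba']
print(first)   # testAddress
print(greedy)  # a single capture spanning both sites
```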
Dave DeLong
A: 

Careful about namespaces:

<prefix:name xmlns:prefix="">testAddress</prefix:name>

is equivalent XML that will break regexp-based code. For XML, use an XML parser. XPath is your friend for things like this. The XPath expression below will return the text nodes with the info you want:

./rsp/site/name/text()

Cocoa has NSXML support for XPath.
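To make the namespace point concrete, this Python sketch writes the same data with the `name` elements under a prefix `p`. In Namespaces in XML 1.0 a prefix must be bound to a non-empty URI, so the sketch binds it to a hypothetical `urn:example:sites`. The literal `<name>` regex finds nothing, while ElementTree resolves the prefix. ElementTree's root is `<rsp>`, so its paths start at `site` rather than `/rsp`, and it supports only a subset of XPath.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for this illustration.
NS = "urn:example:sites"

# Same data as the question, but the <name> elements carry a prefix.
XML_NS = f"""<rsp stat="ok" xmlns:p="{NS}">
  <site><id>1234</id><p:name>testAddress</p:name></site>
  <site><id>56789</id><p:name>ba</p:name></site>
</rsp>"""

# The regex looks for a literal "<name>" tag, so it matches nothing here.
regex_hits = re.findall(r"<name>(.*?)</name>", XML_NS)

# A namespace-aware parser maps the prefix to its URI, so the query works
# whatever prefix the document happens to use.
root = ET.fromstring(XML_NS)
parser_hits = [el.text for el in root.findall("./site/p:name", {"p": NS})]

print(regex_hits)   # []
print(parser_hits)  # ['testAddress', 'ba']
```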

Harold L
+1  A: 

The best tool for this kind of task is XPath.

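#import <Foundation/Foundation.h>

// Parse ~/rsp.xml into a tree. NSXMLDocument is part of Cocoa on Mac OS X; it is not available on iOS.
// This is manual retain/release code; under ARC, drop the autorelease.
NSURL *rspURL = [NSURL fileURLWithPath:[@"~/rsp.xml" stringByExpandingTildeInPath]];
NSXMLDocument *document = [[[NSXMLDocument alloc] initWithContentsOfURL:rspURL options:NSXMLNodeOptionsNone error:NULL] autorelease];

// XPath positions are 1-based, so site[1] is the first <site>.
NSArray *nodes = [document nodesForXPath:@"/rsp/site[1]/name" error:NULL];
NSString *name = [nodes count] > 0 ? [[nodes objectAtIndex:0] stringValue] : nil;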

If you want the name of the site whose id is 56789, use the XPath /rsp/site[id='56789']/name instead. I suggest reading the W3Schools XPath tutorial for a quick overview of XPath syntax.
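Outside Cocoa, the same two queries can be tried with Python's ElementTree, whose limited XPath subset includes 1-based positions and `[tag='text']` predicates. This is a sketch against the sample response, not a replacement for `nodesForXPath:error:`:

```python
import xml.etree.ElementTree as ET

# Part of the sample response from the question.
XML = """<rsp stat="ok">
  <site><id>1234</id><name>testAddress</name></site>
  <site><id>56789</id><name>ba</name></site>
</rsp>"""

root = ET.fromstring(XML)  # root is <rsp>, so paths start at its children

# Like /rsp/site[1]/name : positions in the predicate are 1-based.
first = root.findtext("site[1]/name")

# Like /rsp/site[id='56789']/name : the <site> whose <id> child has that text.
by_id = root.findtext("site[id='56789']/name")

print(first)  # testAddress
print(by_id)  # ba
```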

0xced