views:

56

answers:

3
+2  Q: 

Regex Lookaheads

I need to capture the content of the top-level <pubDate> element. In the document, <pubDate> can appear either directly within <channel> or within an <item> element, and <item> is itself a child of <channel>. Here is an example:

<channel>
  ...
  <pubDate>10/2/2010</pubDate>
  ...
  <item>
    ...
    <pubDate>13/2/2029</pubDate>
    ...
  </item>
  ...
</channel>

I need to capture 10/2/2010.

The <item> is no problem: I can capture it, along with its <pubDate>.

+2  A: 

Regular expressions are not a good tool for languages that need a context-free grammar to parse, and XML is one of them. Try using the XML DOM to do the job.

SHiNKiROU
Any hint or example of how to do this with the XML DOM? I don't want to be bound to Microsoft.XMLDOM.
Michael
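As a hint for the comment above, here is a minimal sketch of the DOM approach that does not depend on Microsoft.XMLDOM. It assumes a browser that provides the standard DOMParser API and a document whose element names carry no namespace prefix. The function names channelPubDate and channelPubDateFromText are made up for illustration. The idea is to look only at the direct children of <channel>, so a <pubDate> nested inside <item> is never considered.

```javascript
// Return the text of the <pubDate> that is a direct child of <channel>,
// ignoring any <pubDate> nested deeper (for example inside <item>).
// `doc` is any DOM Document; the function name is hypothetical.
function channelPubDate(doc) {
  const channel = doc.getElementsByTagName('channel')[0];
  if (!channel) return null;
  for (const node of Array.from(channel.childNodes)) {
    // nodeType 1 is an element node; text and comment nodes are skipped.
    if (node.nodeType === 1 && node.nodeName === 'pubDate') {
      return node.textContent;
    }
  }
  return null;
}

// Browser-only helper (assumption: DOMParser is available, as in browsers;
// it is not built into Node.js).
function channelPubDateFromText(xmlText) {
  const doc = new DOMParser().parseFromString(xmlText, 'application/xml');
  return channelPubDate(doc);
}
```

In a browser, channelPubDateFromText(xmlString) would be called with the feed text. Walking childNodes rather than calling getElementsByTagName('pubDate') is what keeps the <item> dates out of the result.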
+1  A: 

I don't know JavaScript, so I can't help you with the DOM. I agree 100% that it's a bad idea to try and parse XML with regex. There might be a quick, very dirty, and very brittle workaround, though:

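If indentation is consistent throughout the file, and <channel> elements are always at the same level of indentation (so that their direct children are always indented by exactly two spaces), you could use that fact as a guide for the regex. In your example /^ {2}<pubDate>([^<]*)<\/pubDate>/m (= two spaces after start-of-line) might just work. Note that the closing tag must be spelled </pubDate>, since the match is case-sensitive unless the i flag is added.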

Use this at your own risk. Here be dragons etc.
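To show what that looks like in practice, here is a small sketch applied to a trimmed version of the sample from the question. The two-space indentation is the assumption everything hinges on, and the name channelPubDateByIndent is made up for illustration.

```javascript
// Trimmed sample from the question. The exact two-space indentation of the
// channel-level <pubDate> is an assumption; the regex breaks without it.
const xml = [
  '<channel>',
  '  <pubDate>10/2/2010</pubDate>',
  '  <item>',
  '    <pubDate>13/2/2029</pubDate>',
  '  </item>',
  '</channel>',
].join('\n');

// With the m flag, ^ matches at the start of every line. Exactly two spaces
// must precede <pubDate>, so the four-space copy inside <item> is skipped.
const channelPubDateRe = /^ {2}<pubDate>([^<]*)<\/pubDate>/m;

// Hypothetical helper: returns the captured date text, or null if no match.
function channelPubDateByIndent(text) {
  const match = channelPubDateRe.exec(text);
  return match ? match[1] : null;
}

console.log(channelPubDateByIndent(xml)); // prints "10/2/2010"
```

If the file is ever re-indented, or uses tabs, this returns null or the wrong element, which is exactly the brittleness mentioned above.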

Tim Pietzcker
+1  A: 

Check out jQuery and see if this helps with reading and parsing the XML: http://think2loud.com/reading-xml-with-jquery/

KM
