ansaurus

Question

Regex a xml string

Answer 1

+4 A:

Regular expressions are not the best option when parsing large amounts of HTML or XML.

There are a number of ways you could handle this without relying on regular expressions. Depending on the libraries you have at your disposal, you may be able to find the elements you're looking for by using XPath.

Here's a tutorial that may help you on your way: http://www.totheriver.com/learn/xml/xmltutorial.html

Doomspork 2009-08-26 19:50:01

Answer 2

+3 A:

A regular expression is not the right tool for this job. You should be using an XML parser. It's pretty simple to set up and use, and will probably take you less time to code than it would to come up with this regular expression.

I recommend using JDOM. It has an easy syntax. An example can be found here: http://notetodogself.blogspot.com/2008/04/teamsite-dcr-java-parser.html

If the documents that you will be parsing are large, you should use a SAX parser; I recommend Xerces.
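
To give an idea of what the SAX route looks like, here is a minimal sketch using the JAXP classes that ship with the JDK (Xerces also implements JAXP, so it should be usable the same way if you put it on the classpath). The class name `SaxSketch`, the method `textOf`, and the sample element names are made up for illustration; they are not from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxSketch {
    // Hypothetical sample document, only here so the sketch can be run.
    static final String SAMPLE =
            "<items><value>Loop</value><value>Other</value></items>";

    /** Collects the text of every element whose local name equals localName. */
    static List<String> textOf(String xml, String localName) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so the local-name argument is populated
        SAXParser parser = factory.newSAXParser();

        List<String> found = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            // Non-null only while the parser is inside a matching element.
            private StringBuilder current;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (local.equals(localName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per text node, so append.
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (current != null && local.equals(localName)) {
                    found.add(current.toString());
                    current = null;
                }
            }
        });
        return found;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOf(SAMPLE, "value"));
    }
}
```

The handler only sees one event at a time, so memory use stays flat however large the document is. This sketch does not handle a matching element nested inside another matching element.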

mkoryak 2009-08-26 19:51:38

Answer 3

+1 A:

When dealing with XML, you should probably not use regular expressions to check the content. Instead, use either a SAX-based parsing routine to check the relevant contents or a DOM-like model (preferably pull-based if you're dealing with large documents).

Of course, if you're trying to validate the document's contents somehow, you should probably use some schema tool (I'd go with RELAX NG or Schematron, but I guess you could use XML Schema).
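
For the validation case, a rough sketch with the JDK's `javax.xml.validation` package is below. It uses XML Schema because, as far as I know, that is the only schema language the JDK supports out of the box; RELAX NG or Schematron would need a third-party library. The class name `ValidationSketch`, the method `isValid`, and the tiny schema are hypothetical and only there to make the example runnable.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class ValidationSketch {
    // Hypothetical schema: an <items> root holding one or more <value> strings.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"items\">"
            + "<xs:complexType><xs:sequence>"
            + "<xs:element name=\"value\" type=\"xs:string\" maxOccurs=\"unbounded\"/>"
            + "</xs:sequence></xs:complexType>"
            + "</xs:element>"
            + "</xs:schema>";

    /** Returns true if xml validates against xsd, false if it does not. */
    static boolean isValid(String xsd, String xml) throws IOException, SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // A broken schema is a programming error here, so that SAXException propagates.
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // With no custom ErrorHandler set, validation errors are thrown.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(XSD, "<items><value>Loop</value></items>"));
        System.out.println(isValid(XSD, "<items><other/></items>"));
    }
}
```

The first document matches the schema and the second does not, since `<other>` is not an allowed child of `<items>`.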

djc 2009-08-26 19:52:01

Answer 4

+2 A:

Look up XPath, which is kinda like regex for XML. Sort of.

With XPath you write expressions that extract information from XML documents, so extracting the nodes which don't have Loop as a sub-node is exactly the sort of thing it's cut out for.

I haven't tried this, but as a first stab, I'd guess the XPath expression would look something like:

"//ser:serviceItemValues/ord1:value[text()!='Loop']/parent::*"
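
Since the question is about Java, here is a sketch of how that expression could be evaluated with the JDK's built-in `javax.xml.xpath` API. The namespace URIs (`urn:example:ser`, `urn:example:ord1`), the sample document, and the names `XPathSketch` and `select` are all assumptions for illustration; the real URIs have to match whatever the actual document declares.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathSketch {
    // Hypothetical document; the namespace URIs are placeholders.
    static final String SAMPLE =
            "<ser:root xmlns:ser=\"urn:example:ser\" xmlns:ord1=\"urn:example:ord1\">"
            + "<ser:serviceItemValues><ord1:value>Loop</ord1:value></ser:serviceItemValues>"
            + "<ser:serviceItemValues><ord1:value>Other</ord1:value></ser:serviceItemValues>"
            + "</ser:root>";

    static final String EXPRESSION =
            "//ser:serviceItemValues/ord1:value[text()!='Loop']/parent::*";

    /** Parses xml into a DOM tree and returns the nodes selected by expression. */
    static NodeList select(String xml, String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // without this, prefixed names never match
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Maps the prefixes used in the expression to the (assumed) namespace URIs.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if ("ser".equals(prefix)) return "urn:example:ser";
                if ("ord1".equals(prefix)) return "urn:example:ord1";
                return XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String uri) {
                return null; // reverse lookup is not needed for this sketch
            }

            @Override
            public Iterator<String> getPrefixes(String uri) {
                return null; // reverse lookup is not needed for this sketch
            }
        });
        return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        NodeList hits = select(SAMPLE, EXPRESSION);
        for (int i = 0; i < hits.getLength(); i++) {
            System.out.println(hits.item(i).getNodeName() + ": " + hits.item(i).getTextContent());
        }
    }
}
```

With this sample only the second `ser:serviceItemValues` comes back. Note that the expression selects any parent with at least one value other than Loop; if a parent can hold several values, something like `//ser:serviceItemValues[not(ord1:value='Loop')]` may be closer to "no Loop sub-node at all".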

izb 2009-08-26 20:29:12

Stop upvoting this, you all know this is the wrong way to approach the problem :(

Esko 2009-11-25 14:05:41

Why is this wrong? This is exactly what xpath is for, isn't it?

izb 2009-11-26 09:50:12

Answer 5

+1 A:

As mentioned in the other answers, regular expressions are not the tool for the job. You need an XPath engine. If you want to do these things from the command line, though, I recommend installing XMLStarlet. I have had very good experiences using this tool to solve various XML-related tasks. Depending on your OS, you might be able to just install the xmlstarlet RPM or deb package. I think the Mac OS X ports collection includes the package as well.

Hardy 2009-08-27 06:37:27

Oops, you wanted to do it in Java. Well, XMLStarlet is still a cool tool.

Hardy 2009-08-27 06:39:11

Answer 6

A:

See this answer for a thorough explanation.

Esko 2009-11-25 14:05:16
