Hi everyone,

Is there a regex for checking whether XML is well-formed?

Thanks

Edit: If not regex, then is there a good parsing method that I can use in C# that doesn't throw an exception? I tried using XmlReader but it didn't work for me.

+6  A: 

This is well beyond the capabilities of regular expressions. In other words, the answer is that it's not possible.

EDIT: There are plenty of tools available to check well-formedness, but they all involve some sort of XML parser/validator. If you provide more information about your environment, maybe we can point you in the right direction.

Jim Garrison
You're answering the question as stated, but maybe not providing the information sought. A better answer might be, "use an XmlDocument object or similar to verify well-formedness." Sometimes people don't know the right tool for the job, though they know the job they want to do.
Cheeso
@Cheeso - You are right, I could have been a little more helpful. I've edited the post. Thx.
Jim Garrison
+2  A: 

There is no regex solution, because Jeff told me so.

Stefan Kendall
+1  A: 

No, there is not. (Practically speaking and for the general case, at least.) Use an XML parser if you want to determine whether or not XML is well-formed; a validating parser is only needed if you also want to check the document against a DTD or schema.

Corey Porter
+5  A: 

No.

XML syntax is irregular enough to give any regular expression nightmares.

You're not the first to ask this, so don't feel bad. The question about parsing HTML and XML with regular expressions keeps being asked because regular expressions look perfect for the job. Sadly, they aren't.

XML syntax is complex enough that you can't safely parse it with a regex. It looks simple and regular but there's plenty of scope for causing problems. One nasty CDATA section and things get very hard. And consider the RSS feeds where you get HTML embedded in the XML.

So please use an XML parsing library for this. There are plenty of them.

If you want more detail, have a look at this question, which gives some examples of the horror syntax you can meet, and this question, which shows what happens if you do try to parse these things with regular expressions.

Dave Webb
Dave, XML doesn't look regular in the sense of regular expressions. Please look up regular grammars/languages in Wikipedia.
Svante
+1  A: 

Use an XML validator instead.

JP
+1  A: 

No, unless you count recursive regexps. Plain regexps can't check arbitrary nesting. However, some regexp engines accept recursive regexps, which you could try using for this purpose.

Dmitry
A: 

Recent versions of PCRE have all kinds of features which would make this achievable, but the code would be ugly as hell. libxml2 comes with xmllint, so why not use the right tool for the job?

just somebody
A: 

I'm making an assumption here: you think that using a library will be too slow or too heavyweight to do this quickly and/or efficiently.

If this is the case then test it out. Try a few libraries, see how big they are, see how fast they are.

Fortyrunner
+2  A: 

If not regex, then is there a good parsing method that I can use in C# that doesn't throw an exception? I tried using XmlReader but it didn't work for me.

Using XmlReader and while(reader.Read()) {} (catching any exception) is probably the fastest pure managed approach.
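
A minimal sketch of that approach, in case it helps. The class and method names (XmlCheck, IsWellFormed) are made up for illustration, and it assumes the XML is already in a string. Note that the default XmlReaderSettings reject documents containing a DOCTYPE, so a document with a DTD would be reported as an error here unless you pass settings that allow it.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlCheck
{
    // Hypothetical helper: returns true if `xml` parses as well-formed XML.
    // On failure, `error` holds the parser's message instead of an exception
    // escaping to the caller.
    public static bool IsWellFormed(string xml, out string error)
    {
        error = null;
        try
        {
            // Default settings are assumed; these prohibit DTDs.
            using (var reader = XmlReader.Create(new StringReader(xml)))
            {
                // Read every node; the reader throws XmlException
                // at the first well-formedness violation it meets.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }
}
```

Something like XmlCheck.IsWellFormed("<a><b></a>", out var msg) should then return false with the parser's complaint in msg, rather than throwing. The exception is still raised internally; the wrapper just keeps it away from the calling code.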

Marc Gravell