Hi,

I am using regular expressions to parse an XML file (I know regexps are not recommended for XML parsing, but I have to use them; I have no other option).

My question is how to skip commented lines in the XML file while parsing with Perl.

I want Perl to parse the XML file while skipping the commented lines.

Can anyone help me, please?

Thanks, Senthil.

+2  A: 

One way to do it is to strip commented lines prior to parsing.

$string =~ s/<!--.*?-->//gs;
NullUserException
Can you please tell me how to strip out the commented lines? I am a newbie.
Senthil kumar
@NullUserException: Only if it's on one line. Or the file needs to be read in slurp mode.
MvanGeest
@Senthil See edited post
NullUserException
@Mvan Using `/s` modifier
NullUserException
@NullUserException: that won't help if $string has been read with `while ($string = <FILE>)`, which reads only one line at a time, so a multi-line comment is never in $string all at once.
MvanGeest
@NullUserException: You'll need the /g flag too.
jmz
@jmz thanks, fixed
NullUserException
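To pull the points from this comment thread together, here is a minimal sketch, assuming the whole file fits in memory. The helper names `slurp_file` and `strip_xml_comments` are made up for illustration; only the substitution itself comes from the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: read an entire file into one string ("slurp mode").
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;               # undefined record separator => read everything at once
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Hypothetical helper: return the XML text with <!-- ... --> comments removed.
sub strip_xml_comments {
    my ($xml) = @_;
    # /s lets "." match newlines, so multi-line comments are matched;
    # /g removes every comment, not only the first one;
    # ".*?" is non-greedy, so each match stops at the nearest "-->".
    $xml =~ s/<!--.*?-->//gs;
    return $xml;
}

# Usage with an in-memory sample; for a real file, something like
#   my $xml = strip_xml_comments(slurp_file('input.xml'));   # file name is an assumption
my $sample = "<root>\n<!-- a\nmulti-line comment -->\n<item>1</item><!-- inline --></root>\n";
print strip_xml_comments($sample);
```

One known limitation of this approach: it also removes anything that looks like a comment inside a CDATA section, which a real XML parser would leave alone.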
+1  A: 

Please do not parse XML with regular expressions; use an XML parser instead.

At the very least, you can write a simple parser based on a finite-state machine to process your XML. It is simple to do.
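As a rough illustration of the state-machine idea, applied only to the comment-skipping part of the question, here is a minimal sketch that works line by line, so the file does not have to be slurped. The function name `skip_comments_by_line` is hypothetical, and the sketch assumes comments never appear inside CDATA sections.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines and returns them with
# <!-- ... --> comments removed. Lines left empty (or whitespace-only)
# after removal are dropped.
sub skip_comments_by_line {
    my @lines = @_;
    my $in_comment = 0;     # state: 0 = normal text, 1 = inside a comment
    my @out;
    for my $line (@lines) {
        my $kept = '';
        while (length $line) {
            if ($in_comment) {
                my $end = index($line, '-->');
                if ($end < 0) {
                    $line = '';                         # comment continues on the next line
                } else {
                    $line = substr($line, $end + 3);    # resume after "-->"
                    $in_comment = 0;
                }
            } else {
                my $start = index($line, '<!--');
                if ($start < 0) {
                    $kept .= $line;                     # no comment on the rest of this line
                    $line = '';
                } else {
                    $kept .= substr($line, 0, $start);  # keep text before "<!--"
                    $line = substr($line, $start + 4);
                    $in_comment = 1;
                }
            }
        }
        push @out, $kept if $kept =~ /\S/;
    }
    return @out;
}

# Usage: in a real script the lines would come from something like
#   while (my $line = <$fh>) { ... }
print skip_comments_by_line("<a>\n", "<!-- one\n", "two -->\n", "<b/><!-- x --><c/>\n");
```

The single flag `$in_comment` is the whole machine state here, which is why a comment that spans several lines is handled even though only one line is examined at a time.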

floatless
The OP is aware of it, but doesn't have another option
NullUserException
To quote the OP: "I am using regular expression to parse XML file (though regexp is not recommended for xml parsing, but i have to use regexp, no other go)." This answer is not useful, as he already knows it. The bigger question is why he can't use a parser.
xenoterracide
I understand this, but I don't understand why he *has to* use regexps.
floatless
+1 for adding the second paragraph, but I'm with all the others who say "explain why not a CPAN parser".
DVK
Unless it's a homework problem, the OP *does* have other options
derby
+2  A: 

If your problem is compiling XML libraries, you can try XML::Parser::Lite or XML::Parser::PurePerl, which are pure-Perl modules that require no compilation.

Or, you might be able to find pre-compiled packages of the non-pure-perl libraries. What OS are you on?

runrig
MKDoc::XML is another lightweight pure-perl XML parser which, amusingly enough, uses a monster regex as a tokenizer -- but it's the *right* regex.
hobbs
+1  A: 

As bad as this question is for many people, many of the answers to it are just as bad: use an XML parser, here's why, end of discussion.

For me, the whole point of asking a question on Stack Overflow is to obtain a solution. Have we provided a solution to the OP? Not quite.

A more complete answer would offer some examples of how to parse XML. Here are some:

Can you provide an example of parsing HTML with your favorite parser?

Philippe A.