I'm having trouble with a regexp. I'm looking through a set of XML files and trying to detect the text inside specific nodes when that text contains a line break.
Here is some sample data:
<item name='GenMsgText'><text>The signature will be discarded.</text></item>
<item name='GenMsgText'><text>The signature will be discarded.<break/>
Do you want to continue?</text></item>
In that sample, I want to catch only the text in the second node. I've come up with the solution below, which uses a second regexp, but I'd like to know whether I can do the same thing using only one.
if ($content =~ m{<item name='GenMsgText'>(<textlist>)?<text>(.*?)</text>}si)
{
    $t = $2;
    if ($t =~ m{\n}i)
    {
        print G $t."\n\n";
    }
}
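For anyone who wants to reproduce this, here is a minimal, self-contained sketch of the same two-step check run against the sample data above. The helper name `multiline_texts`, the `/g` loop over all nodes, and printing to STDOUT instead of my `G` filehandle are assumptions made for the example, not part of the original tool.

```perl
use strict;
use warnings;

# Sample data from the question, held in one string as it would be
# after slurping a file.
my $content = <<'XML';
<item name='GenMsgText'><text>The signature will be discarded.</text></item>
<item name='GenMsgText'><text>The signature will be discarded.<break/>
Do you want to continue?</text></item>
XML

# Hypothetical helper: returns the body of every GenMsgText <text> node
# whose body contains a line break.
sub multiline_texts {
    my ($xml) = @_;
    my @found;
    # Step 1: capture each <text> body (/s lets . cross newlines,
    # /g moves on to the next node on every iteration).
    while ($xml =~ m{<item name='GenMsgText'>(?:<textlist>)?<text>(.*?)</text>}sig) {
        my $t = $1;
        # Step 2: keep only the bodies that contain a newline.
        push @found, $t if $t =~ /\n/;
    }
    return @found;
}

print "$_\n\n" for multiline_texts($content);
```

On the sample, only the second node's text should be printed, since the first node's body has no newline in it.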
This is for a one-shot tool that isn't meant to be reused, so I'd like to avoid writing any parsing code that's more than a few lines long. Besides, the code above already works; I'm asking for personal knowledge more than for real use.