ansaurus

Question

Regex to match tag contents while simultaneously omitting leading and trailing whitespace

Answer 1

+3 A:

You should not use regex to parse HTML.

Use a parser instead.

Also: http://stackoverflow.com/questions/3817821/regex-to-remove-body-tag-attributes-c

Also also: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

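If all that doesn't convince you, then don't use the dot in the middle of your expression. The dot matches any character, whitespace included, so it is consuming the whitespace you want to leave out. Use the alphanumeric escape \w (I think) instead; it matches only letters, digits, and underscores. If the contents can include punctuation, \S (any non-whitespace character) is the broader alternative.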
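As a rough illustration of the parser route, here is a minimal sketch that reads an element's text with .NET's XmlReader and trims it with String.Trim. It assumes the input is well-formed XML, which arbitrary HTML often is not. The names TagReader and ReadTrimmed are made up for this example.

```csharp
using System;
using System.IO;
using System.Xml;

static class TagReader
{
    // Returns the trimmed text content of the first element named elementName,
    // or null if no such element exists. Assumes xml is well-formed.
    public static string ReadTrimmed(string xml, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Advance to the first element with the given name.
            if (!reader.ReadToFollowing(elementName))
            {
                return null;
            }

            // Read the element's text and drop leading/trailing whitespace.
            return reader.ReadElementContentAsString().Trim();
        }
    }
}
```

Calling TagReader.ReadTrimmed("<tag>     test    </tag>", "tag") should give "test" without any regex involved.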

JoshD 2010-09-29 01:46:24

Thanks for the answer and the comment. I was only looking for some regex pointers on this particular question; however, because of your answer and the links you posted, I am going to look into using .NET's XmlReader to parse our KML files instead of the way we're currently doing it.

Dark Lord Kvl 2010-10-01 02:14:12

Answer 2

A:

Use these regular expressions to strip leading and trailing whitespace, respectively: /^\s+/ and /\s+$/
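In .NET, the two patterns could be applied roughly as follows. This is only a sketch: the surrounding slashes are delimiters in languages such as PHP or JavaScript and are not part of the pattern in C#, and the helper name WhitespaceStripper is invented here. It operates on a string that has already been extracted from between the tags.

```csharp
using System.Text.RegularExpressions;

static class WhitespaceStripper
{
    // Removes leading whitespace first, then trailing whitespace.
    public static string Strip(string s)
    {
        string noLeading = Regex.Replace(s, @"^\s+", "");
        return Regex.Replace(noLeading, @"\s+$", "");
    }
}
```

For plain strings like this, String.Trim() has the same effect and is simpler; the regex form is mainly useful when the stripping has to be part of a larger pattern.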

Ruel 2010-09-29 01:50:13

Answer 3

A:

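        // Requires: using System; using System.Text.RegularExpressions;
        // Capture everything between the tags, then trim the captured value.
        string test = "<tag>     test    </tag>";
        string pattern3 = @"<tag>(.*?)</tag>";
        Console.WriteLine("{0}", Regex.Match(test, pattern3).Groups[1].Value.Trim());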

Les 2010-09-29 02:14:47

Answer 4

+1 A:

Drop the lookarounds; they just make the job more complicated than it needs to be. Instead, use a capturing group to pick out the part you want:

<tag>\s*(.*?)\s*</tag>

The part you want is available as capture group 1: $matches[1] in PHP-style code, or match.Groups[1].Value in .NET.
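A minimal C# sketch of the capturing-group pattern above might look like this. The class name TagContent is hypothetical, and RegexOptions.Singleline is an added assumption so that the dot also matches newlines when the content spans several lines.

```csharp
using System.Text.RegularExpressions;

static class TagContent
{
    // \s* on each side absorbs the surrounding whitespace, so the lazy
    // group (.*?) captures only the content between them.
    private static readonly Regex Pattern =
        new Regex(@"<tag>\s*(.*?)\s*</tag>", RegexOptions.Singleline);

    // Returns the captured content, or null when the input has no match.
    public static string Extract(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Whitespace inside the content is kept: TagContent.Extract("<tag>  a b  </tag>") should return "a b", since only the leading and trailing runs fall outside the group.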

Alan Moore 2010-09-29 04:43:40

Thanks! This was the type of tip I was looking for, and it works great.

Dark Lord Kvl 2010-10-01 02:11:34
