I have an HTML node:

<p>Line1
Line2
Line
ThereAreTwoSpacesAfterThis  ThereAreTwoSpacesBeforeThis
</p>

In any browser, this is rendered as

Line1 Line2 Line ThereAreTwoSpacesAfterThis ThereAreTwoSpacesBeforeThis 

which is the result I want.

So how can I remove the insignificant whitespace from an XmlNodeType.Text node in C#?

=========================================================

Hi guys, thanks for your replies.

Actually, I'm working on a small project to extract all the text from a web page (HTML), something like the "Save page as text file" feature in Firefox or IE.

I tried Html Agility Pack, but the result is not good enough.

I also tried a WebBrowser control, but it seems too slow, and it is not easy to control.

Any good ideas?

I understand that people are suggesting regex, but there are too many cases to consider.

+1  A: 

Just use a regular expression!

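// Requires: using System.Text.RegularExpressions;
// Collapses every run of whitespace (including newlines) into a single space.
// RegexOptions.Singleline only changes how "." matches, so it is not needed here.
var spacesSquashed = Regex.Replace(input, @"\s+", " ");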

If you also want to remove all spaces at the beginning and end, as is customary in HTML, add an extra .Trim() at the end.
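To make the two steps above concrete, here is a minimal sketch that wraps the replace and the optional trim in one helper. The class name `HtmlWhitespace` and the method `Collapse` are hypothetical names chosen for illustration. One deliberate difference from the one-liner: in .NET, `\s` also matches Unicode separators such as the non-breaking space, which browsers do not collapse, so this sketch assumes you only want to collapse the ASCII whitespace characters that HTML treats as collapsible (space, tab, LF, CR, form feed).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class HtmlWhitespace
{
    // Runs of the ASCII whitespace characters that HTML collapses:
    // space, tab, carriage return, line feed, form feed.
    private static readonly Regex WhitespaceRun =
        new Regex(@"[ \t\r\n\f]+", RegexOptions.Compiled);

    // Replaces each whitespace run with a single space.
    // When trim is true, leading and trailing spaces are removed as well.
    public static string Collapse(string text, bool trim = false)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        string collapsed = WhitespaceRun.Replace(text, " ");

        // After collapsing, only plain spaces can remain at the ends,
        // so trimming ' ' is enough and leaves non-breaking spaces alone.
        return trim ? collapsed.Trim(' ') : collapsed;
    }
}
```

Calling `HtmlWhitespace.Collapse(node.Value)` on the text node from the question should give the same single-line string the browser shows, including the one trailing space; passing `trim: true` drops that trailing space too. This only handles whitespace inside a single text node. It does not deal with `<pre>` content or with whitespace between block elements.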

Timwi
Tempted to upvote because you included that quote :) For this specific usage I guess regex is okay...
Alex Paven
Please see my updated question. Thanks for your reply.
Peter Lee