Hi, I have a control that will return some html to me as a string. Before putting that on screen, I'd like to be able to tell if it'll just show as empty.

For example, the control might return <p><br /></p>. When I compare that against string.Empty in C# it obviously isn't empty, but nothing gets displayed on screen.

Is there a regex function to test whether html will actually show any text on screen? Or using C# - is there any function to test the string containing html to see whether it actually contains anything other than tags?

Cheers, I'm a little confused how to get around this without writing some custom parser, a road I don't want to have to go down!

+1  A: 

Don't write a custom parser, just use an existing parser and apply some search rules to it.

Ignacio Vazquez-Abrams
+3  A: 

As answered by @Ignacio, you should use something like the HTML Agility Pack. Here's a sample bit of code that seems to work for your situation.

HtmlDocument docEmpty = new HtmlDocument();
docEmpty.LoadHtml("<p><br /></p>");

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("<p>I am not empty...<br /></p>");

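// No text nodes in "<p><br /></p>", so InnerText is empty and this is true.
bool shouldBeEmpty = string.IsNullOrEmpty(docEmpty.DocumentNode.InnerText);
// InnerText contains "I am not empty...", so this is false.
bool shouldNotBeEmpty = string.IsNullOrEmpty(doc.DocumentNode.InnerText);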

Note: This sample uses the http://htmlagilitypack.codeplex.com/ parser.

Kane
I haven't used this parser - how would it cope with something like `<b> </b>` in this context?
ZombieSheep
Since I'm using the InnerText property in the code example, it would return just the whitespace between the tags in the case you've listed, so the string would not count as empty.
Kane
This works a treat Kane, thanks for the code sample
Neal Hudson
Hey Kane, I've just been debugging an issue with this - if there is whitespace between <p> tags the InnerText will return the whitespace - so the test for IsNullOrEmpty returns false. A .Trim() is needed on the InnerText property.
Neal Hudson
+1  A: 

Not sure if it's relevant, but I made this test, and it seems to be what the OP wants, without using any external library (but requiring .NET 3.5 or later, for System.Xml.Linq).

XElement docEmpty = XElement.Parse("<p><br /></p>");
Console.WriteLine(string.IsNullOrEmpty(docEmpty.Value)); // Outputs True.

XElement doc = XElement.Parse("<p>This is a test<br /></p>");
Console.WriteLine(string.IsNullOrEmpty(doc.Value)); // Outputs False.
Shimrod
Hi Shimrod, this doesn't work so well when the HTML isn't well formed, or isn't in fact HTML. For example, sometimes the control will return the string surrounded by p tags, other times it'll just be the string itself. The HTML Agility Pack seems to handle this better.
Neal Hudson
Oh OK, I wasn't aware of that... Thanks for mentioning!
Shimrod
A: 

As suggested by others, you can use an HTML parser, which is a solid way to handle your need. But I think it would add a lot of overhead, since the parser has to do a lot of work to understand the HTML code.

Maybe your idea to use regex is not so bad. It should be quicker too. I suggest you use Regex to replace every opening and closing tag with an empty string. Whatever is left after the replacement should be text that would appear in the browser.

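// requires: using System.Text.RegularExpressions;
string input = "<p> <br />  </p>";
string pattern = "<[^<>]+>";   // any opening, closing, or self-closing tag
string replacement = "";
string result1 = Regex.Replace(input, pattern, replacement);
pattern = @"\s+";   // spaces, tabs, newlines (verbatim string, so \s is not treated as a C# escape)
string result_final = Regex.Replace(result1, pattern, replacement);
bool isEmptyHtml = string.IsNullOrEmpty(result_final);   // true when nothing but tags and whitespace was present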
oldbrazil
Your suggestion requires the property that eliminating a tag does not eliminate something displayed. Suppose the text is "<HR>". If you eliminate that then you go from HTML which displays something to HTML which displays nothing.
Eric Lippert
True, but Neal's question was: "Is there a regex function to test whether html will actually show any text on screen?" <HR> does not display "text". I think my solution is still good when you don't want to use an overly complex (external) HTML parser. Also note that, parser or not, how do you handle the case of displaying (transparent) images?
oldbrazil
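
For reference, the points raised in this thread (strip the tags, then ignore whitespace, including whitespace that arrives as an entity) could be combined into one helper along the following lines. This is only a minimal sketch using the base class library: the class name HtmlText and the method name HasNoVisibleText are made up for illustration and do not come from any answer above. It assumes .NET 4.0 or later, for WebUtility.HtmlDecode and string.IsNullOrWhiteSpace. Like the regex answer, it only detects text, so markup such as <hr> or an image is still reported as empty, and the tag pattern is a rough approximation rather than a real HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library mentioned in the thread.
public static class HtmlText
{
    // Returns true when the markup contains no text other than whitespace.
    // Caveat: elements that render without text (<hr>, <img>) are not detected.
    public static bool HasNoVisibleText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return true;
        }

        // Remove anything that looks like an opening, closing, or self-closing tag.
        string withoutTags = Regex.Replace(html, "<[^<>]+>", "");

        // Decode entities so that "&nbsp;" becomes the U+00A0 character.
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // IsNullOrWhiteSpace treats spaces, tabs, newlines, and U+00A0 as whitespace.
        return string.IsNullOrWhiteSpace(decoded);
    }
}

// Usage:
//   HtmlText.HasNoVisibleText("<p><br /></p>")                  // true
//   HtmlText.HasNoVisibleText("<p> <br />  </p>")               // true
//   HtmlText.HasNoVisibleText("<b>&nbsp;</b>")                  // true
//   HtmlText.HasNoVisibleText("<p>I am not empty...<br /></p>") // false
```

If the input may be badly malformed, running the same whitespace check on the InnerText returned by the HTML Agility Pack is likely the more robust choice, as noted in the comments above.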