I'm looking for a regular expression that detects whether or not a string is anything more than a bunch of HTML tags.

So, desired functionality is:

Input -> Output

"<html></html>" -> False

"<html>Hi</html>" -> True

"<a href='google.com'>Click Me</a>" -> True

"hello" -> True

"<bold><italics></bold></italics>" -> False

"" -> Don't care

Once upon a time I could have done this myself, but it's been too long.

Thanks in advance.

edit: I don't care if they are real HTML tags. Let's call anything inside <>'s a tag. I also don't care whether a start tag matches up with an end tag.

+2  A: 

Replace "<[^>]*>" with the empty string, trim the result and check if there is anything left afterwards.
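
A minimal C# sketch of that replace-trim-check sequence, for illustration only. The class and method names (`TagCheck`, `HasNonTagText`) are hypothetical and not part of the answer above; the pattern is the one quoted in it.

```csharp
using System;
using System.Text.RegularExpressions;

static class TagCheck
{
    // Hypothetical helper: anything between '<' and '>' counts as a tag,
    // whether or not it is real HTML or properly nested.
    public static bool HasNonTagText(string input)
    {
        // Remove every <...> run, then trim the surrounding whitespace.
        string stripped = Regex.Replace(input, "<[^>]*>", "").Trim();

        // Anything left over is text that was not inside a tag.
        return stripped.Length > 0;
    }
}
```

With this sketch, `TagCheck.HasNonTagText("<html>Hi</html>")` returns true and `TagCheck.HasNonTagText("<html></html>")` returns false. Because `[^>]` also matches line breaks, a tag that spans several lines is removed as well.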

Tomalak
Thanks for the quick response, I used this method and it worked.
A: 

I once used this to strip out HTML tags:

const string tagsPatterns = "\\s*<.*?>\\s*"; 
value = System.Text.RegularExpressions.Regex.Replace(value, tagsPatterns, " ");

You can adapt it (this version was meant to keep the whitespace between words) to strip the tags completely and then check whether the result is empty.

Update 1: Here it goes :)

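// Returns true if anything other than whitespace remains once every <...> run is removed.
bool HasText(string value)
{
    // [^>]* rather than .*? so that a tag spanning a line break is still matched.
    const string tagsPatterns = "<[^>]*>"; 
    value = System.Text.RegularExpressions.Regex.Replace(value, tagsPatterns, "");
    return value.Trim() != "";
}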
[TestMethod]
public void TestMethod2()
{
    Assert.IsFalse(HasText("<html></html>"));
    Assert.IsTrue(HasText("<html>Hi</html>"));
    Assert.IsTrue(HasText("<a href='google.com'>Click Me</a>"));
    Assert.IsTrue(HasText("hello"));
    Assert.IsFalse(HasText("<bold><italics></bold></italics>"));
    Assert.IsFalse(HasText(""));
}
eglasius
A: 

Here's an article written by Phil Haack about using a regular expression to match HTML.

Also, if you want a simple line of code, consider loading the string into an XmlDocument. It will parse the string, so you'll know whether you have valid XML or not.
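
A rough sketch of what the XmlDocument route could look like, assuming the goal is to test for text content and not only for validity. The names `XmlCheck` and `ParsesWithText` are hypothetical. Note that this is stricter than what the question asks for: plain text such as "hello" and mismatched tags do not parse as XML, so they come back as false.

```csharp
using System.Xml;

static class XmlCheck
{
    // Hypothetical helper: true only if the input is well-formed XML
    // and its root element contains some non-whitespace text.
    public static bool ParsesWithText(string input)
    {
        var doc = new XmlDocument();
        try
        {
            doc.LoadXml(input);
        }
        catch (XmlException)
        {
            // Not well-formed XML (no root element, mismatched tags, bare text, ...).
            return false;
        }

        // InnerText concatenates the text of the element and all its descendants.
        return doc.DocumentElement.InnerText.Trim().Length > 0;
    }
}
```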

ajma
I believe you misunderstood the question a bit.
Tomalak