For example:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>title</title>
</head>
<body>
    <a href="aaa.asp?id=1"> I want to get this text </a>
    <div>
        <h1>this is my want!!</h1>
        <b>this is my want!!!</b>
    </div>
</body>
</html>

and the result should be:

 I want to get this text 
this is my want!!
this is my want!!!
A: 

Change to

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>title</title>
</head>
<body>
    <a href="aaa.asp?id=1" runat="server" id="mya"> I want to get this text </a>
    <div>
        <h1 runat="server" id="myh1">this is my want!!</h1>
        <b runat="server" id="myb">this is my want!!!</b>
    </div>
</body>
</html>

In the code-behind, read mya.InnerHtml, myh1.InnerHtml, and myb.InnerHtml.

Fujiy
Thanks, but this is not what I want. I want a method like this: public string CleanHtml(string inputHtml){ string result=""; .... return result; }
guaike
A: 

I would recommend using something like HTMLTidy.

Here's a tutorial on it to get you started.

Ólafur Waage
A: 

Why do you want to do this on the server side?

For that you have to give the container element runat="server" and then read the element's InnerText.

You can do the same in JavaScript without making the element runat="server".

rahul
I am developing a news system, and I would like to extract part of each news item's content to display as a summary on the home page.
guaike
+10  A: 

HTML Agility Pack:

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);
    string s = doc.DocumentNode.SelectSingleNode("//body").InnerText;
Marc Gravell
A: 

How about using a regular expression?

guaike
A: 

If you just want to remove the html tags then use a regular expression that deletes anything between "<" and ">".

Andrew Marsh
I am a bit worried that the regex will be slow.
guaike
+1  A: 

Use this function...

public string Strip(string text)
{
    return Regex.Replace(text, @"<(.|\n)*?>", string.Empty);
}
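
If regex speed is a concern, as raised in the comment above, one option is to build the Regex once and reuse it rather than parsing the pattern on every call. The following is only a sketch: the class name HtmlStripper is made up for illustration, and the pattern <[^>]*> is swapped in for the one above because it matches the same spans (from "<" to the next ">") without the alternation group. Whether it is actually faster for a given workload would need measuring.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class HtmlStripper
{
    // Constructed once and shared by all calls.
    // RegexOptions.Compiled costs more at startup but can speed up repeated matching.
    private static readonly Regex TagPattern =
        new Regex(@"<[^>]*>", RegexOptions.Compiled);

    // Removes everything from a '<' up to the next '>'.
    public static string Strip(string text)
    {
        return TagPattern.Replace(text ?? string.Empty, string.Empty);
    }
}
```

Usage would look like HtmlStripper.Strip(html). Like any tag-stripping regex, this does not decode entities such as &amp;amp; and it keeps the text inside script and style elements.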
diegodsp
A: 

Use this function:

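// Requires: using System.Text;
public static string RemoveTags(string html)
{
    // StringBuilder avoids allocating a new string for every appended character.
    StringBuilder result = new StringBuilder(html.Length);
    bool insideTag = false;
    foreach (char c in html)
    {
        if (c == '<')
            insideTag = true;   // start of a tag: stop copying
        if (!insideTag)
            result.Append(c);
        if (c == '>')
            insideTag = false;  // end of a tag: resume copying
    }
    return result.ToString();
}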
James Lawruk
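
For the news-summary use case mentioned in the comments, the character-scanning approach can be extended into the CleanHtml shape the asker described. This is a rough sketch under several assumptions: the class name HtmlSummary and the maxLength parameter are invented for illustration, entity decoding uses System.Net.WebUtility.HtmlDecode, and text inside title, script, or style elements is not skipped.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper; names and the maxLength parameter are assumptions.
public static class HtmlSummary
{
    public static string CleanHtml(string inputHtml, int maxLength)
    {
        if (string.IsNullOrEmpty(inputHtml))
            return string.Empty;

        var sb = new StringBuilder(inputHtml.Length);
        bool insideTag = false;
        foreach (char c in inputHtml)
        {
            if (c == '<')
            {
                insideTag = true;          // start of a tag: stop copying
            }
            else if (c == '>' && insideTag)
            {
                insideTag = false;         // end of a tag: resume copying
                sb.Append(' ');            // keep text of adjacent elements apart
            }
            else if (!insideTag)
            {
                sb.Append(c);
            }
        }

        // Turn entities such as &amp; back into plain characters.
        string text = WebUtility.HtmlDecode(sb.ToString());

        // Collapse runs of whitespace (including line breaks) into single spaces.
        text = string.Join(" ",
            text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));

        // Cut to the requested summary length.
        if (text.Length > maxLength)
            text = text.Substring(0, maxLength) + "...";
        return text;
    }
}
```

A call such as HtmlSummary.CleanHtml(newsBody, 200) would then give a plain-text teaser for the home page. For malformed or complex markup, a real parser such as the HTML Agility Pack mentioned above is likely the safer choice.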