views:

346

answers:

1

Hi Everyone,

I have a DB with some text fields pasted from MS Word, and I'm having trouble to strip just the , and tags, but obviously keeping their innerText.

I've tried using the HAP but I'm not going in the right direction..

Public Function StripHtml(ByVal html As String, ByVal allowHarmlessTags As Boolean) As String
    Dim htmlDoc As New HtmlDocument()
    htmlDoc.LoadHtml(html)
    Dim invalidNodes As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//div|//font|//span")
    For Each node In invalidNodes
        node.ParentNode.RemoveChild(node, False)
    Next
    Return htmlDoc.DocumentNode.WriteTo()
End Function

This code simply selects the desired elements and removes them... but not keeping their inner text..

Thanks in advance

A: 

Well... I think I found a solution:

Public Function StripHtml(ByVal html As String) As String
    Dim htmlDoc As New HtmlDocument()
    htmlDoc.LoadHtml(html)
    Dim invalidNodes As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//div|//font|//span|//p")
    For Each node In invalidNodes
        node.ParentNode.RemoveChild(node, True)
    Next
    Return htmlDoc.DocumentNode.WriteContentTo
End Function

I was almost there... :P

gjsduarte