How can I get the text content out of HTML, removing the elements (tags) around it?

I am looking for an example using VB6

+2  A: 

You can use a regular expression: build a pattern and extract the data you want from the HTML. This page explains how to use regular expressions in VB6: http://www.regular-expressions.info/vb.html
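As a rough sketch of the regular-expression approach, the function below removes anything that looks like a tag. The name StripTags and the pattern are illustrative assumptions, not taken from the linked page. It uses the VBScript.RegExp object through late binding, so no project reference is needed.

```vb
' Hypothetical helper: strips tags with a regular expression.
' Assumes the VBScript regular expression engine is installed
' (it ships with Windows Script).
Public Function StripTags(ByVal Html As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True         ' replace every match, not only the first
    re.IgnoreCase = True
    re.Pattern = "<[^>]*>"   ' a "<", then any run of non-">" characters, then ">"
    StripTags = re.Replace(Html, "")
End Function
```

For example, StripTags("<p>Hello <b>world</b></p>") returns "Hello world". A pattern this simple does not decode entities such as &amp;amp;, leaves the contents of script and style blocks in place, and is confused by a ">" inside an attribute value, so it only suits HTML you know is simple.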

Pooria
A: 

The HTML may be malformed, which makes it very difficult to remove the tags reliably with regular expressions. An alternative is to create Internet Explorer as a COM object in VB, load the HTML document into it, and use it to walk the parsed element tree.

Peter
+2  A: 

You can use Internet Explorer as a COM object (without showing it on screen). For example, to get a plain-text version of the HTML:

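Public Function Html2Text(ByVal Data As String) As String
   Dim obj As Object
   ' On any failure, fall through and return an empty string
   On Error Resume Next
   ' "htmlfile" creates an in-memory HTML document; no browser window is shown
   Set obj = CreateObject("htmlfile")
   obj.Open
   obj.Write Data
   obj.Close   ' signal that the whole document has been written
   ' InnerText is the text of the body with the tags removed
   Html2Text = obj.Body.InnerText
End Function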

You could also walk the element tree to do something more complicated.
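As one possible illustration of walking the tree, the sketch below collects the text of every element with a given tag name. The function name TagTexts and its parameters are made up for this example. It relies on the same "htmlfile" object and on the document's getElementsByTagName method, and because everything is late-bound, member names are only checked at run time.

```vb
' Hypothetical helper: returns the text of each <TagName> element, one per line.
Public Function TagTexts(ByVal Data As String, ByVal TagName As String) As String
    Dim doc As Object
    Dim el As Object
    Set doc = CreateObject("htmlfile")
    doc.Open
    doc.Write Data
    doc.Close
    ' getElementsByTagName returns a collection that For Each can enumerate
    For Each el In doc.getElementsByTagName(TagName)
        TagTexts = TagTexts & el.innerText & vbCrLf
    Next el
End Function
```

Calling TagTexts(html, "a") would then give the visible text of each link in the page. The same loop can read other properties of each element, such as tagName or outerHTML, if more than the text is needed.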

Credit: Karl Peterson in Visual Studio Magazine.

MarkJ