I'm a little stuck with my app again. It retrieves place names and their addresses from yellowpages.ca.

Here is the code so far:

    ' Parse the page once and reuse the same document for both queries
    Dim doc As New HtmlAgilityPack.HtmlDocument()
    doc.Load(WebBrowser1.DocumentStream)

    '////// Gets PlaceName //////
    Dim content As String = ""
    Dim hnc As HtmlAgilityPack.HtmlNodeCollection = doc.DocumentNode.SelectNodes("//span[@class='listingTitle']")
    If hnc IsNot Nothing Then ' SelectNodes returns Nothing when nothing matches
        For Each link As HtmlAgilityPack.HtmlNode In hnc
            ' Decode entities such as &amp; and &#39; in one call
            Dim cleaned As String = HtmlAgilityPack.HtmlEntity.DeEntitize(link.InnerText)
            cleaned = cleaned.Replace("See full business details", "")
            cleaned = cleaned.ToLower().Replace(vbCrLf, "")
            content &= cleaned & vbNewLine
        Next
    End If
    RichTextBox1.Text = content
    Me.RichTextBox1.Lines = Me.RichTextBox1.Text.Split(New Char() {ControlChars.Lf}, _
                                               StringSplitOptions.RemoveEmptyEntries)

    '////// Gets Address //////
    Dim content2 As String = ""
    Dim hnc2 As HtmlAgilityPack.HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div[@class='address']/text()[normalize-space(.)]")
    If hnc2 IsNot Nothing Then
        For Each link As HtmlAgilityPack.HtmlNode In hnc2
            ' Each step builds on the previous result instead of restarting from link.InnerText
            Dim cleaned As String = HtmlAgilityPack.HtmlEntity.DeEntitize(link.InnerText)
            cleaned = cleaned.Replace("Map", "")
            content2 &= cleaned & vbNewLine
        Next
    End If
    RichTextBox2.Text = content2.Replace(ControlChars.Tab, "")
    Me.RichTextBox2.Lines = Me.RichTextBox2.Text.Split(New Char() {ControlChars.Lf}, _
                                               StringSplitOptions.RemoveEmptyEntries)

So my app puts all the place names in RichTextBox1 and all the addresses in RichTextBox2. This would work perfectly if every Yellow Pages listing had an address, but some of them don't. For example, the first listing below has no address:

  1. JH Ryder Machinery Limited

  2. Convenience Storage Ltd 3344 Rideau Rd, Gloucester, ON, K1G3N4 Map

  3. Regional Physiotherapy Clinic 1443 Woodroffe Ave, Nepean, ON, K2G1W1 Map

As a result there are more place names than addresses, and every address after a missing one ends up next to the wrong name. How can I make sure the two lists always match up? Any other workaround would also help, such as skipping place names that have no address.
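A minimal sketch of one workaround, assuming (not verified against the live yellowpages.ca markup) that each listing's `address` div appears after its own `listingTitle` span and before the next listing's title. Instead of collecting names and addresses into two independent lists, it walks the titles and looks up each title's address relative to that title, so a listing without an address gets an empty entry rather than shifting every later address up by one. The module name `ListingPairs`, the function `PairListings`, and the sample markup in `Main` are hypothetical; the code relies on HtmlAgilityPack's `SelectNodes`, `SelectSingleNode`, and `HtmlEntity.DeEntitize`, and needs the HtmlAgilityPack package referenced.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module ListingPairs

    ' Returns one (name, address) pair per listing title. A listing with no
    ' address gets "" instead of borrowing the next listing's address.
    ' ASSUMPTION: an address div comes after its own title and before the next title.
    Function PairListings(doc As HtmlDocument) As List(Of Tuple(Of String, String))
        Dim pairs As New List(Of Tuple(Of String, String))
        Dim titles As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//span[@class='listingTitle']")
        If titles Is Nothing Then Return pairs ' SelectNodes returns Nothing when nothing matches

        For Each title As HtmlNode In titles
            Dim name As String = HtmlEntity.DeEntitize(title.InnerText).Trim()
            Dim address As String = ""

            ' First address block anywhere after this title in document order
            Dim addrNode As HtmlNode = title.SelectSingleNode("following::div[@class='address'][1]")
            If addrNode IsNot Nothing Then
                ' Keep it only if the nearest title before it is this one,
                ' i.e. no other listing's title sits between the two
                Dim owner As HtmlNode = addrNode.SelectSingleNode("preceding::span[@class='listingTitle'][1]")
                If owner Is title Then
                    address = HtmlEntity.DeEntitize(addrNode.InnerText).Trim()
                    ' Drop only a trailing "Map" link text, so street names like "Maple" survive
                    If address.EndsWith("Map") Then
                        address = address.Substring(0, address.Length - 3).TrimEnd()
                    End If
                End If
            End If

            pairs.Add(Tuple.Create(name, address))
        Next
        Return pairs
    End Function

    Sub Main()
        ' Hypothetical markup mimicking the three example listings above
        Dim html As String =
            "<div><span class='listingTitle'>JH Ryder Machinery Limited</span></div>" &
            "<div><span class='listingTitle'>Convenience Storage Ltd</span>" &
            "<div class='address'>3344 Rideau Rd, Gloucester, ON, K1G3N4 Map</div></div>" &
            "<div><span class='listingTitle'>Regional Physiotherapy Clinic</span>" &
            "<div class='address'>1443 Woodroffe Ave, Nepean, ON, K2G1W1 Map</div></div>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        For Each p In PairListings(doc)
            ' An empty Item2 marks a listing without an address
            Console.WriteLine(p.Item1 & " | " & p.Item2)
        Next
    End Sub

End Module
```

Because every pair is built from a single title, the name and address can no longer drift apart. To skip listings without an address instead, ignore pairs whose `Item2` is empty before writing `Item1` and `Item2` into RichTextBox1 and RichTextBox2.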

Here is the URL if anyone wants to look at the HTML: http://www.yellowpages.ca/search/?stype=si&what=sh&where=Ottawa,+ON&x=0&y=0