I'm a little stuck with my app again. It retrieves place names and their addresses from yellowpages.ca.
Here is the code so far:
Dim content As String = ""
Dim web As New HtmlAgilityPack.HtmlWeb
Dim doc As New HtmlAgilityPack.HtmlDocument()
doc.Load(WebBrowser1.DocumentStream)
Dim hnc As HtmlAgilityPack.HtmlNodeCollection = doc.DocumentNode.SelectNodes("//span[@class='listingTitle']") '//////Gets PlaceName/////////
For Each link As HtmlAgilityPack.HtmlNode In hnc
    Dim replaceUnwanted As String = ""
    ' Decode the HTML entities that show up in the listing titles
    replaceUnwanted = link.InnerText.Replace("&amp;", "&")
    replaceUnwanted = replaceUnwanted.Replace("&#39;", "'")
    replaceUnwanted = replaceUnwanted.Replace("See full business details", "")
    replaceUnwanted = replaceUnwanted.ToLower().Replace(vbCrLf, "")
    content &= replaceUnwanted & vbNewLine
Next
RichTextBox1.Text = content
Me.RichTextBox1.Lines = Me.RichTextBox1.Text.Split(New Char() {ControlChars.Lf}, _
StringSplitOptions.RemoveEmptyEntries)
Dim content2 As String = ""
Dim doc2 As New HtmlAgilityPack.HtmlDocument()
doc2.Load(WebBrowser1.DocumentStream)
Dim hnc2 As HtmlAgilityPack.HtmlNodeCollection = doc2.DocumentNode.SelectNodes("//div[@class='address']/text()[normalize-space(.)]")'//////Gets Address//////
For Each link As HtmlAgilityPack.HtmlNode In hnc2
    Dim replaceUnwanted As String = ""
    ' Decode the HTML entities that show up in the addresses
    replaceUnwanted = link.InnerText.Replace("&amp;", "&")
    replaceUnwanted = replaceUnwanted.Replace("&#39;", "'")
    ' Chain from replaceUnwanted (not link.InnerText) so the replacements above are kept
    replaceUnwanted = replaceUnwanted.Replace("Map", "")
    content2 &= replaceUnwanted & vbNewLine
Next
RichTextBox2.Text = content2.Replace(ControlChars.Tab, "")
Me.RichTextBox2.Lines = Me.RichTextBox2.Text.Split(New Char() {ControlChars.Lf}, _
StringSplitOptions.RemoveEmptyEntries)
So my app gets all the place names and puts them in RichTextBox1, and gets all the addresses and puts them in RichTextBox2. This would be perfect if the Yellow Pages listings were complete, but they aren't: some of the place names don't have addresses. E.g.:
JH Ryder Machinery Limited
Convenience Storage Ltd 3344 Rideau Rd, Gloucester, ON, K1G3N4 Map
Regional Physiotherapy Clinic 1443 Woodroffe Ave, Nepean, ON, K2G1W1 Map
So now there are more place names than addresses, and they don't match up with each other. How can I make sure they always match up? Or is there any other workaround, like deleting/skipping place names without addresses?
Here is the URL if someone wants to take a look at the HTML: http://www.yellowpages.ca/search/?stype=si&what=sh&where=Ottawa,+ON&x=0&y=0
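To make the skipping workaround concrete, here is a rough, untested sketch of the general shape it might take. It assumes that every listing on the page sits inside its own container element, so the name and address can be read from the same container instead of from two separate page-wide queries. The container class `listing`, the module name `PairedListings`, and the function `ExtractPairs` are made up for illustration and are not taken from the actual page; only the `listingTitle` and `address` classes come from the code above.

```vbnet
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module PairedListings
    ' Returns (name, address) pairs, skipping any listing that lacks an address.
    ' ASSUMPTION: "listing" is a hypothetical class name for the per-listing container.
    Function ExtractPairs(html As String) As List(Of KeyValuePair(Of String, String))
        Dim pairs As New List(Of KeyValuePair(Of String, String))
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Dim listings As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div[@class='listing']")
        If listings Is Nothing Then Return pairs ' SelectNodes returns Nothing when nothing matches

        For Each listing As HtmlNode In listings
            ' The leading "." keeps each XPath relative to the current listing.
            Dim nameNode As HtmlNode = listing.SelectSingleNode(".//span[@class='listingTitle']")
            Dim addressNode As HtmlNode = listing.SelectSingleNode(".//div[@class='address']")
            If nameNode Is Nothing OrElse addressNode Is Nothing Then Continue For

            Dim name As String = HtmlEntity.DeEntitize(nameNode.InnerText).Trim()
            Dim address As String = HtmlEntity.DeEntitize(addressNode.InnerText).Trim()
            ' Strip only a trailing "Map" link text, so names like "Maple St" survive.
            If address.EndsWith("Map") Then
                address = address.Substring(0, address.Length - 3).Trim()
            End If
            pairs.Add(New KeyValuePair(Of String, String)(name, address))
        Next
        Return pairs
    End Function
End Module
```

Because each pair is built from a single container, the two lists cannot drift out of step: a place name without an address is simply never added. The pairs could then be written to the two rich text boxes in the same order.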