I get some URLs from an XML feed. How do I get specific data from each page those URLs point to? For example, suppose the feed contains the URL www.abc.com, and that page has a table like this:

<table>
<tbody>
<tr>
 <td class="snip">

  <span class="summary">
   abc ... abc &amp; xyz ...
   <br>
   .......
   <br>
  </span>

  <span>......</span>

 </td>
</tr>
</tbody>
</table>

How do I get the content of the span with class "summary" that is a child of the td with class "snip"? We also have to decode or remove the encoded HTML contained in the span.

Is there a regex-based solution? Any idea how to do it on the server side?

A: 

Hey,

Not sure I understand 100% of the issue, but I think maybe you are trying to do a screen scrape, as described here? http://www.4guysfromrolla.com/webtech/070601-1.shtml

Otherwise, client-side HTML can't be read directly on the server, because those elements aren't server tags, as you well know. However, everything posted back to the server is part of the posted data (i.e. Request.Form), so you can get existing values that way.

Alternatively, could JavaScript work, sending the data you want back to the server via a web service?

HTH.

Brian
I cannot do it on the client side because the data comes from a 3rd party site. What I planned is to send an HTTP request to that site from the server side, which will return the HTML. From that HTML I can extract the targeted element and the content it carries. I think we can use a regex to extract the data, but I'm not sure how to do that.
Rahat
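A rough sketch of the regex idea from the comment above, written in Python purely as an illustration (the thread itself is about .NET, and the same pattern should carry over to a .NET Regex). The SAMPLE string, SUMMARY_RE and extract_summary are hypothetical names, not anything from the original posts. It assumes double-quoted class attributes holding a single class name and no span nested inside the summary span, so treat it as fragile:

```python
import html
import re

# Hypothetical sample mirroring the table in the question.
SAMPLE = """<table>
<tr>
 <td class="snip">
  <span class="summary">
   abc ... abc &amp; xyz ...
   <br>
   .......
   <br>
  </span>
  <span>......</span>
 </td>
</tr>
</table>"""

# Match the first span.summary that appears after an opening td.snip tag.
# Assumption: class attributes are double-quoted and contain one class name.
SUMMARY_RE = re.compile(
    r'<td[^>]*\bclass="snip"[^>]*>.*?'
    r'<span[^>]*\bclass="summary"[^>]*>(.*?)</span>',
    re.DOTALL | re.IGNORECASE,
)


def extract_summary(page):
    """Return the decoded text of the summary span, or None if not found."""
    match = SUMMARY_RE.search(page)
    if match is None:
        return None
    inner = re.sub(r"<[^>]+>", " ", match.group(1))  # drop tags such as <br>
    text = html.unescape(inner)                      # decode &amp; and friends
    return " ".join(text.split())                    # collapse whitespace


print(extract_summary(SAMPLE))  # -> abc ... abc & xyz ... .......
```

The lazy `.*?` between the td and the span does not stop at the closing td, so on a page with several cells it could pick up a summary span from a later cell. A real HTML parser avoids that kind of surprise.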
Thanks for the link. The example given on 4guysfromrolla does something similar, but it only puts the HTML in a label. In our case we have to scan the returned HTML to get the data.
Rahat
If it's XHTML compliant, use an XML reader to read the data. Otherwise, do string parsing.
Brian
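To illustrate the XHTML point, here is a small Python sketch (a stand-in for a .NET XML reader, not the poster's code) showing how strict an XML parser is. The helper try_parse_xml and both fragments are made up for the example. A single unclosed br tag is enough to make the markup unreadable as XML:

```python
import xml.etree.ElementTree as ET


def try_parse_xml(markup):
    """Return the root element if markup is well-formed XML, else None."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None


# Hypothetical fragments: the first closes every tag, the second leaves <br> open.
xhtml_like = '<td class="snip"><span class="summary">abc &amp; xyz<br/></span></td>'
plain_html = '<td class="snip"><span class="summary">abc &amp; xyz<br></span></td>'

root = try_parse_xml(xhtml_like)
span = root.find("span[@class='summary']")
print(span.text)                  # -> abc & xyz

print(try_parse_xml(plain_html))  # -> None
```

So the XML-reader route only pays off when the third-party page really is well-formed. Otherwise some fallback is needed.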
The response was not valid XML, so I followed this article: http://olussier.net/2010/03/30/easily-parse-html-documents-in-csharp/ It worked perfectly for me.
Rahat
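For comparison, a minimal sketch of the "use a forgiving HTML parser" approach. This uses Python's standard html.parser as a stand-in and is not the library from the linked article. SummaryExtractor, extract_summary_text and PAGE are hypothetical names. It assumes the td.snip cell does not contain a nested table:

```python
from html.parser import HTMLParser


class SummaryExtractor(HTMLParser):
    """Collect the text of span.summary elements found inside td.snip."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &amp; and similar
        self.in_snip = False   # True while inside a td with class "snip"
        self.span_depth = 0    # > 0 while inside a span.summary
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and "snip" in classes:
            self.in_snip = True
        elif tag == "span":
            if self.span_depth:
                self.span_depth += 1      # nested span inside the summary
            elif self.in_snip and "summary" in classes:
                self.span_depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self.span_depth:
            self.span_depth -= 1
        elif tag == "td":
            self.in_snip = False          # assumes no nested tables

    def handle_data(self, data):
        if self.span_depth:
            self.chunks.append(data)


def extract_summary_text(page):
    parser = SummaryExtractor()
    parser.feed(page)
    parser.close()  # flush any buffered text
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical page fragment with unclosed <br> tags, as in the question.
PAGE = ('<table><tr><td class="snip"><span class="summary">abc ... abc &amp; xyz ...'
        '<br>.......<br></span><span>......</span></td></tr></table>')

print(extract_summary_text(PAGE))  # -> abc ... abc & xyz ... .......
```

Unlike the XML route, the unclosed br tags are not a problem here, and the entity decoding comes for free.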
+1  A: 
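Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

' Returns every element with the given tag name and class attribute.
Public Function GetElements(ByVal TagName As String, ByVal ClassName As String) As List(Of XElement)
    ' XDocument.Load throws an XmlException if the response is not well-formed XML.
    Dim Document = XDocument.Load("http://urlofyourchoice.net/")
    ' CStr on an XAttribute yields its value, or Nothing when the attribute is missing.
    Dim Elements = Document.Descendants().Where(Function(e) e.Name.LocalName = TagName AndAlso CStr(e.Attribute("class")) = ClassName)

    Return Elements.ToList()
End Function

Sub Usage() Handles Me.Load
    ' First() throws if no element matched.
    Response.Write(GetElements("div", "ContentBox").First().ToString())
End Sub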

Note that this will not work if the returned response is not a valid XML document.

diamandiev
Can anyone translate the above code into C#?
Rahat
The returned response is an HTML page.
Rahat