Hey guys, I want to retrieve dynamic data (share prices) from a web page. I started out by retrieving the HTML code before I realised that, as it is live data, the HTML code will be of little use. Although I am looking to capture specific data, all I want to do is process a web page that I specify and get back the text of that page, not the HTML code. Basically, a copy and paste of the entire page would be great. Any ideas would be really appreciated!

A: 

Well, the HTML contains the text of the website, so you "just" need to parse the HTML.


EDIT: If the data is not in the HTML but loaded dynamically, the situation is different. As far as I can see, you have two options:

  1. Find out how the data is loaded (i.e. read the JavaScript on the page). If it is updated via some web service, you could query the same web service in your program.
  2. Use a web browser to load the page and then read the dynamic HTML tree of the page. Maybe the WPF WebBrowser control can help you with this, but I'm not sure since I've never done this myself.
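
To make option 1 a bit more concrete, here is a minimal sketch, assuming the page's JavaScript fetches its data from a plain HTTP endpoint. The URL, the module name QuoteFetcher and the function name FetchRawQuote are made up for illustration; the real address has to be found by reading the page's script or watching its network requests.

```vbnet
Imports System
Imports System.Net

Module QuoteFetcher
    ' Hypothetical endpoint: replace it with the URL the page's JavaScript actually requests.
    Private Const QuoteEndpoint As String = "http://example.com/quotes?symbol="

    ' Downloads the raw response body (XML, JSON or plain text) for one symbol.
    Public Function FetchRawQuote(ByVal symbol As String) As String
        Using client As New WebClient()
            ' Escape the symbol so characters such as "&" cannot break the query string.
            Return client.DownloadString(QuoteEndpoint & Uri.EscapeDataString(symbol))
        End Using
    End Function
End Module
```

Calling FetchRawQuote again every few seconds would give you the same values the page shows, without touching the HTML at all. The format of the response depends entirely on the service, so the parsing step is left out here.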
Heinzi
The data I am looking to process is part of a dynamic table which updates every few seconds, so the HTML will only contain the table with variables (not the values) in it.
Craig
OK, I understand. I've updated my answer.
Heinzi
A: 

Is it possible to find this same data provided in a ready-to-consume format rather than scraping HTML for it? There are probably public web services for stock quotes.


For example: a quick search for "Stock price webservice" turned up http://www.webservicex.net/stockquote.asmx, an ASMX web service that is easy to consume in .NET.

In your Visual Studio project you should be able to add a reference to this service via the "Add Web Reference" command; the dialog you're given varies depending on whether your project is targeting .NET 2.0 or .NET 3.0/3.5.

I added a reference to the service named StockPriceProxy:

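' Returns the service's response for the given ticker symbol.
Public Function GetQuote(ByVal symbol As String) As String
    ' The generated proxy is disposable, so release it as soon as the call returns.
    Using quoteService As New StockPriceProxy.StockQuote()
        Return quoteService.GetQuote(symbol)
    End Using
End Function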
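
To tie this to a button, something along these lines might work. It is only a sketch for a Windows Forms project: QuoteForm, UpdateButton, QuoteBox and LookupQuote are hypothetical names, "MSFT" is just an example symbol, and the lookup is passed in as a delegate so the form does not depend on any particular service (Func(Of String, String) needs .NET 3.5).

```vbnet
Imports System
Imports System.Windows.Forms

' Sketch only: QuoteForm, UpdateButton and QuoteBox are hypothetical names.
Public Class QuoteForm
    Inherits Form

    Private WithEvents UpdateButton As New Button()
    Private ReadOnly QuoteBox As New TextBox()
    Private ReadOnly LookupQuote As Func(Of String, String)

    ' Pass in the lookup to use, e.g. New QuoteForm(AddressOf GetQuote).
    Public Sub New(ByVal lookup As Func(Of String, String))
        LookupQuote = lookup
        UpdateButton.Text = "Update"
        UpdateButton.Dock = DockStyle.Top
        QuoteBox.Multiline = True
        QuoteBox.Dock = DockStyle.Fill
        ' Add the filling control first so the button keeps its strip at the top.
        Controls.Add(QuoteBox)
        Controls.Add(UpdateButton)
    End Sub

    ' Runs on every click of "Update" and shows the latest response.
    Private Sub UpdateButton_Click(ByVal sender As Object, ByVal e As EventArgs) Handles UpdateButton.Click
        Try
            QuoteBox.Text = LookupQuote("MSFT") ' example symbol
        Catch ex As Exception
            ' Network or service failures end up here instead of crashing the form.
            QuoteBox.Text = "Could not retrieve quote: " & ex.Message
        End Try
    End Sub
End Class
```

If you built the form in the designer instead, only the click handler is needed, and it can call GetQuote directly.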
STW
How would I be able to get this data into Visual Studio (VB.NET)? I can do this manually and lift the data I want, but all I want to do is click a button named "Update", behind which code will retrieve the latest data from this website.
Craig
+1  A: 

'Screen Scraping' by parsing HTML is so early 2000s... What I would do is read up on Amazon's Mechanical Turk. You can develop a queued architecture where you submit URLs to this Mechanical Turk service. The service would automatically distribute these bits of work to users who would then do the dirty task of copying and pasting out the valuable stock quote information you require. Users around the world would anxiously await delivery of the next URL to their Mechanical Turk inbox... pining for the opportunity to copy/paste out another share price for your application. Sure, it might take a few minutes to update your prices, but hey, they would be HAND parsed by REAL people around the globe! Just think of the possibilities!

Sean
Thanks for the suggestion, but this isn't really going to help me in the long run, as this may be a task that I am doing every few minutes or so. It also means I would need to input the data myself or else give someone access to my source code, neither of which is preferable! The reason I want to grab the text is simplicity; I will come up with further methods to process the data and obtain anything I deem relevant.
Craig
I was just being a wise guy. As others have suggested here, post the link to the site you are trying to scrape. You'll get better answers.
Sean