screen-scraping

OpenGL/D3D: How do I get a screen grab of a game running full screen in Windows?

Suppose I have an OpenGL game running full screen (Left 4 Dead 2). I'd like to programmatically get a screen grab of it and then write it to a video file. I've tried GDI, D3D, and OpenGL methods (e.g. glReadPixels) and get either a blank screen or flickering in the capture stream. Any ideas? For what it's worth, a canonical example...

URL structure causing incomplete page to be returned by PHP's file_get_contents()

I've been doing some scraping with PHP and getting some strange results on a particular domain. For example, when I download this page: http://pitchfork.com/reviews/tracks/ It works fine. However if I try to download this page: http://pitchfork.com/reviews/tracks/1/ It returns an incomplete page, even though the content is exactly th...

Scrapy web scraper cannot crawl link

Hi, I'm very new to Scrapy. Here is my spider to crawl twistedweb. class TwistedWebSpider(BaseSpider): name = "twistedweb3" allowed_domains = ["twistedmatrix.com"] start_urls = [ "http://twistedmatrix.com/documents/current/web/howto/", ] rules = ( Rule(SgmlLinkExtractor(), 'parse', follow=True, ), ) def parse...

Retrieve multiple URLs at once/in parallel

I have a Python script that downloads a web page, parses it, and returns some value from the page. I need to scrape a few such pages to get the final result. Every page retrieval takes a long time (5-10s) and I'd prefer to make the requests in parallel to decrease the wait time. The question is: which mechanism will do it quickly, correctly and with ...
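One possible way to sketch the parallel retrieval described in this question, using only the Python standard library. The names fetch_page and fetch_all, the timeout, and the worker count are illustrative assumptions, not taken from the question; a thread pool is just one of several mechanisms that could fit.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_page(url, timeout=15):
    """Download one page and return its body as bytes (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch=fetch_page, max_workers=8):
    """Fetch every URL concurrently; results come back in the order of `urls`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs fetch() in worker threads and yields results in input order.
        # An exception raised by any fetch() call is re-raised here.
        return list(pool.map(fetch, urls))
```

Because the work is dominated by waiting on the network, threads are usually enough here; the fetch parameter lets a different download function (or a stub for testing) be swapped in.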

How do you practice web scraping?

How do you practice web scraping (for example, authentication) when you're learning? Do you: Practice with your real username and password on a real web site, and hope you don't mess up (e.g. fail authentication too many times, accidentally hammer the web site)? Create a fake username and account for this purpose, and hope they don't kee...

What's the best approach for parsing XML/'screen scraping' in iOS? UIWebview or NSXMLParser?

I am creating an iOS app that needs to get some data from a web page. My first thought was to use NSXMLParser initWithContentsOfURL: and parse the HTML with the NSXMLParser delegate. However, this approach seems like it could quickly become painful (if, for example, the HTML changed I would have to rewrite the parsing code which could be a...

Are there any websites providing free news, weather, photos APIs for using this data commercially?

I want to build a service which needs to get this data from some source for further analysis. Do Google, Yahoo or someone else provide free access to this data for use in other websites through some API? I think Twitter does something like this for their data, although they enforce some limits on this. The data I need is mostly for US an...

Parse HTML with AJAX JSON inside

Hi, I have files like this to parse (from scraping) with Python: some HTML and JS here... SomeValue = { 'calendar': [ { 's0Date': new Date(2010, 9, 12), 'values': [ { 's1Date': new Date(2010, 9, 17), 'price': 9900 }, { 's1Date': new Date(2010, 9, 18), 'price': 9900 }, ...

Need to get a div's content from multiple sites

Hi, I would like to grab the prices of products from Newegg. Here's an example page: http://www.newegg.com/Product/Product.aspx?Item=**N82E16820167027** From this page, I would like to get the content of <div class="grpPricing">, which contains the price. I'm not very skilled at writing code, so I was searching the web for code and used...

Data Scraping Problem

Hi, I am scraping data from a Facebook page for the wall posts. Here is the URL: http://www.facebook.com/GMHTheBook?v=wall&ref=ts#!/GMHTheBook?v=wall&ref=ts I successfully scraped all the visible wall posts using cURL. Problem: At the end of the visible wall posts, there is an Older Posts link which shows more wall posts once you clic...

Getting text from inside an HTML tag within a local file with grep

Possible Duplicate: RegEx match open tags except XHTML self-contained tags Excerpt From Input File <TD class="clsTDLabelWeb" width="28%">Municipality:&nbsp;</TD> <TD style="WIDTH: 394px" class="clsTDLabelSm" colSpan="5"> <span id="DInfo1_Municipality">JUPITER</span></TD> My Regular Expression (?<=<span id="DInfo1_Municipal...
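As the duplicate notice suggests, regular expressions are fragile on HTML. A minimal sketch of the alternative, pulling the text of a span by id with Python's standard html.parser. The class and function names are hypothetical, and the sketch assumes the target span contains no nested span elements.

```python
from html.parser import HTMLParser


class SpanTextExtractor(HTMLParser):
    """Collect the text inside the <span> whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <SPAN ID=...> also matches.
        if tag == "span" and dict(attrs).get("id") == self.target_id:
            self._inside = True

    def handle_endtag(self, tag):
        # Assumption: no nested spans, so any closing span ends the capture.
        if tag == "span":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def span_text(html, target_id):
    """Return the stripped text of the matching span, or '' if it is absent."""
    parser = SpanTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()
```

Called as span_text(page_source, "DInfo1_Municipality") on the excerpt above, this should return the municipality name without relying on lookbehind support in grep.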

Python: load text as python object

Hi! I have text like this to load: https://sites.google.com/site/iminside1/paste I'd prefer to create a Python dictionary from it, but any object is OK. I tried pickle, json and eval, but didn't succeed. Can you help me with this? Thanks! The results: a = open("the_file", "r").read() json.loads(a) ValueError: Expecting property name: li...

scraping and parsing google data like page rank and more for a domain

Hi, I need to scrape/parse some search-engine-related data for a given domain name (site). I need Google Page Rank (only for the domain name, not each page). Number of indexed results/pages (Google, Bing). Number of backlinks (Google, Bing, Yahoo). Traffic rank (Alexa). Site thumbnail. Could you provide me some pointers on where...

Intelligently extracting tags from blogs and other web pages

I'm not talking about HTML tags, but tags used to describe blog posts, or youtube videos or questions on this site. If I was crawling just a single website, I'd just use an xpath to extract the tag out, or even a regex if it's simple. But I'd like to be able to throw any web page at my extract_tags() function and get the tags listed. I...

Grabbing data from table in PHP

So far this is what I have to work with: <div class="toplist"> <div class="toplist_left"></div> <div class-"toplist_body"> <div class="toplist_right"></div> <div class="toplist_body_rank">9</div> <div class="toplist_body_link"><a href="?support=details&...

Tool to render dynamic pages to HTML files

Hello, I am looking for a tool (open source) that will crawl all my pages to create a "render" of them. This is to save resources (database access). ...

Scraping ASP.Net website with POST variables in PHP

Hi everyone, For the past few days I have been trying to scrape a website, but so far with no luck. The situation is as follows: The website I am trying to scrape requires data from a form submitted previously. I have identified the variables that are required by the web app and have investigated what HTTP headers are sent by the orig...

A little bit of If statements, some html investigating, and the webbrowser.

I have a code that retrieves all the "place names" and all the "addresses" separately in this link: http://www.yellowpages.ca/search/si-geo/1/sh/Ottawa,+ON I need to modify my code so that it will only retrieve the placename and address if <div class="address""> is not found within <div class="listingDetail""> class="address" is the...

Scraping a messy HTML website with PHP

Hi all, I am in the following situation. I am trying to convert messy scraped HTML into a neat XML structure. Part of the scraped website's HTML: <p><span class='one'>week number</span></p> <p><span class='two'>day of the week</span></p> <table class='spreadsheet'> table data </table> <p><span class='two'>anoth...

How to discover the area chart data if we only have the image?

The area chart (image) has a few data series, which are charted with different colors. We know the image size and the coordinates of each label on the x-axis. Is it possible to recover the y-axis values of each series by image recognition? Can anybody shed some light? ...