screen-scraping

Submitting queries to, and scraping results from, ASPX pages using Python?

I am trying to get results for a batch of queries to this demographics tools page: http://adlab.microsoft.com/Demographics-Prediction/DPUI.aspx The POST action on the form calls the same page (_self) and is probably posting some event data. I read in another post here on Stack Overflow that ASPX pages typically need some viewstate and va...
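ASP.NET Web Forms pages usually carry their state in hidden form fields such as `__VIEWSTATE` and `__EVENTVALIDATION`, and a postback is typically rejected unless those fields are sent back with it. Below is a minimal sketch using only the Python standard library, under the assumption that the page is a normal Web Forms postback. It collects every hidden input from the HTML of a GET of the page, then merges in the visible fields you want to submit. The function and class names are hypothetical, and the real query field name must be read from the form's own HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        # Valueless attributes arrive as None, hence the "or" fallbacks.
        if (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


def build_postback_body(page_html, user_fields):
    """Return a urlencoded POST body: the page's hidden state plus the caller's fields."""
    collector = HiddenInputCollector()
    collector.feed(page_html)
    data = dict(collector.fields)  # e.g. __VIEWSTATE, __EVENTVALIDATION
    data.update(user_fields)       # visible form fields (names are page-specific)
    return urlencode(data)
```

To send it, POST `body.encode()` to the same URL with `urllib.request.Request(url, data=...)`. Because the session is usually tied to a cookie, it may also help to fetch and post through one opener built with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))`. Each query needs fresh hidden values, because the page returns a new viewstate after every postback.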

How to read someone else's forum

Hi, my friend has a forum that is full of posts containing information. Sometimes she wants to review the posts in her forum and draw conclusions. At the moment she reviews posts by clicking through her forum, and forms a not necessarily accurate picture of the data (in her head) from which she draws conclusions. My thought to...

Cross platform solution for automating ncurses-type telnet sessions

Background: part of my work in networking and telco involves automating telnet sessions when legacy hardware doesn't offer easy solutions through other interfaces. Many older pieces of equipment can only be accessed via craft ports (RS-232 serial ports), SNMP, or telnet. Sometimes telnet is the only way to access specific information, however...

Navigate and scrape content from flash web app

I need a tool that I can point at a Flash-based website, navigate it, and check the content on given pages. I don't think I can do it with just Selenium, as I can't target the elements in the Flash app via XPath. Does anybody else have any ideas? ...

Headless, scriptable Firefox/Webkit on linux?

I'm looking to automate some web interactions, namely periodic download of files from a secure website. This basically involves entering my username/password and navigating to the appropriate URL. I tried simple scripting in Python, followed by more sophisticated scripting, only to discover this particular website is using some obnoxiou...

Extracting data using screenscrapers

I am looking for recommendations for a screen scraper. I need to extract "Contact Us" information from certain websites. Any ideas where I can get a good (preferably free) screen scraper? ...

Web scraping with Python

I'd like to grab daily sunrise/sunset times from here. Is it possible to scrape web content with Python? What modules are used? Is there any tutorial available? Thanks ...
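Yes. The standard library alone covers the basics: `urllib.request` downloads the page and `html.parser` walks the markup. The original link to the sunrise/sunset page was lost, so this is only a generic sketch. It fetches a page and returns its visible text chunks, skipping `<script>` and `<style>`, and then pulls out anything shaped like `HH:MM` with a regular expression. The helper names and the `User-Agent` string are illustrative assumptions, not part of any particular site's requirements.

```python
import re
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch(url, timeout=10):
    """Download a page and decode it with the charset the server declares (UTF-8 fallback)."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TextCollector(HTMLParser):
    """Collects non-empty text chunks, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def page_text(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


def find_times(chunks):
    """Return every HH:MM-looking token in the collected text."""
    return [t for chunk in chunks for t in re.findall(r"\b\d{1,2}:\d{2}\b", chunk)]
```

Typical use would be `find_times(page_text(fetch(url)))`. For anything beyond a quick script, a third-party parser such as lxml or BeautifulSoup is more forgiving of broken markup.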

Extract links from a webpage using lxml, xpath and python

I've got this XPath query: /html/body//tbody/tr[*]/td[*]/a[@title]/@href It extracts all the links with the title attribute and gives the href in Firefox's XPath Checker add-on. However, I cannot seem to use it with lxml. from lxml import etree parsedPage = etree.HTML(page) # Create parse tree from valid page. hyperlinks = parsedP...
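A likely culprit is the `tbody` step. Firefox inserts `<tbody>` into the DOM it shows to add-ons even when the page source has none, but lxml parses the raw source, so `//tbody/...` matches nothing there. Dropping that step (for example `parsedPage.xpath("//tr/td/a[@title]/@href")`, which in lxml returns a list of strings) is worth trying first. As a dependency-free way to check the same idea, here is a sketch with the standard library's `html.parser` that approximates `//td//a[@title]/@href`. It is deliberately looser than the XPath (any `<a>` nested in a cell, not only direct children) and assumes cells are closed with `</td>`.

```python
from html.parser import HTMLParser


class TitledLinkCollector(HTMLParser):
    """Approximates //td//a[@title]/@href: hrefs of titled links inside table cells."""

    def __init__(self):
        super().__init__()
        self.td_depth = 0   # how many <td> elements we are currently inside
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.td_depth += 1
        elif tag == "a" and self.td_depth > 0:
            a = dict(attrs)
            # Mirror a[@title]/@href: needs a title attribute and an href.
            if "title" in a and a.get("href") is not None:
                self.hrefs.append(a["href"])

    def handle_endtag(self, tag):
        if tag == "td" and self.td_depth > 0:
            self.td_depth -= 1


def extract_titled_links(page):
    collector = TitledLinkCollector()
    collector.feed(page)
    collector.close()
    return collector.hrefs
```

Note that no `tbody` check is involved, so the sketch behaves the same whether or not the source contains `<tbody>`.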

Get content from table with id. Regex

I need to parse an HTML string so I get the content I need. Now I need to loop through the tr's in a table that has an ID. I could really use some help getting this regex going. I appreciate all the help I can get ...
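Regular expressions tend to break on HTML (nested tables, attribute order, optional whitespace), so this sketch swaps in a different technique: an event-based HTML parser from Python's standard library. It finds the `<table>` whose `id` matches and returns the text of each `<td>`/`<th>` cell row by row. Nested tables inside the target are skipped as rows, though their text still lands in the enclosing cell. The function name and the example id are hypothetical.

```python
from html.parser import HTMLParser


class TableRowsById(HTMLParser):
    """Collects cell text from every <tr> of the <table> with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0      # 0 = outside target; 1 = in target; >1 = in a nested table
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth:
                self.depth += 1
            elif dict(attrs).get("id") == self.table_id:
                self.depth = 1
        elif self.depth == 1 and tag == "tr":
            self._row = []
        elif self.depth == 1 and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "table":
            self.depth -= 1
        elif self.depth == 1 and tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif self.depth == 1 and tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def rows_of_table(html, table_id):
    parser = TableRowsById(table_id)
    parser.feed(html)
    parser.close()
    return parser.rows
```

For example, `rows_of_table(html, "prices")` returns a list of rows, each a list of stripped cell strings, ready to loop over.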

Does any open, easily extensible web crawler exist?

I am looking for a web crawler solution that is mature enough and can be easily extended. I am interested in the following features, or the possibility to extend the crawler to meet them: partly just to read the feeds of several sites, to scrape the content of these sites, if the site has an archive I would like to crawl and index it as we...

Any thoughts on why I can't scrape a site?

Hi. I am building a site that needs to scrape information from a partner site. My scraping code works great with other sites but not this one. It is a regular .html site. My thought is that it might be generated somehow with PHP (the site is built with PHP). I have no idea; I am just guessing about the generated part and I would ...

Download an Entire Website in C#

Forgive my ignorance on the subject. I am using string p = "http://" + textBox2.Text; string r = textBox3.Text; System.Net.WebClient webclient = new System.Net.WebClient(); webclient.DownloadFile(p, r); to download a web page. Can you please help me enhance the code so that it downloads the entire website? Tried using HTML Scree...
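"Downloading an entire website" boils down to a crawl: download a page, extract its links, resolve them against the page URL, keep only links on the same host, and repeat until nothing new turns up. The question uses C#, but the same loop is sketched below in Python for illustration. The page downloader is passed in as `fetch` (a hypothetical callable that returns HTML, e.g. one built on `urllib.request` that also saves each page to disk), and `max_pages` is an assumed safety cap.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl_site(start_url, fetch, max_pages=100):
    """Breadth-first crawl restricted to start_url's host; returns {url: html}."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop "#fragment" so one page is fetched once.
            absolute, _fragment = urldefrag(urljoin(url, href))
            # Other hosts, mailto: and javascript: links all fail this check.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

A real downloader would also fetch images, CSS and scripts (`<img src>`, `<link href>`, `<script src>`), handle HTTP errors, and ideally respect `robots.txt`. Python's `urllib.robotparser` covers the last point.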

Flickr Automation for Actions Not Available in the Flickr API (Like Adding Contacts)

EDIT: I added a bounty; if someone can help me figure out what I am doing wrong, it's all yours. Also, I don't really care how this gets done. If there is a library that can help out, or something of that sort, that would be great. Since there is no CAPTCHA involved, I should theoretically be able to log into Flickr and add a contac...

Capture ASP output for monitoring

How do I capture ASP.NET output and keep it in memory temporarily so that I can use it in an application for comparison? For example, there's this site that has ASP output. Sorry, I do not have server access; all I can do is view the output. The site, by the way, is a monitor of all users logged in and whichever channel they are in. output e....

Any HTML/CSS parsing library for Ruby & PHP?

I am about to finish my script that parses/scrapes a website using Mechanize and Ruby. I need to port my script to PHP in the future. My question is whether there is any library available for both Ruby and PHP, or whether anybody can recommend another approach to this? ...

Select all <p>'s from a Node's children using HTMLAgilityPack

Hey all, I've got the following code that I'm using to get an HTML page, make the URLs absolute, and then make the links rel nofollow and open in a new window/tab. My issue is around adding the attributes to the <a>s. string url = "http://www.mysite.com/"; string strResult = ""; HttpWebRequest ...

How should I use HTMLAgilityPack AppendNode?

Hi all, I've got a real headache at this stage on a Friday! I'm trying to add a HtmlNode to another using InsertAfter(). I can see the refChild node with the id of breadcrumbs when I print it to the console, but I keep getting the following error: System.ArgumentOutOfRangeException: Node "<div id="breadcrumb"></div>" was not found in the collecti...

Is there a library similar to lxml or nokogiri for Java?

I want to do some screen scraping, ideally using CSS selectors and not XPath. Is there a library similar to ones in Ruby or Python? ...

Bookmarklet for Screen Scraping

http://dy-verse.blogspot.com/2009/08/screen-scraping-with-javascript-firebug.html outlines a strategy, which depends on Greasemonkey, to parse a page and submit its contents to a Google spreadsheet. I'd like to adapt this approach to a simple bookmarklet where, instead of hardcoding the address of the page to be parsed, I would manua...

XAMPP: PHP script displays a blank page!

The PHP script is calling four functions that scrape different websites for data. $returnData[0]=getWebsite1Data($description); $returnData[1]=getWebsite2Data($description); $returnData[2]=getWebsite3Data($description); $returnData[3]=getWebsite4Data($description); The script displays the web page correctly if I disable the call to an...