I am trying to get results for a batch of queries to this demographics tools page: http://adlab.microsoft.com/Demographics-Prediction/DPUI.aspx
The form's POST action targets the same page (_self) and is probably posting some event data. I read in another post here on Stack Overflow that ASPX pages typically need some viewstate and va...
Hi
My friend has a forum, which is full of posts containing information. Sometimes she wants to review the posts in her forum and draw conclusions. At the moment she reviews posts by clicking through her forum, and builds a not necessarily accurate picture of the data (in her head) from which she draws conclusions. My thought to...
Background
Part of my work in networking and telco involves automating telnet sessions when legacy hardware doesn't offer easy solutions in other interfaces. Many older pieces of equipment can only be accessed via craft ports (RS-232 serial ports), SNMP, or telnet. Sometimes telnet is the only way to access specific information, however...
I need a tool that I can point at a Flash-based website to navigate it and check the content on given pages.
I don't think I can do it with just Selenium, as I can't target the elements in the Flash app via XPath.
Does anybody else have any ideas?
...
I'm looking to automate some web interactions, namely periodic download of files from a secure website. This basically involves entering my username/password and navigating to the appropriate URL.
I tried simple scripting in Python, followed by more sophisticated scripting, only to discover this particular website is using some obnoxiou...
I am looking for recommendations for a screen scraper. I need to extract "Contact Us" information from certain web sites.
Any ideas where I can get a good (preferably free) screen scraper?
...
I'd like to grab daily sunrise/sunset times from here. Is it possible to scrape web content with Python? Which modules are used? Is there a tutorial available?
Thanks
...
I've got this xpath query:
/html/body//tbody/tr[*]/td[*]/a[@title]/@href
It extracts all the links that have a title attribute and returns their href in Firefox's XPath Checker add-on.
However, I cannot seem to use it with lxml.
from lxml import etree
parsedPage = etree.HTML(page) # Create parse tree from valid page.
hyperlinks = parsedP...
I need to sort through an HTML string to get the content I need. Now I need to loop through the tr's in a table that has an ID. I could really use some help getting this regex going.
I appreciate all the help I can get.
...
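A minimal sketch of one way to loop through the rows of a table with a known ID. Note that it swaps the regex for Python's standard-library `html.parser`, since the question does not say which language or markup is involved. The class name `TableRowCollector`, the helper `rows_in_table`, and the sample markup with `id="data"` are all hypothetical and only there for illustration.

```python
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collect the cell text of every <tr> directly inside the table with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0    # 0 = outside the target table, 1 = in it, >1 = in a nested table
        self.rows = []    # finished rows, each a list of cell strings
        self.row = None   # cells of the row being read, or None
        self.cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Start counting at the target table; keep counting for nested tables.
            if self.depth or dict(attrs).get("id") == self.table_id:
                self.depth += 1
        elif self.depth == 1:
            if tag == "tr":
                self.row = []
            elif tag in ("td", "th") and self.row is not None:
                self.cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif self.depth == 1:
            if tag in ("td", "th") and self.cell is not None:
                self.row.append("".join(self.cell).strip())
                self.cell = None
            elif tag == "tr" and self.row is not None:
                self.rows.append(self.row)
                self.row = None

    def handle_data(self, data):
        # Only keep text that appears while a cell of the target table is open.
        if self.cell is not None:
            self.cell.append(data)


def rows_in_table(html, table_id):
    """Return the rows of the table with the given id as lists of cell strings."""
    parser = TableRowCollector(table_id)
    parser.feed(html)
    return parser.rows


if __name__ == "__main__":
    # Hypothetical sample markup; only the table with id="data" is read.
    sample = (
        '<table id="other"><tr><td>skip</td></tr></table>'
        '<table id="data"><tr><th>Name</th><th>Qty</th></tr>'
        "<tr><td>apple</td><td> 3 </td></tr></table>"
    )
    for row in rows_in_table(sample, "data"):
        print(row)
```

A parser tends to cope better than a regex with attribute order, whitespace, and nested tags, which is why the sketch goes that way; a regex may still be enough for markup whose shape is fixed and known.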
I am searching for a web crawler solution that is mature enough and can be easily extended. I am interested in the following features, or the possibility of extending the crawler to provide them:
partly just to read the feeds of several sites
to scrape the content of these sites
if the site has an archive I would like to crawl and index it as we...
Hi. I am building a site that needs to scrape information from a partner site. My scraping code works great with other sites, but not with this one. It is a regular .html site. My thought is that it might somehow be generated with PHP (the site is built with PHP).
I have no idea; I am just guessing about the generated part, and I would ...
Forgive my ignorance on the subject
I am using
string p = "http://" + Textbox2.Text;   // URL to fetch
string r = textBox3.Text;               // local file path to save to
System.Net.WebClient webclient = new System.Net.WebClient();
webclient.DownloadFile(p, r);
to download a webpage. Can you please help me with enhancing the code so that it downloads the entire website. Tried using HTML Scree...
EDIT: I added a bounty. If someone can help me figure out what I am doing wrong, it's all yours.
Also, I don't really care how this gets done. If there is a library that can help out, or something of that sort, that would be great.
Since there is no Captcha involved, I should theoretically be able to log into Flickr and add a contac...
How do I capture ASP.NET output and store it temporarily in memory so that I can use it in an application for comparison?
Example:
There's a site that produces ASP output. I do not have server access; all I can do is view the output.
The site, by the way, is a monitor showing all logged-in users and which channel each one is in.
output e....
I am about to finish my script that parses/scrapes a website using Mechanize and Ruby.
I will need to port my script to PHP in the future.
My question is:
whether there is any library available for both Ruby and PHP, or
whether anybody can recommend another approach to this?
...
Hey all,
I've got the following code that I'm using to fetch an HTML page, make the URLs absolute, and then make the links rel="nofollow" and open in a new window/tab. My issue is with adding the attributes to the <a> elements.
string url = "http://www.mysite.com/";
string strResult = "";
HttpWebRequest ...
Hi all,
Got a real headache at this stage on a Friday! I'm trying to add an HtmlNode to another using InsertAfter(). I can see the refChild node with id of breadcrumbs when I print it to the console, but I keep getting the following error:
System.ArgumentOutOfRangeException: Node "<div id="breadcrumb"></div>" was not found in the collecti...
I want to do some screen scraping, ideally using CSS selectors and not XPath. Is there a library similar to ones in Ruby or Python?
...
http://dy-verse.blogspot.com/2009/08/screen-scraping-with-javascript-firebug.html
outlines a strategy, which depends on Greasemonkey, for parsing a page and submitting its contents to a Google spreadsheet. I'd like to adapt this approach to a simple bookmarklet where, instead of hardcoding in the page address to be parsed, I would manua...
The PHP script calls four functions that scrape different websites for data.
$returnData[0]=getWebsite1Data($description);
$returnData[1]=getWebsite2Data($description);
$returnData[2]=getWebsite3Data($description);
$returnData[3]=getWebsite4Data($description);
The script displays the web page correctly if I disable the call to an...