I'm having difficulties scraping a dynamically generated table on an ASPX page. I'm trying to scrape the gas prices from a site like this: GasPrices. I can extract all the information in the gas price table (address, time submitted, etc.), except for the actual gas price.
Is there a way I could scrape the gas prices? i.e. somehow get a text representa...
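In case it helps, this is roughly what I'm attempting (a minimal sketch assuming the requests and BeautifulSoup libraries; the URL, table id and column order are placeholders, and I'm guessing the price cell may hold an image rather than text):
import requests
from bs4 import BeautifulSoup

resp = requests.get("http://example.com/GasPrices.aspx")  # placeholder URL
soup = BeautifulSoup(resp.text, "html.parser")
for row in soup.select("table#prices tr"):  # hypothetical table id
    cells = row.find_all("td")
    if not cells:
        continue
    price_cell = cells[0]  # hypothetical column order
    price = price_cell.get_text(strip=True)
    if not price:
        # price possibly rendered as an image: fall back to alt text or the image URL
        img = price_cell.find("img")
        price = (img.get("alt") or img.get("src")) if img else None
    print(price, [c.get_text(strip=True) for c in cells[1:]])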
Hi, we want to add a Facebook fan page photo competition to our fan page. The idea is that people can upload photos and others can like them. The person with the most likes on their photo wins a prize.
Now I was wondering if anyone has a good idea of how to get a snapshot of all the photos at a given moment, so that when we want to s...
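Something along these lines is what I'm imagining, sketched in Python against the Facebook Graph API (the album id and access token are placeholders, and the fields/paging parameters are my assumptions from the Graph API docs, so please correct me if that edge doesn't expose like counts):
import json
import requests

ALBUM_ID = "123456789"  # placeholder album (or page) id
ACCESS_TOKEN = "token"  # placeholder token with permission to read the page's photos

url = "https://graph.facebook.com/{0}/photos".format(ALBUM_ID)
params = {"fields": "id,name,likes.summary(true)", "access_token": ACCESS_TOKEN}

snapshot = []
while url:
    data = requests.get(url, params=params).json()
    for photo in data.get("data", []):
        likes = photo.get("likes", {}).get("summary", {}).get("total_count", 0)
        snapshot.append({"id": photo["id"], "name": photo.get("name"), "likes": likes})
    url = data.get("paging", {}).get("next")  # follow paging until all photos are collected
    params = {}  # the "next" URL already carries the query string

with open("snapshot.json", "w") as f:
    json.dump(snapshot, f, indent=2)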
I have a data aggregator that relies on scraping several sites, and indexing their information in a way that is searchable to the user.
I need to be able to scrape a vast number of pages daily, and I have run into problems using simple cURL requests, which are fairly slow when executed in rapid sequence for a long time (the scraper runs...
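The direction I'm considering is running the fetches in parallel rather than one at a time; here is a rough Python sketch of what I mean (the URL list and worker count are made up):
import concurrent.futures
import requests

urls = ["http://example.com/page/{0}".format(i) for i in range(1, 101)]  # placeholder list

def fetch(url):
    # one worker = one HTTP request; errors are returned instead of raised
    try:
        resp = requests.get(url, timeout=10)
        return url, resp.status_code, resp.text
    except requests.RequestException as exc:
        return url, None, str(exc)

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    for url, status, body in pool.map(fetch, urls):
        print(url, status, len(body) if status else body)
A real run would also need per-host throttling so the target sites aren't hammered.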
I need to write a custom web scraper to mine some data. I know how to submit a form using the HttpWebRequest class POST method. My challenge is to loop through the resulting pages and retrieve the records from each page.
Does anyone have a code sample or article to point to? Thanks
...
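To be concrete, this is the loop I have in mind, sketched in Python with the requests library just because it's quick to show (the URL, form field names, paging parameter and selectors are placeholders; the same pattern should map onto HttpWebRequest):
import requests
from bs4 import BeautifulSoup

URL = "http://example.com/search.aspx"  # placeholder
form = {"query": "foo", "page": 1}      # placeholder form fields

with requests.Session() as session:     # keeps cookies between posts
    page = 1
    while True:
        form["page"] = page
        soup = BeautifulSoup(session.post(URL, data=form).text, "html.parser")
        rows = soup.select("table.results tr.record")  # hypothetical result row selector
        if not rows:
            break  # no more records, last page reached
        for row in rows:
            print([td.get_text(strip=True) for td in row.find_all("td")])
        page += 1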
Hey there,
I've heard about web scraping software that can take data from a webpage. I'm building an Android app and I want to take information from this site: www.menupages.ie
All I need is the names of the restaurants, and typing them in myself would be very tedious.
Can someone tell me how I'd go about doing this in Eclipse, what m...
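Roughly what I'm hoping is possible, sketched in Python (the CSS selector is a guess since I haven't studied the site's markup; on Android I assume the same idea works with a Java HTML parser such as jsoup):
import requests
from bs4 import BeautifulSoup

resp = requests.get("http://www.menupages.ie/")  # placeholder listing page
soup = BeautifulSoup(resp.text, "html.parser")
# "a.restaurant-name" is a placeholder; the real class has to be read from the page source
for link in soup.select("a.restaurant-name"):
    print(link.get_text(strip=True))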
Hi,
I need to write a program to scrape forums.
Should I write the program in Python using the Scrapy framework, or should I use PHP cURL?
Also, is there a PHP equivalent to Scrapy?
Thanks
...
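For reference, this is the kind of skeleton I understand a Scrapy spider to be (the start URL and selectors are placeholders for whatever forum gets scraped):
import scrapy

class ForumSpider(scrapy.Spider):
    name = "forum"
    start_urls = ["http://example.com/forum/"]  # placeholder

    def parse(self, response):
        # one item per thread: title text plus absolute URL (selectors are placeholders)
        for thread in response.css("a.thread-title"):
            yield {
                "title": thread.css("::text").get(),
                "url": response.urljoin(thread.css("::attr(href)").get()),
            }
        # follow pagination until there is no "next" link
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
From what I've read there is no one-to-one PHP equivalent; Goutte plus cURL's multi handle seems to be the closest combination, but I'd be happy to be corrected.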
Hello,
I'm trying to do someone a favour, and it's a tad outside my comfort zone, so I'm stuck.
I want to use R to scrape this page (http://www.fifa.com/worldcup/archive/germany2006/results/matches/match=97410001/report.html ) and others, to get the goal scorers and times.
So far, this is what I've got:
require(RCurl)
require(XML)
th...
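For what it's worth, the step I'm stuck on looks like this when I sketch it in Python (the XPath is a pure guess at the report page's markup; I assume the same expression could then be used from R via XML::xpathSApply):
import requests
from lxml import html

url = ("http://www.fifa.com/worldcup/archive/germany2006/results/"
       "matches/match=97410001/report.html")
doc = html.fromstring(requests.get(url).content)
# hypothetical: assume each goal is listed in an element whose class contains "goal"
for node in doc.xpath('//li[contains(@class, "goal")]'):
    print(node.text_content().strip())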
I have written a web scraping program to go to a list of pages and write all the html to a file. The problem is that when I pull a block of text some of the characters get written as '�'. How do I pull those characters into my text file? Here is my code:
string baseUri = String.Format("http://www.rogersmushrooms.com/gallery/loadimage...
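In case the issue is the response being decoded with the wrong charset (which is usually what the '�' replacement character means), this is the handling I'm trying to reproduce, sketched in Python; I assume the C# equivalent is reading the stream with the encoding the page declares rather than a default:
import io
import requests

resp = requests.get("http://www.rogersmushrooms.com/gallery/")  # placeholder path
if not resp.encoding or resp.encoding.lower() == "iso-8859-1":
    # no usable charset declared: guess from the body instead of using the HTTP default
    resp.encoding = resp.apparent_encoding
with io.open("page.html", "w", encoding="utf-8") as f:
    f.write(resp.text)  # write as UTF-8 so the decoded characters survive on disk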
I'm looking for a way to simulate browser resources expansion behavior.
The flow I'm trying to address is the following:
Access an initial URL (e.g. http://example.dmn/index.htm)
Parse the html response received (e.g. index.htm)
Find the resources that a browser would fetch as a result of parsing the index, e.g. (see the sketch after this list):
Images
Flash
...
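A sketch of the parsing step I have in mind (Python with BeautifulSoup; the tag/attribute list below covers only the common cases and is certainly not exhaustive):
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

base_url = "http://example.dmn/index.htm"
soup = BeautifulSoup(requests.get(base_url).text, "html.parser")

resource_attrs = [
    ("img", "src"),      # images
    ("script", "src"),   # external scripts
    ("link", "href"),    # stylesheets, icons
    ("iframe", "src"),   # embedded frames
    ("embed", "src"),    # flash and other plugins
    ("object", "data"),
]

resources = set()
for tag, attr in resource_attrs:
    for node in soup.find_all(tag):
        value = node.get(attr)
        if value:
            resources.add(urljoin(base_url, value))  # resolve relative references

for resource in sorted(resources):
    print(resource)
CSS url(...) references and script-injected resources would still need separate handling.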
Hi all.
I'm looking to scrape geolocation data from LocService (a solution to track GPS pings from an Android phone) and host it in a MySQL database as a PHP cron job. The login system uses HTTPS. I'm having trouble returning anything through cURL.
Has anyone got any ideas?
Gausie
...
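The flow I'm trying to get working, sketched in Python just to illustrate (the login URL, form field names and data endpoint are placeholders since I don't know LocService's forms; in PHP cURL I assume the same thing needs CURLOPT_COOKIEJAR/CURLOPT_COOKIEFILE set so the session cookie survives between the two requests):
import requests

LOGIN_URL = "https://example.com/login"     # placeholder
DATA_URL = "https://example.com/locations"  # placeholder

with requests.Session() as session:  # the session object stores the login cookie
    session.post(LOGIN_URL, data={"username": "me", "password": "secret"})
    resp = session.get(DATA_URL)
    resp.raise_for_status()
    print(resp.text)  # from here the rows would be parsed and inserted into MySQL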
Dear Coding Experts,
Edit: Just for clarification, I am using Python and would like to do this within Python.
I am in the middle of collecting data for a research project at our university. Basically I need to scrape a lot of information from a website that monitors the European Parliament. Here is an example of how the URL of one site...
Hey again all,
I have the following script so far:
from mechanize import Browser
from BeautifulSoup import BeautifulSoup
import re
import urllib2
br = Browser()
br.open("http://www.foo.com")
html = br.response().read()
soup = BeautifulSoup(html)
items = soup.findAll(id="info")
and it runs perfectly, and results in the following ...
I am currently using BeautifulSoup to scrape some websites; however, I have a problem with some specific characters. The code inside UnicodeDammit seems to indicate these (again) are some Microsoft-invented ones.
I'm using the newest version of BeautifulSoup (3.0.8.1), as I am still using Python 2.5.
The following code illustrates my proble...
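The workaround I'm experimenting with, in case it clarifies the question (this assumes the stray bytes really are CP-1252 and that BeautifulSoup 3 accepts the fromEncoding/smartQuotesTo arguments):
# -*- coding: utf-8 -*-
from BeautifulSoup import BeautifulSoup

html = "<html><body><p>\x93quoted\x94 text \x96 with a dash</p></body></html>"  # made-up sample
# force the real source encoding; smartQuotesTo=None keeps the characters
# as Unicode instead of converting them to entities
soup = BeautifulSoup(html, fromEncoding="windows-1252", smartQuotesTo=None)
print(soup.find("p").string.encode("utf-8"))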
Dear Python Experts,
I have written the following trial code to retrieve the titles of legislative acts from the European Parliament.
import urllib2
from BeautifulSoup import BeautifulSoup
search_url = "http://www.europarl.europa.eu/sides/getDoc.do?type=REPORT&mode=XML&reference=A7-2010-%.4d&language=EN"
for number in xran...
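The complete shape of the loop I'm aiming for (the number range and the use of the <title> tag are placeholders; I haven't confirmed which element actually holds the act title):
import urllib2
from BeautifulSoup import BeautifulSoup

search_url = ("http://www.europarl.europa.eu/sides/getDoc.do?type=REPORT"
              "&mode=XML&reference=A7-2010-%.4d&language=EN")
for number in xrange(1, 10):  # illustrative range only
    try:
        html = urllib2.urlopen(search_url % number).read()
    except urllib2.HTTPError:
        continue  # some report numbers simply don't exist
    soup = BeautifulSoup(html)
    title = soup.find("title")  # placeholder for whichever element holds the act title
    print number, title.string if title else None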
So our front-end GUI is getting a large overhaul into a new GWT-based application. I have been working on creating the automation scripts for the old front end using cURL in some Tcl/Expect scripts. As I have been looking at the new app, I am starting to realize more and more that cURL is out of the question for performing these web interactions...
Hi All,
I am currently using Watir to scrape a website that hides all its data from the usual HTML source. If I am not wrong, they are using XML and AJAX technology to hide it. Firefox can see it, but only via "DOM Source of Selection".
Everything works fine, but now I am looking for a tool equivalent to Watir but...
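One direction I'm evaluating is Selenium WebDriver, which drives a real browser and exposes the DOM after the JavaScript has run; a rough sketch (the URL and the wait are placeholders, and I can't yet say whether it covers the same cases as Watir):
import time
from selenium import webdriver

driver = webdriver.Firefox()
try:
    driver.get("http://example.com/ajax-page")  # placeholder
    time.sleep(5)                               # crude wait; explicit waits would be better
    print(driver.page_source[:500])             # the DOM as the browser sees it, after AJAX
finally:
    driver.quit()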
Dear Pythonistas,
I have the following link:
http://www.europarl.europa.eu/sides/getDoc.do?type=REPORT&mode=XML&reference=A7-2010-0001&language=EN
the reference part of the url has the following information:
A7 == The parliament (current is the seventh parliament, the former is A6 and so forth)
2010 == year
0001 == docu...
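To make the scheme concrete, this is how I plan to generate references and URLs from those three parts (the terms, years and numbers iterated over are only illustrative):
base_url = ("http://www.europarl.europa.eu/sides/getDoc.do"
            "?type=REPORT&mode=XML&reference={ref}&language=EN")

def make_reference(term, year, number):
    # e.g. make_reference(7, 2010, 1) -> "A7-2010-0001"
    return "A{0}-{1}-{2:04d}".format(term, year, number)

for year in (2009, 2010):
    for number in range(1, 4):
        print(base_url.format(ref=make_reference(7, year, number)))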
Hello all
Last year I made an Android application that scraped the information from my train company in Belgium (the application is BETrains: http://www.cyrket.com/p/android/tof.cv.mpp/).
This application was really cool and allowed users to talk with other people on the train (a messaging server is run by me), and the conversations were...
I am trying to access data from http://www.bbb.org/us/Find-Business-Reviews/ with cURL. I used HTTPFox to see what data this site sends and accordingly made an array to POST to the page. But I am having problems accessing pages 2, 3, 4, 5...
Here is the array -
$array = Array();
$array['__EVENTTARGET'] = 'ctl12$gc1$s$gridResult...
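For context, this is the flow I understand ASP.NET paging to need, sketched in Python (the __EVENTTARGET and __EVENTARGUMENT values are placeholders standing in for whatever my HTTPFox capture shows; the hidden field names are the standard ASP.NET ones):
import requests
from bs4 import BeautifulSoup

URL = "http://www.bbb.org/us/Find-Business-Reviews/"

def hidden_fields(soup):
    # collect the hidden inputs ASP.NET expects back on every postback
    fields = {}
    for name in ("__VIEWSTATE", "__EVENTVALIDATION", "__VIEWSTATEGENERATOR"):
        node = soup.find("input", {"name": name})
        if node is not None:
            fields[name] = node.get("value", "")
    return fields

with requests.Session() as session:
    soup = BeautifulSoup(session.get(URL).text, "html.parser")
    for page in range(2, 6):  # pages 2..5
        form = hidden_fields(soup)                         # fresh values from the last response
        form["__EVENTTARGET"] = "ctl12$gc1$s$gridResults"  # placeholder pager target
        form["__EVENTARGUMENT"] = "Page${0}".format(page)  # placeholder paging argument
        resp = session.post(URL, data=form)
        soup = BeautifulSoup(resp.text, "html.parser")     # parse for the next iteration
        print(page, resp.status_code, len(resp.text))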
Hi,
I want to extract data from a website, but it uses some strange JavaScript, so I can't get the job done with cURL. I want to know: is there anything like a virtual browser that opens the page and lets me initiate clicks on some buttons?
If not, is there an executable program to achieve this task via the command line?
...
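The kind of thing I mean by a virtual browser, sketched with Selenium driving Firefox from Python (the URL and element ids are placeholders; as far as I know the script can be run from the command line like any other):
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
try:
    driver.get("http://example.com/strange-js-page")   # placeholder
    driver.find_element(By.ID, "load-more").click()    # placeholder button id
    print(driver.find_element(By.ID, "results").text)  # placeholder results container
finally:
    driver.quit()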