Hello,
Given the URL of a well-known company (e.g. http://mcdonalds.com/), how would you automatically and reliably find the company name (in this case "McDonald's")?
Thanks
Edit: someone voted to close this question, so maybe I need to explain the motivation. I have a large list of company URLs and I want to find data about each compan...
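One cheap first guess, offered only as a sketch: many company home pages put the brand name in the page's title element, so the first segment of the title is a candidate name. The helper name guess_name_from_html and the separator heuristic are assumptions made for illustration, and a page title is not guaranteed to contain the official company name.

```python
from html.parser import HTMLParser
import re


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def guess_name_from_html(html):
    """Hypothetical helper: return the first segment of the page title, or None."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside the title.
    title = " ".join("".join(parser.parts).split())
    if not title:
        return None
    # Heuristic (assumption): titles often look like "Brand | Tagline" or "Brand - Tagline".
    return re.split(r"\s+[|\-:]\s+", title, maxsplit=1)[0]
```

The HTML would come from fetching each URL in the list; results from a heuristic like this would still need checking against another source.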
I want to write a Java function grabTopResults(String f) such that grabTopResults("automata theory") returns a list of the 100 most-cited papers on scholar.google.com for "automata theory".
Does anyone have suggestions for what libraries will make my life easy?
Thanks!
...
Hello,
I am trying to scrape some data from a website.
The script I am trying to write should get the content of the page:
http://www.atpworldtour.com/Rankings/Singles.aspx
It should simulate the user going through every option for Additional Standings and the dates, simulate clicking Go, and then, after fetching the data, should use the...
Hi all,
I want to automate the input of POST variables on a login page for the purpose of web scraping. It would improve the process no end if I could get past the login page.
Then I can schedule some functions to run on a cycle automatically. (I had a go with some cURL commands but could not get the result.)
Thanks for any help,...
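A minimal sketch of the general idea, in Python rather than cURL: send the form fields as a POST body and keep the cookies the server returns, so later requests stay logged in. The function names, URL, and field names used here are placeholders, not taken from any real site; the actual field names have to be copied from the login form's HTML.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_login_request(login_url, fields):
    """Build a POST request carrying the form fields as a URL-encoded body."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(login_url, data=body, method="POST")


def make_session():
    """Return an opener that stores cookies, so a login persists across requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
```

Usage would look like session = make_session() followed by session.open(build_login_request(url, fields)), then further session.open(...) calls for the pages behind the login. Sites that add hidden tokens to the form or rely on JavaScript would need more than this.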
I am looking to develop a web scraper (in C# Windows Forms). The whole idea that I am trying to accomplish is as follows.
Get the URL from the user.
Load the web page in the IE UI control (embedded browser) in WinForms.
Allow the user to select a text (contiguous, small (not exceeding 50 chars)) from the loaded web page.
When ...
I wrote a program that takes in a partial RSS feed and outputs a full one, but it works on a case-by-case basis. The recipe for one site is not the same as the recipe for another. So what I do is look at the domain basename (for instance nyt or wsj) and choose a module based on that. Though I need to load each and every module before h...
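If the aim is to avoid importing every recipe module up front (an assumption, since the question is cut off), one option is to keep a table from domain key to module name and import only the module that is needed. This is a Python sketch; the registry contents and the module paths recipes.nyt and recipes.wsj are hypothetical.

```python
import importlib
from urllib.parse import urlparse

# Hypothetical mapping from domain key to module path (assumed names).
RECIPE_MODULES = {"nyt": "recipes.nyt", "wsj": "recipes.wsj"}


def domain_key(url):
    """Return the label before the top-level domain, e.g. 'nyt' for www.nyt.com.

    Simplification: this does not handle two-part suffixes such as .co.uk.
    """
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    return labels[-2] if len(labels) >= 2 else host


def load_recipe(url, registry=RECIPE_MODULES):
    """Import only the module registered for this URL's domain."""
    key = domain_key(url)
    if key not in registry:
        raise KeyError(f"no recipe registered for {key!r}")
    # import_module caches in sys.modules, so repeated calls are cheap.
    return importlib.import_module(registry[key])
```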
OK, I've got this code:
public static string ScreenScrape(string url)
{
System.Net.WebRequest request = System.Net.WebRequest.Create(url);
// set properties of the request
using (System.Net.WebResponse response = request.GetResponse())
{
using (System.IO.StreamReader reader = new System.IO.S...
Amazon exposes RSS feeds for new products with a certain tag, such as http://www.amazon.com/rss/tag/blu-ray/new
They also expose new popular products with http://www.amazon.com/gp/new-releases/books
Is there a way to get a feed of all new products, regardless of tag and popularity?
...
I am trying to find a list of what ISBNs are in use. I guess I could scrape a website like Amazon but that would waste a lot of bandwidth. Is there a better (free) way?
...
I'm currently trying to scrape a website that has fairly poorly formatted HTML (often missing closing tags, no use of classes or ids so it's incredibly difficult to go straight to the element you want, etc.). I've been using BeautifulSoup with some success so far, but every once in a while (though quite rarely) I run into a page where ...
When there is no web service API available, your only option might be to screen scrape, but how do you do it in C#?
How would you go about it?
...
I am scraping data from a website using my Java application and want to display the result, after parsing the HTML of the page, in a text area made in Swing.
Text like: hello <b>every</b>one should be displayed as: 'hello everyone' in the text area.
Thanks!!
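The tag-stripping step can be sketched as follows. The sketch is in Python rather than Java and only illustrates the approach: run the markup through an HTML parser and keep just the text nodes. The names TextExtractor and strip_tags are made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and discards the tags around them."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html):
    """Return the text content of an HTML fragment."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()  # flush any text still buffered by the parser
    return "".join(extractor.chunks)
```

Note that this simple version also keeps the contents of script and style elements, which a real page would usually want filtered out.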
...
Hi all,
I have a (somewhat complex) web scraping challenge that I wish to accomplish and would love some direction (to whatever level you feel like sharing). Here goes:
I would like to go through all the "species pages" present at this link:
http://gtrnadb.ucsc.edu/
So for each of them I will go to:
The species page link (for ex...
I want Scrapy to crawl pages where the link to the next page looks like this:
Next
Will Scrapy be able to interpret the JavaScript code behind it?
With the livehttpheaders extension I found out that clicking Next generates a POST with a really huge piece of "garbage" starting like this: encoded_session_hidden_map=H4sIAAAAAAAAALWZXWwj1RXHJ9n
...
I'm attempting to write a screen scraper for Digikey that will allow our company to keep accurate track of pricing, part availability and product replacements when a part is discontinued. There seems to be a discrepancy between the XPath that I'm seeing in Chrome DevTools and in Firebug on Firefox, and what my C# program is seeing.
...
I'm trying to pull out some info from an external site using jQuery and Adobe AIR. Right now I'm using a hidden div and jQuery's load function to load fragments of the external site; once the info is loaded, I parse some of it with selectors. This is fine, but it's kind of dirty and I need to perform this several times (don't want to need many...
I'm using PHP to scrape a website and collect some data. It's all done without using regex. I'm using PHP's explode() function to find particular HTML tags instead.
It is possible that if the structure of the website changes (CSS, HTML), then wrong data may be collected by the scraper. So the question is: how do I know if the HTML struct...
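One way to notice a layout change, sketched here in Python rather than PHP, is to hash the page's tag skeleton (tag names plus id and class attributes, ignoring text) and compare it with a fingerprint saved when the scraper was last known to work. The names TagRecorder and structure_fingerprint are invented for this sketch. Pages whose row or item counts vary from day to day would change the hash too, so in practice it may be better to fingerprint only the region around the data.

```python
import hashlib
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Records the sequence of opening tags with their id and class attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        self.tags.append(f"{tag}#{attr_map.get('id', '')}.{attr_map.get('class', '')}")


def structure_fingerprint(html):
    """Hash the tag skeleton of a page, ignoring its text content."""
    recorder = TagRecorder()
    recorder.feed(html)
    recorder.close()
    skeleton = "\n".join(recorder.tags)
    return hashlib.sha256(skeleton.encode("utf-8")).hexdigest()
```

A mismatch between the stored and the current fingerprint is a signal to inspect the page by hand before trusting the scraped values.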
As Gmail and the Tasks API are not available everywhere (e.g. some companies block Gmail but not Calendar), is there a way to scrape Google Tasks through the Calendar web interface?
I wrote a userscript like the one below, but I find it too brittle:
// List of div to hide
idlist = [
'gbar',
'logo-container',
...
];
// Hiding b...
We are using a web scraper that sleeps for a random interval between requests (so that the time between scrapes is not always the same), but we are still getting blocked by Yahoo after 20-30 requests.
Does anyone know if there is a limit (e.g. 20 requests per minute, 200 an hour)? Right now our average between ...
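When the actual limit is unknown, a common fallback is to back off exponentially once a block is detected, with some randomness so the delays do not form a fixed pattern. The sketch below only computes the delays; the base, factor, and cap values are arbitrary assumptions and say nothing about what Yahoo actually enforces.

```python
import random


def backoff_delays(base=2.0, factor=2.0, cap=300.0, attempts=5, rng=random):
    """Return growing delays in seconds, each between half and all of its ceiling."""
    delays = []
    for attempt in range(attempts):
        # The ceiling doubles (by default) on every attempt, up to the cap.
        ceiling = min(cap, base * factor ** attempt)
        # Jitter: scale the ceiling by a random factor of at least 0.5.
        delays.append(ceiling * (0.5 + rng.random() / 2))
    return delays
```

The scraper would call time.sleep() with the next delay after each blocked request and go back to its normal pacing after a success.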
Can anyone suggest a good source of names that I can use to help analyze some tables on web pages? The first column of the tables I am scraping has names alone, names and titles, or just titles. The names can be as varied as John Smith and Vikram Saksena. I have been poking around for a compiled list of words that can be found in proper...