Hi,
I am using XPath to query HTML sites, which has worked pretty well so far, but now I've hit a (brick) wall and can't find a solution :-)
The HTML looks like this:
<ul>
<li><a href="">Text1<span>AnotherText1</span></a></li>
<li><a href="">Text2<span>AnotherText2</span></a></li>
<li><a href="">Text3<span>AnotherText3</span></a></li>
</ul>
...
I have found a post, http://stackoverflow.com/questions/999056/ethics-of-robots-txt/999088#999088, discussing robots.txt on websites. Generally, I agree with the principles. However, there are commercial tools checking Google positions by, very likely, scraping Google for results, due to the lack of an API (in case someone doesn't ...
I have started developing a webpage and recently hired someone to write code to display a customized feed (powered by an API) in the middle panel on http://farmball.com/. Note that this is not the RSS feed tied to the site blog. The feed ties to my account on another site. There is no RSS link for an average user to subscribe to the feed. I...
I often check my accounts for different numbers. For example, I check my affiliate accounts for increases in cash. I want to write a script that can log in to all these websites, grab the money value for me, and display everything on one page. How can I program this?
...
Hello,
I'm using a C# WebClient to post login details to a page and read all the results.
The page I am trying to load includes Flash (which, in the browser, translates into HTML). I'm guessing it's Flash to avoid being picked up by search engines?
The Flash I am interested in is just text (not an image/video etc.), and when I "V...
Hello,
I am scraping some data and I want to get the value of an element after a specific tag with a given value.
It's a bold tag with value 'Types:'.
<b>Types:</b>
Once I get to that element I can use Prototype's Element.next() to get the data I want.
How exactly do I do this?
I have been fiddling with $$ but can't seem to get it righ...
I'm writing an app that takes in the HTML code of a page, extracts certain elements of the page (such as tables), and returns the HTML code for those elements. I'm attempting to do this in Java using the Mozilla parser to simplify navigation through the page, but I'm having trouble extracting the HTML code needed.
Maybe my whole appr...
Problem
When screen-scraping a webpage using Python, one has to know the character encoding of the page. If you get the character encoding wrong, then your output will be messed up.
People usually use some rudimentary technique to detect the encoding. They either use the charset from the header or the charset defined in the meta tag or t...
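The two rudimentary checks mentioned above (the charset in the header, then the charset in the meta tag) could be sketched roughly like this. This is only a sketch using the standard library; the helper names and the UTF-8 fallback are my own assumptions, not something from the question.

```python
import re
from email.message import Message

# Hypothetical helper names; none of them come from the original question.
# Matches both <meta charset="..."> and the http-equiv Content-Type form.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, or None if absent


def charset_from_meta(raw):
    """Return the charset declared in a <meta> tag near the top of the page, or None."""
    match = META_CHARSET.search(raw[:2048])
    return match.group(1).decode("ascii").lower() if match else None


def guess_encoding(raw, content_type=None, default="utf-8"):
    """Try the header first, then the meta tag, then fall back to an assumed default."""
    return charset_from_header(content_type) or charset_from_meta(raw) or default
```

Here `raw` is the undecoded page body as bytes and `content_type` is the value of the Content-Type response header. Neither check validates the declared charset, so a page that lies about its encoding will still come out wrong.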
We have a legacy system that is essentially a glorified telnet interface.
We cannot use an alternative telnet client program to connect to the system since there are special features built into the client software they have provided us.
I want to be able to screen scrape from this program; however, that's proving very difficult.
I have ...
I'm trying to scrape some information from a web site, but am having trouble reading the relevant pages. The pages seem to first send a basic setup, then more detailed info. My download attempts only seem to capture the basic setup. I've tried urllib and mechanize so far.
Firefox and Chrome have no trouble displaying the pages, altho...
I'm trying to scrape some pages from a domain, using a list in a text file, and save them onto my server.
I have the following code (with the domain obscured), which reads the list of file directories from a text file and then copies the file names, but with .html appended.
For some reason, it's creating the files without actually success...
I am writing an HTML scraper that grabs the HTML from a page and returns it. I want to grab the words that are in all capital letters and store them in a database. My problem right now is that I cannot write the algorithm to parse each line of the HTML I get back in order to store the words. This is...
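One possible way to approach the all-capitals part, sketched in Python with the standard library only. The class and function names are made up for illustration, "word" is assumed to mean two or more consecutive ASCII capital letters, and the database step is left out.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


# Assumption: a "word" is a run of at least two ASCII capital letters.
UPPER_WORD = re.compile(r"\b[A-Z]{2,}\b")


def uppercase_words(html):
    """Return all-capital words from the text of an HTML string, in document order."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return UPPER_WORD.findall(" ".join(parser.chunks))
```

Parsing the markup first, rather than matching each raw line, keeps tag names and script contents out of the results. The returned list could then be written to whatever database is in use.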
How do I fetch the contents of meta name="description" content="....." with Scrubyt?
require 'rubygems'
require 'scrubyt'

data = Scrubyt::Extractor.define do
  fetch 'http://www.allegro.pl/'
  head '//head' do
    description '//meta[@name="description"]'
  end
end

puts data.to_xml
What is the correct way?
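Not a Scrubyt answer, but for comparison: the description text lives in the content attribute of the meta element, not in its text, so in plain XPath the value would be selected with //meta[@name="description"]/@content. Whether Scrubyt accepts an attribute selector like that is not something I can confirm. Below is a rough sketch of the same extraction in Python with the standard library, swapped in for Scrubyt; the class and function names are hypothetical, and it works on an HTML string that has already been downloaded.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Remembers the content attribute of the first <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        # Attribute values can be None (e.g. <meta name>), hence the "or".
        if (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")


def meta_description(html):
    """Return the meta description of an HTML string, or None if there is none."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description
```

Self-closing tags such as `<meta ... />` are handled too, because `HTMLParser` routes them through `handle_starttag` by default.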
...
In my applications I always end up implementing a Model-View-Presenter pattern, and I usually end up scraping my View object from the screen with a get property.
For example
Person IBasicRegistration.Person
{
get
{
if (ViewState["View.Person"] == null)
ViewState["View.Person"] = new Person();
var Person = (Person) ViewState["Vi...
OK, I'm still new to the screen scraping thing.
I've managed to log into the site I need, but now how do I redirect to another page?
After I log in, I'm trying to do another GET request on the page that I need, but it has a redirect on it that takes me back to the login page.
So I'm thinking the SESSION variables are not being passed, how can ...
I want to know a technique for capturing screenshots of sites from a list of URLs, like Google Fast Flip does. What technology or techniques are required for this kind of task? If this is available in Rails, that would be great.
Thanks
...
Essentially I have an img tag with a src attribute of /ChartImg.axd?i=chart_0_0.png&g=06469eea67ea452b977f8e73cad70691. Do I need to create another WebRequest to get the content of this resource or is there a simpler way?
I am scraping the output of the current request. Below is what I've got so far...
Essentially my additionaAssets ...
My plan is to have a user type a movie title into my program, and my program will pull the appropriate information asynchronously so the UI doesn't freeze up.
Here's the code:
public class IMDB
{
WebClient WebClientX = new WebClient();
byte[] Buffer = null;
public string[] SearchForMovie(string SearchPar...
I'll post my entire class and maybe someone with MUCH more experience can help me design something better. I'm really new to doing things asynchronously, so I'm really lost here. Hopefully my design isn't TOO bad. :P
IMDB Class:
public class IMDB
{
WebClient WebClientX = new WebClient();
byte[] Buffer = null;
public string...
I am attempting to get some information from a website. The info that I need is located on the missouri.edu site (so it's publicly available).
Here is the process that I need to accomplish:
- Navigate to https://webapps.missouri.edu/ODDSearchEngine/oddsearch
- Search for a department name like "business"
- Click any of the department nam...