I would like to make a small program that scrapes information from a 3rd-party GWT-enabled website. Is it possible to call its GWT-RPC service directly? What would I need to reverse-engineer to do this? Hopefully not the entire low-level protocol.

I am hoping I could somehow just call this from within my own server-side GWT or Servlet app.

A: 

I'm pretty sure GWT makes it difficult to do cross-site requests, for the sake of security.

Any solution you come up with will probably be hacky (and not flexible to future changes), and since you're presumably doing it without the consent of the site in question, it is probably a bad idea to begin with.

Is there some reason you can't ask the site to publish their data using a REST API?

Jason Hall
This is for the Developer Dashboard on the Google Market. They don't mind me scraping it, but they don't have time to make a REST API. There is no robots.txt for this site, so scraping it should be absolutely okay (otherwise the search engines would be in trouble).
Artem
There's a big difference between scraping and trying to make GWT-RPC requests to their services. GWT makes it hard to make a GWT-RPC service public, for security reasons. And when (not if) they change the API of these calls, your app will break. It *may* be possible, but it won't be a good idea.
Jason Hall
I am sorry, but I disagree that there is a big difference. Google spiders routinely submit automated queries to all sorts of CGI backends, and that's how they crawl the deep web. http://glinden.blogspot.com/2007/03/google-and-deep-web.html Please point me to some specific sources that say GWT-RPC is not to be contacted. Besides, scraping is always possible through GreaseMonkey on Firefox. It's just a matter of doing it an easier way.
Artem
A: 

It is possible, but you would have to go through their code a bit to understand how the serialization/deserialization works.

Classes of interest are

  1. RPC.java
  2. ClientSerializationStreamWriter -> ServerSerializationStreamReader are the classes involved in making a GWT request.
  3. ServerSerializationStreamWriter -> ClientSerializationStreamReader are the classes involved in creating and interpreting a response.

I am trying out a similar thing as an academic project and will add more information if I am able to decipher these classes.
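
To give a rough idea of what those stream writer/reader classes produce and consume, here is a minimal sketch based on my understanding of the wire format. It is not taken from the GWT source. The request body is a pipe-delimited string (protocol version, flags, a string table, then 1-based indices into that table), usually POSTed with `Content-Type: text/x-gwt-rpc; charset=utf-8`, and the response body starts with `//OK` or `//EX`. Everything concrete below is an assumption for illustration: the class `GwtRpcSketch`, the method names, the version number 7, the module URL, the policy name `ABC123` and the service `com.example.client.DataService` are all hypothetical. The real values have to be read from the requests the site's own client sends.

```java
// Hypothetical helper, not part of GWT. Sketches the request layout for a
// zero-argument GWT-RPC call and classifies a response by its prefix.
class GwtRpcSketch {

    // Builds: version|flags|tableSize|strings...|indices...|paramCount|
    // Assumes none of the strings contain '|' or '\', so no escaping is done.
    static String buildNoArgCall(int version, String moduleBase, String policyName,
                                 String serviceInterface, String method) {
        String[] table = { moduleBase, policyName, serviceInterface, method };
        StringBuilder sb = new StringBuilder();
        sb.append(version).append('|');      // protocol version (assumed)
        sb.append(0).append('|');            // flags
        sb.append(table.length).append('|'); // string table size
        for (String s : table) {
            sb.append(s).append('|');        // string table entries
        }
        for (int i = 1; i <= table.length; i++) {
            sb.append(i).append('|');        // 1-based references into the table
        }
        sb.append(0).append('|');            // parameter count: none
        return sb.toString();
    }

    // A successful response body begins with "//OK".
    static boolean isOk(String responseBody) {
        return responseBody.startsWith("//OK");
    }

    // A response carrying a serialized exception begins with "//EX".
    static boolean isException(String responseBody) {
        return responseBody.startsWith("//EX");
    }

    public static void main(String[] args) {
        System.out.println(buildNoArgCall(7, "http://example.com/app/", "ABC123",
                "com.example.client.DataService", "getData"));
    }
}
```

Calls that take arguments append the parameter type names and serialized values after the parameter count, and that is where the real deciphering work in the classes above begins. The policy name also changes whenever the site redeploys, so a scraper built this way has to pick it up again each time.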

sri
But he doesn't have access to their code. He's just trying to make GWT-RPC requests against a service running on their server.
Jason Hall
Jason, that would be general GWT code. Sri, that sounds great, please keep up the research.
Artem