I noticed that the iTunes preview pages can be crawled and scraped over the http:// protocol. However, many of the links try to open in iTunes rather than in the browser. For example, when you go to the iBooks page, it immediately tries to open a URL with the itms:// protocol.

Are there any other methods of crawling the App Store or is this the only way?

Can the itms:// protocol links themselves be crawled somehow?

+1  A: 

The only difference between http:// links and itms:// links is that for itms:// links you need to set your User-Agent header to an iTunes user agent and, depending on the iTunes version, you may also have to send a validation code (the X-Apple-Validation header) computed by a not-so-secret algorithm.

For example, this is the code for iTunes 9:

# Some magic. Generates the seed we use for the X-Apple-Validation header.
# Adapted from LWP::UserAgent::iTMS_Client.
function comp_seed($url, $user_agent) {
    # Eight random hex digits; 0xFFFF is the largest value that fits "%04X".
    $random  = sprintf("%04X%04X", rand(0, 0xFFFF), rand(0, 0xFFFF));
    $static  = base64_decode("ROkjAaKid4EUF5kGtTNn3Q==");
    # Trailing part of the URL from its last '/' onward, or '?' if nothing matches.
    $url_end = preg_match("|.*/.*/.*(/.+)$|", $url, $matches) ? $matches[1] : '?';
    $digest  = md5(implode("", array($url_end, $user_agent, $static, $random)));
    return $random . '-' . strtoupper($digest);
}
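
As a rough sketch of how such a seed might be put to use, the snippet below rewrites an itms:// link to http:// and attaches the two headers described above. The helper names (itms_to_http, itms_headers, itms_fetch) are hypothetical, and the validation string is passed in as a parameter so the snippet stands on its own. Whether Apple's servers still accept these headers is an assumption, not something verified here.

```php
<?php
// Sketch only: helper names are hypothetical; header names follow the answer above.

// Rewrite an itms:// link to its http:// form; other URLs pass through unchanged.
function itms_to_http($url) {
    return preg_replace('#^itms://#i', 'http://', $url);
}

// Build the request headers an iTunes client would send.
// $validation is the string produced by a seed function such as comp_seed().
function itms_headers($user_agent, $validation) {
    return array(
        'User-Agent: ' . $user_agent,
        'X-Apple-Validation: ' . $validation,
    );
}

// Fetch a page with those headers; returns the body, or false on failure.
function itms_fetch($url, $user_agent, $validation) {
    $context = stream_context_create(array(
        'http' => array(
            'method' => 'GET',
            'header' => implode("\r\n", itms_headers($user_agent, $validation)),
        ),
    ));
    return file_get_contents(itms_to_http($url), false, $context);
}
```

The user agent string you pass in has to match whatever iTunes version you are imitating; the exact string is left to the caller because none is given in this thread.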

However, if you are only scraping, iTunes preview should work for your purposes. The iBooks page you linked to has more than enough information to scrape.

Adam M-W
+1  A: 

I would take a good look at the iTunes Search API and the iTunes Enterprise Partner Feed (EPF).

You might get most or all of the information you need as nicely structured JSON.
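
A minimal sketch of what that could look like with the Search API is shown below. The function names are hypothetical, and the sample JSON is made up to mirror the general shape of a response (a resultCount plus a results array), so treat the field names you actually get back as something to check against Apple's documentation.

```php
<?php
// Sketch only: function names are hypothetical; $sample is invented data
// shaped like a Search API response, not real output.

// Build a Search API URL for a search term and media type (e.g. "ebook").
function search_url($term, $media = 'ebook', $limit = 10) {
    $query = http_build_query(array(
        'term'  => $term,
        'media' => $media,
        'limit' => $limit,
    ));
    return 'https://itunes.apple.com/search?' . $query;
}

// Collect the trackName of every result in a JSON response body.
// Returns an empty array if the body is not the expected JSON.
function result_names($json) {
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['results'])) {
        return array();
    }
    $names = array();
    foreach ($data['results'] as $item) {
        if (isset($item['trackName'])) {
            $names[] = $item['trackName'];
        }
    }
    return $names;
}

$sample = '{"resultCount":1,"results":[{"trackName":"Example Book","artistName":"Example Author"}]}';
print_r(result_names($sample));
```

In real use you would pass the body fetched from search_url() to result_names() instead of the sample string.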

If you can't get the information you need from the API, I would be interested to know what it is :)

philipp