Crawling itunes.apple.com | ansaurus

Q:

Crawling itunes.apple.com

Hi, I am trying to crawl the Apple iTunes website, but I am getting output in binary format. For example:

curl -A "mozilla/5.0" "http://itunes.apple.com/us/app/the-far-islands-by-john-buchan/id327765949?mt=8"

returns binary.

Can anybody please tell me what I am missing?

Thanks

A:

You're getting binary back because the page you cited doesn't appear to be returning plain HTML/XML; the response is served by Apple WebObjects. From wget:

wget http://itunes.apple.com/us/app/the-far-islands-by-john-buchan/id327765949?mt=8
--2010-08-03 12:38:14--  http://itunes.apple.com/us/app/the-far-islands-by-john-buchan/id327765949?mt=8
Resolving itunes.apple.com... 17.250.237.16
Connecting to itunes.apple.com|17.250.237.16|:80... connected.
HTTP request sent, awaiting response... 200 Apple WebObjects
Length: 22900 (22K) [text/html]
Saving to: `id327765949?mt=8'

100%[======================================>] 22,900      --.-K/s   in 0.05s   

2010-08-03 12:38:14 (440 KB/s) - `id327765949?mt=8' saved [22900/22900]

See the Wikipedia article on WebObjects for more info. If you want to crawl the page, you may need something that simulates a browser and can therefore interpret the response; Watir might work.
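The wget log above reports the body as text/html, so one thing worth ruling out is that the "binary" is simply compressed HTML. That is an assumption, not something this thread confirms. A minimal shell sketch for checking a saved response follows; `is_gzip` and `show_body` are hypothetical helper names, and `page.bin` is a placeholder file name.

```shell
#!/bin/sh
# Sketch only: assumes the saved response might be gzip-compressed.

# Return success (0) if the file starts with the gzip magic bytes 1f 8b.
is_gzip() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Print a saved response, decompressing it first when it is gzip data.
show_body() {
    if is_gzip "$1"; then
        gzip -dc "$1"
    else
        cat "$1"
    fi
}

# Hypothetical usage (needs network access, so it is left commented out):
# curl -s -A "mozilla/5.0" -o page.bin "http://itunes.apple.com/us/app/the-far-islands-by-john-buchan/id327765949?mt=8"
# show_body page.bin
```

If the file does turn out to be gzip data, `curl --compressed` is the other route: it asks the server for a compressed response and decodes it before printing.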

Chris Bunch 2010-08-03 19:39:37
