
I am trying to use the getURL function of the RCurl package to access an ASP web page:

    library(RCurl)
    my_url <- "http://www.my_site.org/my_site/main.asp?ID=11&REFID=33"
    webpage <- getURL(my_url)

but instead of the page I get an "Object moved" redirection message:

    "<head><title>Object moved</title></head>\n<body><h1>Object Moved</h1>
    This object may be found <a HREF=\"/my_site/index.asp\">here</a>.</body>\n"

I followed various suggestions, such as URL-encoding with the curlEscape function and setting the CURLOPT_FOLLOWLOCATION and CURLOPT_SSL_VERIFYHOST parameters via the curlSetOpt function, as listed in the "php ssl curl : object moved error" link, but the latter two were not recognized as valid RCurl options.
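A likely reason the two options were rejected: RCurl does not use the C constant names. It spells libcurl options in lower case, without the CURLOPT_ prefix, and with underscores replaced by dots, so CURLOPT_FOLLOWLOCATION becomes followlocation and CURLOPT_SSL_VERIFYHOST becomes ssl.verifyhost. A small sketch for checking the spelling against RCurl's own list; the grep pattern only picks out the two options mentioned in the question.

```r
library(RCurl)

# RCurl names libcurl options without the CURLOPT_ prefix, in lower case,
# with "_" replaced by "." (e.g. CURLOPT_SSL_VERIFYHOST -> ssl.verifyhost).
available <- listCurlOptions()

# Show the spellings RCurl expects for the two options from the question.
print(grep("^(followlocation|ssl\\.verifyhost)$", available, value = TRUE))

# With the RCurl spelling, curlSetOpt() accepts the option on a handle.
curl <- getCurlHandle()
curlSetOpt(followlocation = TRUE, curl = curl)
```

ssl.verifyhost only matters for https:// URLs, so it should not be needed for the plain http:// address in the question.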

Any suggestions on how to overcome this issue?

A:

Use the followlocation curl option:

    getURL(u, .opts = curlOptions(followlocation = TRUE))

With cookie handling added as well. The cookiefile is supposed to be a file that doesn't exist, but I'm not sure how you can be sure of that:

    w <- getURL(u, .opts = curlOptions(followlocation = TRUE, cookiefile = "nosuchfile"))
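On the "file that doesn't exist" question: the libcurl documentation for CURLOPT_COOKIEFILE says that passing an empty string turns on the cookie engine without reading any file, so there is no need to guess a missing file name. The sketch below combines that with a redirect cap and a timeout. Older libcurl builds follow redirects without any limit by default, which may be why a redirect loop can look like a hang; with maxredirs set, the loop should end in an error instead. The limit of 10, the 30-second timeout and the helper name fetch_page are arbitrary choices for illustration, and the URL is the placeholder from the question.

```r
library(RCurl)

u <- "http://www.my_site.org/my_site/main.asp?ID=11&REFID=33"  # placeholder from the question

opts <- curlOptions(
  followlocation = TRUE,  # follow "Object moved" (302) redirects
  cookiefile     = "",    # empty string: enable the cookie engine, read no file
  maxredirs      = 10L,   # give up after 10 redirects instead of looping forever
  timeout        = 30L    # abort the whole transfer after 30 seconds
)

# Hypothetical helper: fetch a page with the options above.
fetch_page <- function(url, .opts = opts) {
  getURL(url, .opts = .opts)
}
```

Usage: `w <- fetch_page(u)`. If the server really loops, this should fail with a "maximum redirects" style error rather than running indefinitely.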
Spacedman
Thank you for your prompt answer. I used the followlocation option, but it appears that getURL never finishes. I left it running for over 5 minutes and there was still no progress of any kind. Maybe some other curl option is needed?
What happens if you getURL the URL given in the redirect, i.e. getURL("http://www.my_site.org/my_site/index.asp")? I assume your PC can connect to that web site okay. Try the command-line curl client and see if that connects, then add the complication of doing it in R.
Spacedman
@Spacedman, running command-line curl on the redirect target leads, after two more redirects (each with the same "Object Moved" message), to a point where the ASP REFID parameter is reported as "not recognized as an internal or external command".
Sounds impossible to get any further with this without access to the server.
Spacedman
Is there no way you can give us access to that server? Or can you find a public server that exhibits the same problem?
Spacedman
It seems to go into a mad loop over three redirects. The first redirect also sends back some cookie info, which I imagine getURL isn't passing back to the server on the redirect. curl on the command line does the same, so it's not an R problem any more. It works in a web browser because the browser probably sends the cookie info back. The curl docs say to use the -b option, which works on the command line. Not sure how to get this into R, though.
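In RCurl, the cookiefile option plays the role of curl's -b: it switches on cookie handling, and cookies received during the redirects are then sent back on the following requests. A sketch that keeps one curl handle, and therefore one in-memory cookie store, across requests. The idea of visiting index.asp first to pick up an ASP session cookie is an assumption about this server, not something confirmed in the thread, and fetch_with_session is a hypothetical helper name.

```r
library(RCurl)

# One handle = one in-memory cookie store; cookies received on any request
# (including intermediate redirects) are sent back on later requests.
curl <- getCurlHandle(
  followlocation = TRUE,
  cookiefile     = "",     # R-side counterpart of `curl -b <file>`: enables cookies
  maxredirs      = 10L,    # stop a redirect loop instead of hanging
  verbose        = FALSE   # set to TRUE to watch Set-Cookie / Cookie headers
)

# Hypothetical flow: request the redirect target first so the server can set
# its session cookie, then request the page that carries the query parameters.
fetch_with_session <- function(start_url, target_url, handle = curl) {
  getURL(start_url, curl = handle)   # body discarded; only the cookies matter
  getURL(target_url, curl = handle)
}
```

Usage: `page <- fetch_with_session("http://www.my_site.org/my_site/index.asp", "http://www.my_site.org/my_site/main.asp?ID=11&REFID=33")`. Turning verbose on shows whether the Cookie header is actually sent on the second request.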
Spacedman
Indeed, the curl cookie options "-L -b empty.txt" deliver some information after the loop. Nevertheless, the two ASP parameters in the URL are still not taken into account.
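One possible cause on the command-line side: on Windows, "... is not recognized as an internal or external command" is a cmd.exe message. An unquoted & in the URL ends the curl command there, so curl only receives the part before it (here `...?ID=11`), and cmd then tries to run `REFID=33` as a separate command. That would also explain why the ASP parameters seem to be ignored. The sketch below runs the same `-L -b empty.txt` call from R with the URL quoted by shQuote(). It assumes a curl executable on the PATH, and curl_args and run_curl are hypothetical helper names.

```r
# Build the argument vector for `curl -L -b empty.txt "<url>"`.
# shQuote() uses sh-style quoting on Unix and cmd.exe quoting on Windows,
# so the "&" between the ASP parameters reaches curl unchanged.
curl_args <- function(url, cookie_file = "empty.txt") {
  c("-L", "-b", shQuote(cookie_file), shQuote(url))
}

u <- "http://www.my_site.org/my_site/main.asp?ID=11&REFID=33"  # placeholder from the question
args <- curl_args(u)

# Runs the external curl binary (assumed to be on the PATH) and captures stdout.
# system2() does not quote its arguments itself, hence shQuote() above.
run_curl <- function(args) system2("curl", args, stdout = TRUE)
```

Usage: `page <- run_curl(args)`. Typed by hand at a cmd.exe prompt, the equivalent is to wrap the URL in double quotes.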