Hi,

I need to get the final URL after a page redirect, preferably with curl or wget.

For example, http://google.com may redirect to http://www.google.com.

The contents are easy to get (e.g. curl --max-redirs 10 http://google.com -L), but I'm only interested in the final URL (in this case http://www.google.com).

Is there any way of doing this using only built-in Linux tools? (Command line only.)

+1  A: 

You could use grep. Doesn't wget tell you where it's redirecting to? Just grep that out.
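
A minimal sketch of what that grep might look like (not part of the original answer). It assumes wget logs each redirect as a line of the form "Location: <url> [following]", which is the usual English-locale output; the helper name wget_last_location is made up for illustration.

```shell
# Hypothetical helper: read a wget log on stdin and print the last redirect target.
wget_last_location() {
    # Assumes redirect lines look like "Location: <url> [following]".
    grep '^Location: ' | tail -n 1 | awk '{ print $2 }'
}

# Assumed usage (needs network access). wget writes its log to stderr,
# and LC_ALL=C keeps the log messages untranslated:
# LC_ALL=C wget --max-redirect=10 -O /dev/null http://google.com 2>&1 | wget_last_location
```

Like any log scraping, this prints nothing when there is no redirect and may break if wget's message format differs.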

SpliFF
A: 

I'm not sure how to do it with curl, but libwww-perl installs the GET alias.

$ GET -S -d -e http://google.com
GET http://google.com --> 301 Moved Permanently
GET http://www.google.com/ --> 302 Found
GET http://www.google.ca/ --> 200 OK
Cache-Control: private, max-age=0
Connection: close
Date: Sat, 19 Jun 2010 04:11:01 GMT
Server: gws
Content-Type: text/html; charset=ISO-8859-1
Expires: -1
Client-Date: Sat, 19 Jun 2010 04:11:01 GMT
Client-Peer: 74.125.155.105:80
Client-Response-Num: 1
Set-Cookie: PREF=ID=a1925ca9f8af11b9:TM=1276920661:LM=1276920661:S=ULFrHqOiFDDzDVFB; expires=Mon, 18-Jun-2012 04:11:01 GMT; path=/; domain=.google.ca
Title: Google
X-XSS-Protection: 1; mode=block
halkeye
+1  A: 

As another option:

$ curl -i http://google.com
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Sat, 19 Jun 2010 04:15:10 GMT
Expires: Mon, 19 Jul 2010 04:15:10 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 1; mode=block

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

But it doesn't go past the first redirect.

halkeye
A: 

Thank you. I ended up implementing your suggestions: curl -i + grep

curl -i http://google.com -L | egrep -A 10 '301 Moved Permanently|302 Found' | grep 'Location' | awk -F': ' '{print $2}' | tail -1

Returns blank if the website doesn't redirect, but that's good enough for me as it works on consecutive redirections.

Could be buggy, but at a glance it works ok.
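
A somewhat more compact variant of the same idea, as a rough sketch rather than a tested replacement: dump only the headers and keep the last Location value. It assumes header names may arrive in any letter case and with CRLF line endings; the helper name last_location is made up for illustration.

```shell
# Hypothetical helper: read HTTP response headers on stdin and
# print the value of the last Location header, if any.
last_location() {
    # Strip carriage returns, then match the header name case-insensitively.
    tr -d '\r' | awk 'tolower($1) == "location:" { url = $2 }
                      END { if (url != "") print url }'
}

# Assumed usage (needs network access): -D - dumps headers to stdout,
# -o /dev/null discards the body, -L follows redirects.
# curl -s -L --max-redirs 10 -D - -o /dev/null http://google.com | last_location
```

It still prints nothing when the site doesn't redirect, and it prints the raw header value, so a relative Location would come out relative.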

vise
+2  A: 

curl's -w option with its 'url_effective' variable is what you're looking for.

Something like:

curl [URL] -L -o dumpfile -w 'Last URL was: %{url_effective}'

Daniel Stenberg
This is definitely better than the abomination I wrote. Too bad it creates a temporary file though, but I guess I can always add "; rm -f dumpfile".
vise
You should be able to use "-o /dev/null" if you don't want the file.
halkeye
That's a great option, I never knew curl could do that! It never ceases to amaze me `:-)`
Josh
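
Putting the -w answer and the "-o /dev/null" comment together, a small wrapper might look like the sketch below. The function name final_url and the redirect limit of 10 (borrowed from the question) are assumptions, not something from the answers.

```shell
# Hypothetical wrapper: print the URL curl ends up at after following redirects.
final_url() {
    # -sS: no progress meter, but still show errors
    # -L: follow redirects, capped by --max-redirs
    # -o /dev/null: throw the body away instead of writing a dump file
    # -w: print the effective (final) URL followed by a newline
    curl -sS -L --max-redirs 10 -o /dev/null -w '%{url_effective}\n' "$1"
}

# Assumed usage (needs network access):
# final_url http://google.com
```

Unlike the header-grepping approaches, this also prints a URL when there is no redirect at all (the one that was requested).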