I was inspired by another question to write a script (or rather a one-liner) to grab random Wikipedia pages.

Here's what I've got so far:

# Grab the HTTP header response from Wikipedia's random page link
curl 'http://en.wikipedia.org/wiki/Special:Random' -sI

# Search STDIN for the Location header and grab its content
perl -wnl -e '/Location: (.*)/ and print $1;'

And this works: it outputs a random Wikipedia URL to the console. But I need to append "?printable=yes" to that URL to get the Wikipedia page without all the non-article content.

However, running:

curl 'http://en.wikipedia.org/wiki/Special:Random' -sI | perl -wnl -e '/Location: (.*)/ and print $1 . "?printable=yes";'

Outputs: ?printable=yespedia.org/wiki/James_Keene_(footballer)

Why is my concatenation not concatenating?

UPDATE:

For the curious, here is the completed one-liner:

curl `curl 'http://en.wikipedia.org/wiki/Special:Random' -sI | perl -wnl -e '/Location: ([^\r]*)/ and print $1 . "?printable=yes";'`
+2  A: 
curl 'http://en.wikipedia.org/wiki/Special:Random' -sI | perl -wnl -e '/Location: (.*)/ and chomp($1) and print $1 . "?printable=yes";'

Untested, but this should work. The jump back to the beginning of the line is caused by a rogue '\r' character at the end of the Location line. The script prints the Wikipedia URL complete with the '\r', which moves the cursor back to the start of the line, where ?printable=yes then overwrites the first characters of the URL. The chomp is meant to remove that '\r' character.

Nick Lewis
Somehow all that managed to do was prepend a 0 to the output. However, you were right about the reason, and changing the regex to /Location: ([^\r]*)/ did the trick. Thanks.
Daniel Straight
Well that 0 would be because I didn't bother to read how chomp is used in Perl; been using Ruby lately, so I assumed it was the same. What it was doing was printing $1, chomping nothing (which returns 0 because it removed 0 characters) and concatenating and thus printing that, and then printing the rest. I've updated my answer to have the correct usage of chomp, although what you did works just as well. :)
Nick Lewis
Damn I hate the CRLF approach in Windows. I spent almost half an hour debugging a multi-line regex because of '\r's just yesterday.
Martinho Fernandes
@Martinho: The CRLF at the end of the response lines has nothing to do with Windows ... see http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html
Sinan Ünür
Anchoring the regex and capturing only what you needed would have avoided the problem.
Sinan Ünür