views: 1702

answers: 7

I would like to set up a nightly cron job that fetches my Stack Overflow page and diffs it against the previous day's page, so I can see a change summary of my questions, answers, ranking, etc.

Unfortunately, I couldn't get the right set of cookies, etc., to make this work. Any ideas?

Also, when the beta is finished, will my status page be accessible without logging in?

+3  A: 

Nice idea :)

I presume you've tried wget's

--load-cookies (filename)

option? It might help a little, but it might be easier to use something like Mechanize (in Perl or Python) to mimic a browser more fully and get a good spider.
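Something along these lines, perhaps (a rough, untested Python sketch; it reuses a Netscape-format cookies.txt exported from your browser, the same file you'd hand to wget --load-cookies, and the profile URL is a placeholder):

    # Untested sketch: mimic a browser with Python's mechanize, reusing the
    # cookies.txt file you would otherwise pass to wget --load-cookies.
    import mechanize

    cookies = mechanize.MozillaCookieJar()
    cookies.load("cookies.txt", ignore_discard=True, ignore_expires=True)

    br = mechanize.Browser()
    br.set_cookiejar(cookies)
    br.set_handle_robots(False)                      # it's your own page, skip robots.txt
    br.addheaders = [("User-Agent", "Mozilla/5.0")]  # look like a normal browser

    # Placeholder URL: point this at your own profile page.
    html = br.open("http://stackoverflow.com/users/your-id").read()
    with open("profile-today.html", "wb") as f:
        f.write(html)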

sparkes
+5  A: 

Your status page is available now without logging in (click logout and try it). When the beta-cookie is disabled, there will be nothing between you and your status page.

Grant
+2  A: 

I couldn't figure out how to get the cookies to work either, but I was able to get to my status page in my browser while I was logged out, so I assume this will work once stackoverflow goes public.

This is an interesting idea, but won't you also pick up diffs of the underlying HTML code? Do you have a strategy to avoid ending up with a diff of the HTML rather than of the actual content?

Ryan Ahearn
+2  A: 

@Ryan:

If I had the time, I would make a Beautiful Soup (or something better?) script to scrape the data nicely, but for now I'm just grepping out the lines of text I need.
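For what it's worth, a rough, untested sketch of what that Beautiful Soup version might look like (assuming today's page has already been saved to profile.html, a placeholder filename; it keeps only the visible text so the diff isn't swamped by markup changes):

    # Untested sketch: strip the markup and keep only the visible text,
    # so the nightly diff compares content rather than HTML.
    from bs4 import BeautifulSoup

    with open("profile.html") as f:
        soup = BeautifulSoup(f.read(), "html.parser")

    text = soup.get_text("\n", strip=True)   # visible text, one fragment per line

    with open("profile.txt", "w") as f:
        f.write(text)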

Mark Harrison
+2  A: 

And here's what works...

curl -s --cookie soba=. http://stackoverflow.com/users
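And a rough, untested sketch of how the nightly diff step could hang together in Python (the cookie value and filenames are placeholders; the curl command above does the same fetch):

    # Untested sketch: fetch the page with the soba cookie, print a unified
    # diff against yesterday's copy, then save today's copy for tomorrow's run.
    import difflib
    import os
    import urllib.request

    URL = "http://stackoverflow.com/users"
    req = urllib.request.Request(URL, headers={"Cookie": "soba=<your-cookie-value>"})
    today = urllib.request.urlopen(req).read().decode("utf-8", "replace").splitlines()

    if os.path.exists("yesterday.html"):
        with open("yesterday.html") as f:
            yesterday = f.read().splitlines()
        for line in difflib.unified_diff(yesterday, today, "yesterday", "today", lineterm=""):
            print(line)

    with open("yesterday.html", "w") as f:
        f.write("\n".join(today))
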
Mark Harrison
+5  A: 

From Mark Harrison

And here's what works...

curl -s --cookie soba=. http://stackoverflow.com/users

And for wget:

wget --no-cookies --header "Cookie: soba=(LookItUpYourself)" http://stackoverflow.com/users/30/myProfile.html

Grant
A: 

@Mark:

Thanks, I had never heard of Beautiful Soup.

Ryan Ahearn