views:

245

answers:

2

How do I get a complete list of all the URLs that my Rails application could generate?

I don't want the routes that I get from rake routes; instead I want the actual URLs corresponding to all the dynamically generated pages in my application...

Is this even possible?

(Background: I'm doing this because I want a complete list of URLs for some load testing I want to do, which has to cover the entire breadth of the application)

+1  A: 

You could pretty quickly hack together a program that grabs the output of rake routes and then parses the output to put together a list of the URLs.
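A rough sketch of that approach (assuming the standard `rake routes` column layout, and substituting a dummy value like `1` for dynamic segments such as `:id` — the sample string below stands in for the real command output):

```ruby
# Parse `rake routes`-style output into concrete URLs by substituting
# placeholder values for dynamic segments. In a real script you would
# capture the command output instead: routes = `rake routes`
routes = <<~ROUTES
      users GET /users(.:format)          users#index
       user GET /users/:id(.:format)      users#show
  user_post GET /users/:id/posts/:post_id users#post
ROUTES

base = 'http://localhost:3000'

urls = routes.lines.map do |line|
  # The path is the third whitespace-separated column.
  path = line.split[2]
  next unless path && path.start_with?('/')

  path
    .sub(/\(\.:format\)\z/, '')   # drop the optional (.:format) suffix
    .gsub(/:\w+/, '1')            # substitute a dummy value for each :param
end.compact.map { |path| base + path }

puts urls
```

The obvious limitation is that plugging in `1` for every parameter only exercises one record per route; for real coverage you'd want to substitute actual IDs pulled from your database.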

What I have typically done for load testing is to use a tool like WebLOAD and script several different types of user sessions (or different routes users can take). Then I create a mix of user sessions and run them through the website to get something close to an accurate picture of how the site might run.

Typically I will also do this on a total of 4 different machines running about 80 concurrent user sessions to realistically simulate what will be happening through the application. This also makes sure I don't spend too much time optimizing infrequently visited pages and can, instead, concentrate on overall application performance along the critical paths.

Jeremiah Peschka
+5  A: 

I was able to produce useful output with the following command:

$ wget --spider -r -nv -nd -np http://localhost:3209/ 2>&1 | ack -o '(?<=URL:)\S+'
http://localhost:3209/
http://localhost:3209/robots.txt
http://localhost:3209/agenda/2008/08
http://localhost:3209/agenda/2008/10
http://localhost:3209/agenda/2008/09/01
http://localhost:3209/agenda/2008/09/02
http://localhost:3209/agenda/2008/09/03
^C

A quick reference of the wget arguments:

# --spider                  don't download anything.
# -r,  --recursive          specify recursive download.
# -nv, --no-verbose         turn off verboseness, without being quiet.
# -nd, --no-directories     don't create directories.
# -np, --no-parent          don't ascend to the parent directory.

About ack

ack is like grep but uses Perl regexes, which are more complete/powerful.

-o tells ack to only output the matched substring, and the pattern I used matches any run of non-space characters preceded by 'URL:'.
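If you don't have ack installed, GNU grep (built with PCRE support, as on most Linux systems) accepts the same pattern via `-P`. Here it's fed a sample line of the kind wget's spider mode emits, rather than live output:

```shell
# Same extraction with GNU grep instead of ack:
#   -P  enable Perl-compatible regexes (needed for the lookbehind)
#   -o  print only the matched substring
echo '2008-09-01 12:00:00 URL:http://localhost:3209/agenda/2008/08 200 OK' \
  | grep -oP '(?<=URL:)\S+'
# prints: http://localhost:3209/agenda/2008/08
```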

kch
This is an awesome solution. It does exactly what I need. Thanks. Sort of surprising that there isn't a simpler way to spider a site, though.
Pistos