I would only add a couple things to Rob's answer:
First, Make sure the amount of data involved in the test cases is similiar to production values. In other words if your queries are normally against tables with hundreds of thousands or rows, then create such a test environment.
Second, make everything else equal except for the use of an nHibernate generated query and a s'proc call. Hopefully you can execute the test by simply swapping out a provider.
Finally, realize that there is usually a lot more at stake than just stored procedures vs. ORM. With that in mind the test should look at all of the factors: execution time, memory consumption, scalability, debugging ability, etc.