I dunno why but I'd like to cite Remus Rusanu [1]:
First of all, you need to run the query repeatedly under each [censored] and average the result, discarding the one with the maximum time. This will eliminate the buffer warm up impact: you want all runs to be on a warm cache, not have one query warm the cache and pay the penalty in comparison.
Next, you need to make sure you measure under realistic concurrency scenario. IF you will have updates/inserts/deletes occur under real life, then you must add them to your test, since they will impact tremendously the reads under various isolation level. The last thing you want is to conclude 'serializable reads are fastest, lets use them everywhere' and then watch the system melt down in production because everything is serialized.
1) Running the query on a cold cache is not accurate. Your production queries will not run on a cold cache, you'll be optimizing an unrealistic scenario and you don't measure the query, you are really measuring the disk read throughput. You need to measure the performance on a warm cache as well, and keep track of both (cold run time, warm run times).
How relevant is the cache for a large query (millions of rows) that under normal circumstances runs only once for particular data?
Still very relevant. Even if the data is so large that it never fits in memory and each run has to re-read every page of the table, there is still the caching of non-leaf pages (ie. hot pages in the table, root or near root), cache of narrower non-clustered indexes, cache of table metadata. Don't think at your table as an ISAM file
[1] Why better isolation level means better performance in SQL Server
http://stackoverflow.com/questions/2450808/why-better-isolation-level-means-better-performance-in-sql-server