This question is kinda old, but perhaps this is of any use:
When using native queries, try to set a breakpoint on the lambda expression. If the breakpoint is actually invoked, you're in trouble because the optimization failed. To invoke the lambda, each of the objects will have to be instantiated which is very costly.
If optimization worked, the lambda expression tree will be analyzed and the actual code won't be needed, thus breakpoints won't be triggered.
Also note that settings indexes on fields must be performed before opening the connection.
Last, I have a test case of simple objects. When I started without query optimization and indexing (and worse, using a server that was forced to use the GenericReflector
because I failed to provide the model .dlls), it too 600s for a three-criteria query on about 100,000 objects. Now it takes 6s for the same query on 2.5M objects so there is really a HUGE gain.