I have a long-running query that returns a large data set. This query is called from a web service and the results are converted to a CSV file for the end user. Previous versions would take 10+ minutes to run and would only return results to the end user once the query completed.

I rewrote the query so that it runs in a minute or so in most cases, and rewrote the access code so the results are streamed to the client as they arrive at the ASP.NET web service from the database server. I tested this using a local instance of SQL Server as well as a remote instance without issue.

Now, on the cusp of production deployment, it seems our production SQL Server machine does not send any results back to the web service until the query has completed execution. Additionally, I found that another machine, a clone of the remote server that does work (they are VM clones), is also not streaming results.

The version of SQL Server 2008 is identical on all machines. The production machine has a slightly different version of Windows Server installed (6.0 vs. 6.1). The production server has 4 cores and several times the RAM of the other servers; the others are single-core machines with 1 GB of RAM.

Is there any setting that would be causing this? Or is there any setting I can set that will prevent SQL Server from buffering the results?

Although I know this won't really affect the overall runtime, it will greatly change the end user's perception.

tl;dr; I need the results of a query to stream to the end user as the query runs. It works with some database machines, but not on others. All machines are running the same version of SQL Server.

The gist of what I am doing in C#:

var reader = cmd.ExecuteReader();
Response.Write(getHeader());              // CSV header row first
while (reader.Read())
{
    Response.Write(getCSVForRow(reader)); // one CSV line per result row
    if (shouldFlush()) Response.Flush();  // push buffered output to the client
}
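
For context, a fuller sketch of that loop follows. getHeader(), getCSVForRow(), and shouldFlush() are the helpers from above; the using blocks, CommandBehavior.SequentialAccess, and Response.BufferOutput = false are assumptions about code not shown here.

// Sketch only: query and conn are hypothetical stand-ins for the command
// text and an open SqlConnection. BufferOutput = false asks ASP.NET not to
// buffer the entire response itself before sending.
Response.BufferOutput = false;
Response.ContentType = "text/csv";
using (var cmd = new SqlCommand(query, conn))
using (var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
{
    Response.Write(getHeader());
    while (reader.Read())
    {
        Response.Write(getCSVForRow(reader));
        if (shouldFlush()) Response.Flush();
    }
}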

Clarification based on the response below

There are 4 database servers, Local, Prod, QA1, QA2. They are all running SQL Server 2008. They all have identical databases loaded on them (more or less, 1 day lag on non-prod).

The web service is hosted on my machine (though I have tested remotely hosted as well).

The only change between tests is the connection string in the web.config.

QA2 is working (streaming), and it is a clone of QA1 (VMs). The only difference between QA1 and QA2 is an added database on QA2 not related to this query at all.

QA1 is not working.

All tests include the maximum-sized dataset in the result (we limit to 5k rows at this time). The browser displays a download dialog once the first flush happens. This is the desired result: we want users to know their download is processing, even if the download speed is low and at times drops to zero (such is the way with databases).

My flushing code is simple at this time: every k rows we flush, with k currently set to 20.
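
A minimal sketch of that policy (the field and constant names here are illustrative; only the k = 20 rule comes from the actual code):

private int rowsSinceFlush = 0;
private const int FlushEvery = 20; // k = 20

private bool shouldFlush()
{
    // True once every FlushEvery rows; reset the counter after each flush.
    rowsSinceFlush++;
    if (rowsSinceFlush < FlushEvery) return false;
    rowsSinceFlush = 0;
    return true;
}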

The most perplexing part of this is the fact that QA1 and QA2 behave differently. I did notice our production server is set to compatibility level 2005 (90), where both the QA and local databases are set to 2008 (100); I doubt this matters. When I exec the sprocs through SSMS I see similar behavior across all machines: results stream in immediately.

Is there any connection string setting that could disable the streaming?

A: 
  1. Put a new task that builds this huge CSV file in a task table.
  2. Run the procedure to process this task.
  3. Wait for the result to appear in your task table with SqlDependency (see the sketch after this list).
  4. Return the result to the client.
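
A minimal sketch of the SqlDependency step, assuming a hypothetical dbo.CsvTasks table and Service Broker enabled on the database (the answer itself names only SqlDependency; connStr and taskId are assumed to exist):

// Watch a hypothetical dbo.CsvTasks row for completion via SqlDependency.
// Service Broker must be enabled on the database for notifications to work.
SqlDependency.Start(connStr);
using (var conn = new SqlConnection(connStr))
{
    conn.Open();
    // Notification queries need two-part table names and explicit columns.
    var cmd = new SqlCommand(
        "SELECT TaskId, Status FROM dbo.CsvTasks WHERE TaskId = @id", conn);
    cmd.Parameters.AddWithValue("@id", taskId);

    var dep = new SqlDependency(cmd);
    dep.OnChange += (sender, e) =>
    {
        // Fires when the watched row changes (e.g. Status becomes 'Done');
        // at that point, fetch the finished CSV and return it to the client.
    };

    using (var reader = cmd.ExecuteReader())
    {
        // Executing the command registers the notification subscription.
    }
}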
Denis Valeev
Hoping to determine why some SQL Server instances work as I expect them to and others do not.
Tim Reynolds
+1  A: 

Everything I know says that what you're doing should work; both the DataReader and Response.Write()/.Flush() act in a "streaming" fashion and will result in the client getting the data one row at a time as soon as there are rows to get. Response does include a buffer, but you're pushing the buffer to the client after every read/write iteration which minimizes its use.

I'd check that the web service is configured to honor Flush() calls on the response. Make sure the production environment is not a Win2008 Server Core installation; Windows Server 2008 does not support Response.Flush() in certain Server Core roles. I'd also check that the conditions evaluated in shouldFlush() return true when you expect them to in the production environment (you may be checking the app config for a value, or looking at IIS settings; I dunno).

In your test, I'd try a much larger set of sample data; it may be that the production environment is exposing problems that are also present in the test environments, but with a smaller set of test data and a high-speed Ethernet backbone, the problem isn't noticeable compared to returning hundreds of thousands of rows over DSL. You can verify that it is working in a streaming fashion by inserting a Thread.Sleep(250) call after each Flush(); this'll slow down execution of the service and let you watch the response get fed to your client at roughly four flushes per second.
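
For example, the throttled test could look like this (where exactly the Sleep sits inside your loop is my assumption):

// Test-only variant of the question's loop: sleeping 250 ms after each
// flush caps output at ~4 flushes per second, making streaming (or the
// lack of it) easy to observe in the browser.
while (reader.Read())
{
    Response.Write(getCSVForRow(reader));
    if (shouldFlush())
    {
        Response.Flush();
        System.Threading.Thread.Sleep(250); // remove after testing
    }
}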

Lastly, make sure that the client you're using in the production environment is set up to display CSV files in a fashion that allows for streaming. This basically means that a web browser acting as the client should not be configured to pass the file off to a third-party app. A web browser can easily display a text stream passed over HTTP; that's what it does, really. However, if it sees the stream as a CSV file, and it's configured to hand CSV files over to Excel to open, the browser will cache the whole file before invoking the third-party app.
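
For instance, a Content-Disposition header along these lines prompts the browser's own download dialog instead of handing the stream to a registered app (the filename is illustrative):

// "attachment" tells the browser to offer its download dialog rather than
// pass the stream to whatever application owns .csv files.
Response.ContentType = "text/csv";
Response.AddHeader("Content-Disposition", "attachment; filename=report.csv");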

KeithS
I am going to clarify in my original post based on your post. Thanks
Tim Reynolds
Marking this as the answer because, as Keith says, I am doing what I should to get things to work. We pushed to production and, for reasons I can't explain, it is working as expected there.
Tim Reynolds