Best practice for WCF service with large amounts of data?

We have a WCF service that queries an underlying data store (SQL Server 2005 right now). The service may return rather large amounts of data: 60,000+ instances of our entity class, which has about 20 properties. The properties are mostly primitives such as string, int and DateTime, with a couple pointing at other entities that may in turn point at others; those hierarchies are not very deep, though.

One application that consumes this service will typically make queries that return only a reasonable number of entities (from just a few instances up to a couple of thousand). But occasionally it will make a query that returns a large number of entities, as stated above (and it needs to process all of that data, so narrowing the query criteria is not an option).

What we want to do is to introduce some sort of "paging" functionality, where the client can call the service and get a certain number of instances back, then call again and get the next chunk and so on, until the full result is fetched. Not having worked an awful lot with WCF, I am not quite certain of the best way to achieve this.

One thing to perhaps keep in mind is that the underlying data may very well change while the chunks are being fetched. I am not quite sure whether this is a problem for us or not (I need to investigate that a bit), but it could be, so any input on handling that particular situation is also welcome.

We have started to look into streaming the response, but would like to see samples of paging as well, since we may want to start processing data before the full result is received.

So, the question in short: is there a best practice for this kind of scenario (or any absolute no-no's that we should be aware of)?

Using a streaming binding configuration on both client and server, with a MessageContract that has only a Stream as its [MessageBodyMember] (and any other metadata sent as [MessageHeader]s), would let you do the whole thing in one call without worrying about paging. Just use an enumerator on the server side to feed the stream, and process individual entities as they appear on the client. The catch is that you'd have to roll your own framing within the stream (e.g., serialize/deserialize entities manually on the stream with DataContractSerializer or whatever). I've done this, and it works great, but it's kind of a pain.
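To make the "roll your own framing" part concrete, here is a minimal, language-neutral sketch in Python rather than WCF code. It assumes a simple length-prefixed format (a 4-byte big-endian length followed by the payload) and uses JSON as a stand-in for DataContractSerializer. The function names and the sample entity shape are hypothetical, and this is one possible framing scheme, not necessarily the one described above.

```python
import io
import json
import struct

def write_frames(stream, entities):
    """Write each entity as a 4-byte big-endian length followed by a JSON payload."""
    for entity in entities:
        payload = json.dumps(entity).encode("utf-8")
        stream.write(struct.pack(">I", len(payload)))
        stream.write(payload)

def read_exact(stream, count):
    """Read exactly count bytes, or fewer only if the stream ends first."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def read_frames(stream):
    """Yield entities one at a time as their frames become available."""
    while True:
        header = read_exact(stream, 4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise EOFError("stream ended inside a length prefix")
        (length,) = struct.unpack(">I", header)
        payload = read_exact(stream, length)
        if len(payload) < length:
            raise EOFError("stream ended inside a record")
        yield json.loads(payload.decode("utf-8"))

def sample_entities(count):
    """Hypothetical entity source; stands in for a server-side enumerator."""
    for i in range(count):
        yield {"id": i, "name": "entity-%d" % i}

# Round trip through an in-memory stream; a real service would write to the
# response stream and the client would read from it as bytes arrive.
buffer = io.BytesIO()
write_frames(buffer, sample_entities(3))
buffer.seek(0)
decoded = list(read_frames(buffer))
```

Because `read_frames` is a generator, the consumer can start processing the first entity before the rest of the stream has arrived, which is the main attraction of the streaming approach.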

If you want to do paging, the easy way is to use a sessionful WCF channel in conjunction with a snapshot transaction (if you're using SQL Server, or something else that supports them, as your entity source). Start the snapshot tx on the first request, then tie the life of the tx to the session, so that you're looking at a stable picture of the data between page requests. The tx will be released when the session is closed (or times out, if the client disconnected unexpectedly). For each page, the client then sends the last key value it saw plus how many records it wants (be careful of maxReceivedMessageSize and leave LOTS of headroom). Since you're in a snapshot, you don't have to worry about changes; you'll see a consistent view for the duration of the dump. If you can't snapshot your source data to prevent it from changing mid-download, life is a lot harder. It's always doable, but designing for that is very specific to the data.
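The "last key value + how many records" request can be sketched as follows. This is a minimal illustration in Python against an in-memory SQLite table, not WCF or SQL Server code: the `entity` table, its columns, and the function names are hypothetical, keys are assumed to be positive integers, and the snapshot transaction and session handling are not modelled at all.

```python
import sqlite3

def fetch_page(conn, last_key, page_size):
    """Return up to page_size rows whose id is strictly greater than last_key."""
    cursor = conn.execute(
        "SELECT id, name FROM entity WHERE id > ? ORDER BY id LIMIT ?",
        (last_key, page_size),
    )
    return cursor.fetchall()

def fetch_all_in_pages(conn, page_size):
    """Client-side loop: keep asking for the next page until one comes back short."""
    last_key = 0  # assumption: keys are positive integers, so 0 precedes all of them
    while True:
        page = fetch_page(conn, last_key, page_size)
        for row in page:
            yield row
        if len(page) < page_size:
            return
        last_key = page[-1][0]  # remember the last key seen for the next request

# Hypothetical sample data standing in for the real entity table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO entity (id, name) VALUES (?, ?)",
    [(i, "entity-%d" % i) for i in range(1, 11)],
)
rows = list(fetch_all_in_pages(conn, page_size=4))
```

Paging by key rather than by row offset means each request is a simple range seek on the key, and the client only has to remember one value between calls.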

Thanks for your input. I will look into the snapshot transaction idea. As it looks now, we may move some of the services to LINQ to SQL to support paging. I will check whether those ideas can be combined, which would be ideal.

Fredrik Mörk 2009-09-30 07:28:58

They can; we use LINQ to SQL for everything. The only trick with the snapshot is that you'd need to touch all the records (and related data) once at the beginning, without actually returning them to the client, to get them included in the snapshot. A discrete SQL command would probably be better for that.

nitzmahone 2009-09-30 16:58:55
