Recommended framework for data aggregation

+1 Q:

We have an application that will be collecting data and storing it in local WinXP PCs using Microsoft SQL Server Compact. We want to aggregate that data up to a single full-blown SQL Server for reporting and archival. The data transport needs to be fairly continuous (i.e. not batched) though some latency is acceptable (a minute or two max).

Data is a one-way push from the collectors to the server. Collectors never need to know what other collectors are doing and the primary server will never be updating data back on the collector. Current plans are for 5 collectors, but it's essentially unbounded for scalability.

We have to assume we'll be "mostly connected", but we cannot guarantee the connection from the collectors to the server. If the server or network goes down, we'll still be collecting, and the data will get pushed up when the server is reachable again.

Ideally we'd like a solution that a non-programming engineer could set up once we've done the infrastructure work. So we're fine writing some code and wizards, but the end user cannot be assumed to know anything about writing code, though they will have reasonable technical computer literacy.

Right now we have two technology candidates for this:

1. SQL Replication
2. Microsoft Sync Services

We have little experience with the first, but we know that setting up subscriptions, etc. in SQL Server is painful and debugging them is not fun, so we're trying to find an alternative.

We know almost nothing of #2, only that it's been suggested as an alternative for getting device data to a server.

Does anyone have experience with this type of scenario, with either or both of these technologies, or with anything we haven't thought of that they can share? SQL Compact on the collectors is a fixed requirement. SQL Server on the server is not required, but desired since the customer already has it.

Try the sync and tell us how it goes :) At an MSFT event I saw, the presenter said: "I added these 3 lines of code and everything just syncs up...wooohhooo".

Sounds like the way to go to me.

Sam 2008-11-19 20:51:22

The problem with replication is that when your schema changes, you're going to have manual work to do on each client to get replication up and running again. I don't have experience with Sync Services, but I would ask that same question: what happens when the schema changes? If you have to touch every client, that could be a problem.

Brent Ozar 2008-11-29 00:54:47

+1 A:

I have used Microsoft Sync Services before it was fully released. I liked it and it seems like a perfect fit for your application.

If you want to make life easy for yourself, I recommend using a GUID (SQL Server uniqueidentifier) as the primary key on every table you want to synchronize up to the main server. This will prevent key collisions between collectors and save a lot of extra coding.
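To illustrate why GUID keys avoid collisions, here is a minimal sketch in Python. It is not Sync Services code: the helper names `new_row` and `merge` are hypothetical, and plain dictionaries stand in for the collector and server tables. Each collector generates its own key locally, and the merge step can simply combine rows without renumbering anything.

```python
import uuid


def new_row(collector_name, value):
    """Build a row whose primary key is generated locally on the collector."""
    return {"id": str(uuid.uuid4()), "collector": collector_name, "value": value}


def merge(*collector_rows):
    """Combine rows from several collectors into one table keyed by GUID."""
    merged = {}
    for rows in collector_rows:
        for row in rows:
            if row["id"] in merged:
                # With integer identity columns this would happen constantly,
                # because every collector starts counting at 1.
                raise ValueError("primary key collision: " + row["id"])
            merged[row["id"]] = row
    return merged
```

With auto-incrementing integer keys, two collectors would both produce rows 1, 2, 3 and the server would need an extra collector column or a key-remapping step.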

One caveat: I heard that Sync Services changed significantly after it was released for the first version, so my info is probably out of date.

Scott Whitlock 2008-11-29 01:01:25

I ended up going with option 3: neither. Instead we just periodically (user adjustable, but defaults to 5 seconds) use the SqlBulkCopy class to copy the records across. This works well because it accepts an IDataReader, so we locally open the table using TableDirect, seek to the highest RowID from the remote table, then pass the reader to the WriteToServer method.
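The pattern described here (read the highest RowID the server already has, then copy only newer rows) can be sketched roughly as follows. This is an illustration in Python, not the actual SqlBulkCopy code: `sqlite3` stands in for both SQL Compact and SQL Server, and the `readings` table, its columns, the `collector_id` value, and the `push_new_rows` function are all assumptions made for the example.

```python
import sqlite3


def push_new_rows(local, remote, collector_id, batch_size=1000):
    """Copy rows the server has not seen yet; return how many were copied."""
    # Highest RowID the server already holds for this collector (0 if none).
    (high_water,) = remote.execute(
        "SELECT COALESCE(MAX(row_id), 0) FROM readings WHERE collector_id = ?",
        (collector_id,),
    ).fetchone()

    # Read only the local rows past that point, in key order.
    cursor = local.execute(
        "SELECT row_id, value FROM readings WHERE row_id > ? ORDER BY row_id",
        (high_water,),
    )

    copied = 0
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        with remote:  # commit each batch as one transaction
            remote.executemany(
                "INSERT INTO readings (collector_id, row_id, value) "
                "VALUES (?, ?, ?)",
                [(collector_id, row_id, value) for row_id, value in rows],
            )
        copied += len(rows)
    return copied
```

Because the high-water mark is read from the server on every call, a run that fails partway (for example when the network drops) needs no local bookkeeping: the next timer tick resumes from whatever the server actually received.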

ctacke 2008-12-31 15:55:10
