Hi All,

I'm currently working on a Windows service which will be deployed to all the workstations within the company. The purpose of the service is to collect various statistics and record them in a central location.

For the first iteration, the service makes a direct call to SQL every time there is data to record. Testing it on about 120 workstations showed me that this is not a good idea and won't scale.

Can someone recommend an alternative strategy for collecting the data in question which won't overload the database server? This will be deployed to about 2000 machines.

I'm working with C#.

Thanks.

+1  A: 

From your question, it appears that you already have a working solution and you just need advice on how to make it scale. Here are my two cents:

  • Do not keep the SQL connections open in the workstation's service
  • Try batching data updates in memory and only updating the database after a certain amount of data has been collected or a certain interval has passed (making your connection less chatty and more chunky); see the sketch after this list.
  • Make sure your database is properly normalized to avoid duplicate data
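To illustrate the batching point, here is a minimal C# sketch of the idea. The StatRecord type, the batch size, the flush interval, and the WriteBatchToSql helper are all illustrative assumptions, not details from the original post:

    using System;
    using System.Collections.Generic;
    using System.Threading;

    class BatchingCollector
    {
        private readonly List<StatRecord> _buffer = new List<StatRecord>();
        private readonly object _lock = new object();
        private const int MaxBatchSize = 100;   // assumed: flush after this many records
        private static readonly TimeSpan FlushInterval = TimeSpan.FromMinutes(5); // assumed interval
        private readonly Timer _timer;

        public BatchingCollector()
        {
            // Flush on a fixed interval even if the batch never fills up.
            _timer = new Timer(_ => Flush(), null, FlushInterval, FlushInterval);
        }

        public void Record(StatRecord record)
        {
            List<StatRecord> toWrite = null;
            lock (_lock)
            {
                _buffer.Add(record);
                if (_buffer.Count >= MaxBatchSize)
                {
                    toWrite = new List<StatRecord>(_buffer);
                    _buffer.Clear();
                }
            }
            if (toWrite != null)
                WriteBatchToSql(toWrite);
        }

        private void Flush()
        {
            List<StatRecord> toWrite;
            lock (_lock)
            {
                if (_buffer.Count == 0) return;
                toWrite = new List<StatRecord>(_buffer);
                _buffer.Clear();
            }
            WriteBatchToSql(toWrite);
        }

        private void WriteBatchToSql(List<StatRecord> batch)
        {
            // Hypothetical helper: open one short-lived connection per batch
            // (e.g. SqlBulkCopy or a multi-row INSERT), then close it.
            // The connection is never held open between flushes.
        }
    }

    class StatRecord { /* fields for the collected statistics */ }

This keeps each workstation down to one short-lived connection per batch instead of one round-trip per data point.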

Personally, for something mission-critical that needs to scale to 2000 workstations, I would use a commercial product and would not re-invent the wheel.

Jose Basilio
I ended up implementing it in two phases; see the details in my answer below. As for commercial products, I couldn't find any that would accommodate the very specific needs I was asked to satisfy. Thanks though.
nsr81
A: 

Just answering to wrap up the question. Here is how I ended up implementing it.

Data Collection

  • A C# service runs on the workstations. At varying intervals (not predetermined; triggered by certain events on the system) it fires off data collection.
  • Data is written to a file on the local workstation.
  • The data file is then copied to a network location (a sketch of both file steps follows this list).
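For illustration, a minimal C# sketch of these two steps might look like the following; the directory paths, file-naming scheme, and payload handling are assumptions, not details from the actual implementation:

    using System;
    using System.IO;

    class DataDropper
    {
        private const string LocalDir  = @"C:\StatData";        // assumed local path
        private const string RemoteDir = @"\\server\StatDrop";  // assumed network share

        public static void WriteAndCopy(string machineName, string payload)
        {
            Directory.CreateDirectory(LocalDir);

            // Unique name per machine and timestamp so files never collide on the share.
            string fileName = string.Format("{0}_{1:yyyyMMddHHmmssfff}.txt",
                                            machineName, DateTime.UtcNow);

            string localPath = Path.Combine(LocalDir, fileName);
            File.WriteAllText(localPath, payload);   // the 1-2 KB of collected stats

            // Copy to the central drop folder; the importer picks it up later.
            File.Copy(localPath, Path.Combine(RemoteDir, fileName));
        }
    }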

Data Parsing

  • An "importer" program runs on a schedule. It goes through all the text files created since it last ran, parses them and imports them into SQL.

I implemented it this way for the following reasons:

  • It avoids the resource issues with my SQL server that I mentioned in the original question.
  • The files created are really small, 1-2 KB.
  • The importer runs a couple of minutes in the past to avoid colliding with any copies still in progress.
  • The importer deletes the files that have been processed and skips the files that are not within its "lastScanTime" to "currentTime - X minutes" window (see the sketch below).
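As a rough illustration of that file-selection logic, here is a C# sketch; the drop directory, the two-minute guard window, and the ParseAndInsert stub are assumed placeholders rather than the real implementation:

    using System;
    using System.IO;

    class Importer
    {
        private const string DropDir = @"\\server\StatDrop";                    // assumed share
        private static readonly TimeSpan GuardWindow = TimeSpan.FromMinutes(2); // assumed "X minutes"

        public static void Run(DateTime lastScanTime)
        {
            // Ignore files newer than (now - X minutes): they may still be copying.
            DateTime cutoff = DateTime.UtcNow - GuardWindow;

            foreach (string path in Directory.GetFiles(DropDir, "*.txt"))
            {
                DateTime written = File.GetLastWriteTimeUtc(path);

                // Only files inside the [lastScanTime, cutoff) window are processed.
                if (written < lastScanTime || written >= cutoff)
                    continue;

                ParseAndInsert(path);   // parse the file and write its rows to SQL
                File.Delete(path);      // processed files are removed
            }
        }

        private static void ParseAndInsert(string path)
        {
            // Read the small data file and insert its records (stubbed here).
        }
    }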
nsr81