views:

483

answers:

6

Hi,

How scalable are DataSets? A team member wants to use DataSets for data retrieval and manipulation, rely on their built-in data integrity features, and use the DataSet object to perform data updates.

Our system is expected to scale to millions of users.

Everything that I have read argues against DataSets in an enterprise environment. Am I wrong here?

A: 

Aside from performance, I wouldn't use them from a maintenance standpoint. I prefer using POCO objects and an ORM.

Using DataSets probably won't prevent you from scaling, but there are faster alternatives, such as reading straight from a data reader into a POCO.
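A minimal sketch of the reader-to-POCO approach, offered as an illustration rather than the answerer's own code. The `Customer` class, the `CustomerMapper` helper, and the column names are hypothetical. An in-memory `DataTable` stands in for a query result so the example runs without a database; a real application would obtain the reader from its database command instead.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical POCO: a plain class with no data-access dependencies.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CustomerMapper
{
    // Accepts any IDataReader, so it is not tied to one database provider.
    public static List<Customer> ReadAll(IDataReader reader)
    {
        // Resolve column positions once, outside the loop.
        int idOrdinal = reader.GetOrdinal("Id");
        int nameOrdinal = reader.GetOrdinal("Name");

        var customers = new List<Customer>();
        while (reader.Read())
        {
            customers.Add(new Customer
            {
                Id = reader.GetInt32(idOrdinal),
                Name = reader.GetString(nameOrdinal)
            });
        }
        return customers;
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in data source (an assumption made for this sketch).
        var table = new DataTable("Customers");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "Ada");
        table.Rows.Add(2, "Linus");

        using (IDataReader reader = table.CreateDataReader())
        {
            foreach (Customer c in CustomerMapper.ReadAll(reader))
            {
                Console.WriteLine(c.Id + ": " + c.Name);
            }
        }
    }
}
```

The mapper reads forward through the rows once and keeps only the POCOs, with none of the change-tracking or schema overhead a DataSet carries.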

Also, the only way you're going to be able to answer this question is to set up a test environment and a test application that simulates what you're going to be doing in the real app, and then bang the heck out of it.

Your environment should mimic your end state. (If you're going to have a farm with a dedicated SQL box, don't run your tests against a single server that hosts both web and SQL.)

JoshBerke
POCO: Plain Old CLR Object (a plain C# class), nothing special
JoshBerke
I don't know about that. ORMs tend to be far worse performance-wise than specific queries that bring back only what is actually needed.
Jonathan Allen
I never said an ORM was the fastest, but you can customize the queries in most ORMs...
JoshBerke
+6  A: 

Disclaimer: these are my opinions, taken from personal experience.

DataSets are so painful to use that I would REALLY not recommend them unless you have some specific need for them. I have worked on large .NET 1.0 era projects (with thousands of DataSets) and I find them hard to maintain, use and test. You have to access everything with array-based syntax unless you use strongly typed DataSets, which you will spend forever maintaining.

I would really recommend using an ORM like NHibernate. You can learn more about NHibernate with these screencasts.

If you are interested in scalable architecture you should look at the High Scalability web site, where you will be able to find the MySpace Architecture that you mention in your question.

For a less biased opinion on DataSets, please check this MSDN link (summary below).

When to Use Which

Both DataSets and custom classes don't limit what you can do in any way, and both can be used to accomplish the same aims. That said, DataSets are fantastic tools for prototyping applications and represent excellent solutions for building systems in a kind of emergency—a limited budget, an approaching deadline, or a short application lifetime. For relatively simple applications, custom entities add a perhaps unnecessary level of complexity. In this case, I suggest that you seriously consider using DataSets.

In the economy of a large, durable, complex enterprise system that takes several months to complete, the cost of architecting and implementing a bunch of collections classes is relatively minimal and is incurred only once. The advantages in terms of performance, expressivity, readability, and ease of maintenance largely repay the investment. You are not bound to a tabular rendering of data. Business rules and custom business entities can't always be adapted to look like a collection of tables. In general, you should avoid adapting data to the data container—quite the reverse, I'd say. Finally, using custom classes makes for easier unit testing because classes and logic are more strictly related than with DataSets. In Figure 3, you find a synoptic table with DataSets, typed DataSets, and custom entities compared by several factors.

cgreeno
When you care about performance on this scale, an ORM is not the way to go. You need much finer-grained control over the SQL and the ability to tune it on the fly. That generally means stored procs.
Jonathan Allen
Most ORMs can be wired into stored procs. I agree that REALLY complex logic should be put into a stored proc, but that doesn't mean you shouldn't use an ORM. However, it does still mean you shouldn't use datasets unless you have a specific need for them.
cgreeno
Note that the "Custom Entities" model is missing the following features: Concurrency, relationships, serialization, data binding, expressions, etc. So, we're really comparing apples and oranges, IMO.
Mark Brackett
@Mark Yes, true. I was just trying to link to an article that maybe gave a less biased opinion on DataSets.
cgreeno
A: 

There are too many variables to answer the performance aspect in any useful manner (for a start, total users is a useless measure; peak requests per second would be a better starting point).

I would avoid Datasets unless you need their ability to manipulate data in memory repeatedly. If you need to pass through the data once, use a DataReader and avoid holding everything in memory.

(ORMs are another option of course.)

Richard
Why? DataSets themselves don't have app-side performance issues. Also, the real concern is the SQL calls made to the database.
Jonathan Allen
The db overhead is consistent, but DataSet reads all the data into memory before being able to process it. A fire-hose cursor meanwhile can (1) avoid all the memory allocation, (2) process the first data as soon as it's available and (3) process data while more data is still being sent from the db.
Richard
+1  A: 

Yes, you are wrong about the enterprise portion of your question--they are acceptable in an enterprise environment. The issue is typically with developers' knowledge of the DataSet and the mistaken idea that you'll be able to write your own, more efficient, mechanism. That's all before you start recreating common functionality, like filtering for your object collections, Unit of Work mechanisms, etc.

That's a different question than scaling to millions of users. It's likely that you want to trim any of the fat, which requires that you customize all your data logic. Going POCO probably is not the right direction. With POCO, you're still mapping non-db-aware structures to a database in a separate layer, adding extra logic that, when scaled to a high level, starts showing wear and tear on your performance.

You'll need to provide a more specific set of questions to get a better answer, but "enterprise" does not necessarily equal "millions of users". POCO, DataSets, etc lend themselves to quick development (regardless of cgreeno's unsupported opinion) as well as maintainability because of POCO's "simplification" of the model used in the app and the DataSet's wide adoption and understanding (among most developers). But to support millions of users, you're likely going to sacrifice maintainability for performance and scalability design elements. You just need to make the decision which "-abilities" are more important.

BTW, typed DataSets ARE DataSets. Saying typed DataSets are faster than non-typed is like saying I can run fast, but with this name tag on, I can run faster. Be careful to investigate unsupported claims about any particular tool and ask for evidence.

Mark A Johnson
While nothing you said is incorrect, I think you are going down the wrong path. The focus should be on the SQL being executed on the database; what happens inside the application probably won't be a concern.
Jonathan Allen
I was not saying they CAN'T be used; I am saying they shouldn't be, from a maintenance standpoint. I agree that you can develop with both DataSets and POCOs equally quickly; however, what you have at the end of the process is vastly different.
cgreeno
+2  A: 

DataSets are heavy. They offer a lot more than just in-memory data. They have change tracking, views, relations, etc. If you use those features, then they are likely better than what you'll come up with on your own.
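To make two of those features concrete, here is a small sketch of change tracking and a filtered view on a single `DataTable`. The `OrderDemo` helper, the table name, and the column names are hypothetical, and the table is populated in memory rather than from a database.

```csharp
using System;
using System.Data;

public static class OrderDemo
{
    // Builds a small table and marks its rows Unchanged, as if freshly loaded.
    public static DataTable LoadOrders()
    {
        var orders = new DataTable("Orders");
        orders.Columns.Add("Id", typeof(int));
        orders.Columns.Add("Status", typeof(string));
        orders.Rows.Add(1, "Open");
        orders.Rows.Add(2, "Open");
        orders.AcceptChanges();
        return orders;
    }
}

public static class Program
{
    public static void Main()
    {
        DataTable orders = OrderDemo.LoadOrders();
        orders.Rows[0]["Status"] = "Shipped";

        // Change tracking: each row remembers its state and original values.
        DataRow changed = orders.Rows[0];
        Console.WriteLine(changed.RowState);                            // Modified
        Console.WriteLine(changed["Status", DataRowVersion.Original]);  // Open
        Console.WriteLine(changed["Status", DataRowVersion.Current]);   // Shipped

        // GetChanges returns a copy containing only the changed rows.
        DataTable delta = orders.GetChanges();
        Console.WriteLine(delta.Rows.Count);                            // 1

        // Views: filter and sort without copying the underlying rows.
        var openOrders = new DataView(
            orders, "Status = 'Open'", "Id ASC", DataViewRowState.CurrentRows);
        Console.WriteLine(openOrders.Count);                            // 1
    }
}
```

If none of this bookkeeping is used, it is overhead paid for nothing, which is the point the next paragraph makes.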

Where folks get into trouble is when they use DataSets as a HashTable of sorts, and then complain that they're slower than a DataReader. Well, yeah - if you can get by with just a DataReader, then a DataSet is pure overkill - you're running 90% more code than you need.

So, the real question you have to ask yourself is - do I need a DataReader or a DataSet? If you need the DataSet's functionality, then you should probably wrap an abstraction around it and start there. You can optimize later if you need to (and no matter what you do, you will probably need to optimize once you perform some load testing).

Edit: I just want to point out that I'm talking about scalability concerns here. Please don't read into this that I'm a fan of the DataSet's API design, the typed DataSet code gen, etc. I'm not.

Mark Brackett
A: 

For reading data, DataSets are just fine. They should be only slightly slower than custom objects, though of course you need performance tests to verify this.

For writing data, you really want something more efficient. Dynamic SQL that updates only the columns that changed, or very specific stored procedures, will give you much better results.
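One way to picture "dynamic SQL that updates only the columns that change" is the sketch below. It is an illustration under stated assumptions, not a production implementation: `UpdateBuilder` is a hypothetical helper, the table has a single key column, the row was loaded and then modified (so it has an original version), and the table and column names come from trusted schema rather than user input. Values are passed as parameters, never concatenated into the SQL text.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class UpdateBuilder
{
    // Returns the UPDATE text, or null when nothing changed.
    // Parameter names and values are collected into 'parameters'.
    public static string BuildUpdate(
        DataRow row, string keyColumn, IDictionary<string, object> parameters)
    {
        // Only Modified rows carry both an original and a current version.
        if (row.RowState != DataRowState.Modified) return null;

        var assignments = new List<string>();
        foreach (DataColumn column in row.Table.Columns)
        {
            if (column.ColumnName == keyColumn) continue;

            object original = row[column, DataRowVersion.Original];
            object current = row[column, DataRowVersion.Current];
            if (!object.Equals(original, current))
            {
                assignments.Add(column.ColumnName + " = @" + column.ColumnName);
                parameters["@" + column.ColumnName] = current;
            }
        }
        if (assignments.Count == 0) return null;

        parameters["@" + keyColumn] = row[keyColumn, DataRowVersion.Original];
        return "UPDATE " + row.Table.TableName
            + " SET " + string.Join(", ", assignments.ToArray())
            + " WHERE " + keyColumn + " = @" + keyColumn;
    }
}

public static class Program
{
    public static void Main()
    {
        var table = new DataTable("Customers");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Email", typeof(string));
        table.Rows.Add(1, "Ada", "ada@example.com");
        table.AcceptChanges();

        table.Rows[0]["Email"] = "ada@example.org";

        var parameters = new Dictionary<string, object>();
        string sql = UpdateBuilder.BuildUpdate(table.Rows[0], "Id", parameters);
        Console.WriteLine(sql);
        // UPDATE Customers SET Email = @Email WHERE Id = @Id
    }
}
```

Only `Email` appears in the statement because it is the only column whose current value differs from the original; `Name` is left untouched.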

Keep in mind that your database is probably going to be the bottleneck, so make sure you profile each and every SQL call your application makes.

Jonathan Allen