Recent conversations with colleagues have produced varying points of view on this matter. What say you, SO members?

I know, even the concept of scalability can be taken in so many different ways and contexts, but that was part of the discussion when this came up. Everyone seemed to have a different take on what scalability really means. I'm curious to see the varying takes here as well. In fact, I posted a question just for that concept.

+3  A: 

It greatly depends on which LINQ provider you are using and how you are using it. LINQ is probably not known for amazing execution speed; rather, it gives developers substantially better productivity.

According to this link, even with some of the CTPs, LINQ to SQL was already better than using direct SQL in some cases.

If you are concerned with speed and are using LINQ to Objects a lot, here is a CodePlex project (I think) for a provider that can give you 1000x performance improvements.

smaclell
How in the world could LINQ to SQL be faster than direct SQL? It can't. It's an apples-and-oranges comparison for the update statements.
Robert C. Barth
Not sure what you mean by apples and oranges for updates. That they are completely different? The average query can be partially optimized by the ORM layer, which can also do advanced caching and batching of queries for extra scalability. You would have to do all of that by hand if you hand-crafted the SQL.
smaclell
+9  A: 

I would guess that the best way to check is by writing benchmarks, but in my opinion LINQ has the possibility for optimizations that hand-writing similar code does not. I don't know how well it takes advantage of those yet.

LINQ lets you express what you want, not how to generate it. One obvious advantage is that LINQ is automatically parallelizable (see PLINQ).

Another advantage to LINQ is that it is lazy, so you can perform calculations, drawing from the collection as needed. You could hand-code an equivalent, but it may be much easier to get right in LINQ.
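
A rough sketch of both points; the Foo type and data here are invented for illustration, and AsParallel comes from PLINQ (ParallelEnumerable in .NET 4):

using System;
using System.Linq;

class Foo { public int Value; }

class Program
{
    static void Main()
    {
        var fooList = Enumerable.Range(0, 1000000)
                                .Select(i => new Foo { Value = i })
                                .ToList();

        // Lazy: this query does no work until it is enumerated,
        // and then it pulls only as many items as the caller asks for.
        var lazyQuery = fooList.Where(f => f.Value % 7 == 0);
        Console.WriteLine(lazyQuery.First().Value);

        // Parallel: the same kind of query, spread across cores by PLINQ.
        var parallelCount = fooList.AsParallel().Count(f => f.Value % 7 == 0);
        Console.WriteLine(parallelCount);
    }
}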

Lou Franco
and another disadvantage to LINQ is that it is lazy; you don't want expensive trips to get data when it could be done all at once. It's all swings and roundabouts.
gbjbaanb
+3  A: 

Your question about scalability in some ways depends on what you're using LINQ for. In business applications, you're not going to find a lot of SQL commands being executed--they're slow and have to be compiled in the DBMS. What you'll see instead are a lot of stored procedure calls. These will be slightly faster in LINQ.

Keep in mind that LINQ to SQL and the like are built on TOP of ADO.NET--they aren't a completely different methodology or anything. Sure, LINQ to XML will use different APIs under the covers. This will be much like a compiler--there are always some optimizations humans can make that might be faster, but for the most part, these APIs will be able to generate faster and less buggy code than code you write yourself.

In terms of scaling out, you can always put LINQ behind a web service if you want to distribute your data a bit, or you can use SQL Server replication. It shouldn't be any less scalable than ADO.NET would be.
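
For illustration, calling an existing stored procedure through a LINQ to SQL DataContext is a thin wrapper over ADO.NET; the procedure name, connection string, and Customer class below are hypothetical:

using System.Collections.Generic;
using System.Data.Linq;
using System.Linq;

// Hypothetical result type; its properties just need to match the proc's result columns.
public class Customer
{
    public int CustomerId { get; set; }
    public string City { get; set; }
}

public static class CustomerRepository
{
    public static List<Customer> GetCustomersByCity(string connectionString, string city)
    {
        using (var db = new DataContext(connectionString))
        {
            // ExecuteQuery is plain DataContext API; the stored procedure is made up.
            return db.ExecuteQuery<Customer>(
                "EXEC dbo.GetCustomersByCity {0}", city).ToList();
        }
    }
}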

Ed Altorfer
+7  A: 

This question's a little like asking "How scalable are collections?"

Let's just talk about LINQ to Objects. Generally speaking, to the extent that most implementations of IEnumerable<T> iterate over every item in the underlying collection, LINQ has great potential to scale poorly. Create a List<Foo> that contains ten million items and run something like this:

var list = from Foo f in fooList
           where f.Value == "Bar"
           select f;

is going to be slow. But that's really not LINQ's fault. You're the one that gave it a list of ten million items.

You deal with this the same way you'd deal with it if LINQ didn't exist: by building Dictionaries and SortedLists and the like that help you pare down the search space.
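
Roughly speaking, you build the "index" once and then each query is a hash lookup instead of a full scan:

// Build the lookup once; the O(n) work happens a single time.
ILookup<string, Foo> byValue = fooList.ToLookup(f => f.Value);

// Each query is now a constant-time lookup, not a ten-million-item scan.
IEnumerable<Foo> bars = byValue["Bar"];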

LINQ can improve scalability (well, make scalability easier to get to) via deferred query execution. You can replace a naive method which creates a list, filters it to a new list, filters that to a new list, etc. with a series of LINQ queries:

var list1 = from Foo f in fooList where f.Value1 == "Bar" select f;
var list2 = from Foo f in list1 where f.Value2 == "Baz" select f;
var list3 = from Foo f in list2 where f.Value3 == "Bat" select f;

all of which are executed over a single pass through the underlying collection when (and if) it becomes necessary to iterate over the final list. Again, though, this is nothing new: if you didn't have LINQ, you would probably end up replacing your naive method with one that did the same thing. But LINQ makes it a lot easier.

Robert Rossney
Are you saying you would improve performance on a 10 million item table by building Dictionaries and SortedLists of the data? Shouldn't you optimize the database (indexes, etc.) instead of the code in this instance?
Nathan Koop
Of course he's not saying that, Nathan -- he was making a point about LINQ vis-à-vis other ways of querying collections.
Danimal
I was talking about in-memory collections of objects, not rows in a database. Building dictionaries/sorted lists *is* adding indexes.
Robert Rossney
+1  A: 

Scalability and performance are two different but related things. To measure performance, you see how many users (for example) you can support with one box. To measure scalability, you add another box and see whether you can now support double the original amount. Not likely: the second box might add only 75% of the original capacity, the next one only 50%, and it keeps dropping toward zero pretty fast. No matter how many boxes you add at that rate, you'll be lucky to double your supported user count. That's scalability.

How your LINQ code scales probably depends more on the database, how powerful the machine is, how the database is designed, and how your application is designed.

You often see micro-benchmarks that are supposed to reveal something conclusive, but they never do, because they are just a keyhole view of the whole problem.

You can pull out the good old 20/80 split here: it's probably 20% about the tool and 80% about everything else that makes up your application.

tvaananen
+8  A: 

In tests we did, LINQ to Objects (ForEach) was about 2x slower than a foreach loop.

LINQ to SQL (against an MS SQL database) was almost 10x slower than a direct query using a data reader, with most of the time spent creating SQL from the expression tree (so you'll be CPU-bound while the database sits idle). To avoid this, you must use compiled queries.

See this for more. Most info in the post is still valid with .NET 3.5 SP1.
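
For reference, a compiled query looks something like this; MyDataContext and its Customers table stand in for the usual designer-generated classes:

using System;
using System.Data.Linq;
using System.Linq;

static class Queries
{
    // The expression tree is translated to SQL once, on first use,
    // instead of on every call.
    public static readonly Func<MyDataContext, string, IQueryable<Customer>>
        CustomersByCity = CompiledQuery.Compile(
            (MyDataContext db, string city) =>
                db.Customers.Where(c => c.City == city));
}

// Usage (inside a using block for the context):
//   var londoners = Queries.CustomersByCity(db, "London").ToList();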

bh213
I'd be interested to see your tests rerun with precompiled LINQ queries, as they would be in any production environment.
JoshJordan
+5  A: 

In my opinion LINQ is meant to simplify things from a development standpoint, not to address scalability.

In fact, LINQ makes things easy precisely by hiding a lot of complications under the covers, and when used irresponsibly that can lead to scalability issues.

Examples abound in other answers, but to mention the most significant:

  • If you are querying an object collection, you cannot disregard its size. Maybe doing it in the model, with LINQ, sounded good when there were only a few objects to query... but as the size grows, it becomes evident that the query should happen in the database, not in the model (see the sketch after this list).

  • If you are autogenerating SQL with LINQ, as far as I know, you cannot give your database hints on how to compile queries, for example WITH (NOLOCK). As your table sizes grow, being able to address these issues is imperative.

  • Similar to the above, but maybe more general: when you are addressing scalability issues over a DB, you have to control what the DB is doing. Having a language that compiles to SQL, which is then compiled again to an execution plan, removes control from your hands.

  • What happens if you have to change your database schema in order to make it more scalable and your code is strongly tied to it because you have no stored procedures?

  • Although it seems simple, you cannot change LINQ providers without a lot of pain: querying SQL Server is not the same as querying objects or querying XML, even though the LINQ syntax looks very similar. I do expect some of my junior developers to go on a "LINQ spree" because it's easier than learning how to do things with scalability in mind.
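
As a sketch of the first point, the difference between filtering in the database and filtering in the model can be a single AsEnumerable() call; the context, table, and column names here are invented:

// Filter runs in the database: the Where clause is translated to SQL,
// so only the matching rows ever cross the wire.
var recent = db.Orders.Where(o => o.OrderDate > cutoff).ToList();

// Filter runs in the model: AsEnumerable() switches to LINQ to Objects,
// so every row in Orders is pulled into memory and filtered there.
var recentInMemory = db.Orders.AsEnumerable()
                              .Where(o => o.OrderDate > cutoff)
                              .ToList();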

In conclusion, I think that it is possible to write scalable code with LINQ, but only by using it with good care. There are no killer tools, only killer code.

Sklivvz
No NOLOCK? Yikes! I suppose then also no SET NOCOUNT?
DOK
One should look for answers first before saying it's not possible: http://www.hanselman.com/blog/GettingLINQToSQLAndLINQToEntitiesToUseNOLOCK.aspx
sirrocco
@sirrocco - The only one of the three approaches Scott proposes that actually gives you "NOLOCK" is "write a stored procedure and put that in it". Wow. My point is valid.
Sklivvz
+1  A: 

If you are looking for a real-life example, Stack Overflow uses LINQ heavily; check this post/podcast.

MMind
+1  A: 

There is a price for caching and loading objects on demand using the LINQ to SQL framework. If an object can lazy-load parts of itself on demand, it is very likely that each object holds a reference to its data context. Incidentally, that data context also caches every object ever requested from it. That means that if you keep one of your objects around (either in a cache or just because you use it later), you are not only holding onto that object but onto every object ever requested through the data context. None of them will ever get garbage collected because they are still being referenced.

This is not a problem if all of your objects have a short lifespan and the application creates new DataContexts every time it does new work. But I can see how it could create scalability issues if someone was not aware of the additional encumbrance riding along with each object.
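
A sketch of the safer pattern described above; all names here are hypothetical:

OrderSummary summary;
using (var db = new ShopDataContext(connectionString))
{
    var order = db.Orders.Single(o => o.OrderId == orderId);

    // Copy what you need into a plain object instead of holding onto the entity,
    // which would keep the context and its entire identity-map cache alive.
    summary = new OrderSummary { OrderId = order.OrderId, Total = order.Total };
}
// The context and everything it cached are now eligible for collection.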

A: 

LINQ is scalable in many ways.

One aspect is the specification implementation behind LINQ, which allows an Expression to be interpreted so it can run out of process, be translated to a different language (Linq2Sql, Linq2Hibernate), or even run in a distributed computing environment such as a map-reduce cluster (DryadLINQ).

Another aspect is the semantics that LINQ brings to the language. You can iterate through billions of objects without loading the whole collection into memory if your provider supports deferred loading, and you can parallelize or optimize the query (PLINQ or i4o).
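
For example, a LINQ to Objects query over a streaming source never has to hold the whole sequence in memory; a sketch:

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // A source that yields items one at a time instead of building a collection.
    static IEnumerable<long> Numbers()
    {
        for (long i = 0; i < 10000000000; i++)
            yield return i;
    }

    static void Main()
    {
        // Only enough of the stream to satisfy Take(5) is ever produced.
        var firstMatches = Numbers().Where(n => n % 1234567 == 0).Take(5);

        foreach (var n in firstMatches)
            Console.WriteLine(n);
    }
}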

George Polevoy