Given a DataTable containing two columns like this:

Private Function CreateDataTable() As DataTable
    Dim customerTable As New DataTable("Customers")
    customerTable.Columns.Add(New DataColumn("Id", GetType(System.Int32)))
    customerTable.Columns.Add(New DataColumn("Name", GetType(System.String)))

    Dim row1 = customerTable.NewRow()
    row1.Item("Id") = 1
    row1.Item("Name") = "Customer 1"
    customerTable.Rows.Add(row1)

    Dim row2 = customerTable.NewRow()
    row2.Item("Id") = 2
    row2.Item("Name") = "Customer 2"
    customerTable.Rows.Add(row2)

    Dim row3 = customerTable.NewRow()
    row3.Item("Id") = 3
    row3.Item("Name") = "Customer 3"
    customerTable.Rows.Add(row3)

    Return customerTable
End Function

Would you use this snippet to retrieve a List(Of Integer) containing all Ids:

Dim table = CreateDataTable()

Dim list1 As New List(Of Integer)

For i As Integer = 0 To table.Rows.Count - 1
    list1.Add(CType(table.Rows(i)("Id"), Integer))
Next

Or rather this one:

Dim list2 = (From r In table.AsEnumerable _
             Select r.Field(Of Integer)("Id")).ToList()

This is not a question about how to cast the Id column to Integer (with .Field(Of Integer), CType, CInt, DirectCast or whatever). It is about whether you generally choose LINQ over for loops, as the subject implies.


For those who are interested: I ran some iterations with both versions, which resulted in the following performance graph:

[graph: execution time in milliseconds versus number of rows, for loop and LINQ]

The vertical axis shows the milliseconds it took the code to convert the rows' Ids into a generic list, with the number of rows shown on the horizontal axis. The blue line is the imperative approach (for loop), the red line the declarative code (LINQ).


Whatever way you generally choose: Why do you go that way and not the other?

+5  A: 

Whenever possible I favor the declarative way of programming over the imperative one. When you use a declarative approach, the CLR can optimize the code based on the characteristics of the machine. For example, if the machine has multiple cores it could parallelize the execution, whereas an imperative for loop basically locks out this possibility. Today there may be no big difference, but I think that in the future more and more extensions like PLINQ will appear, allowing better optimization.
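
As a rough illustration of the PLINQ point: in .NET 4, parallel execution is something you opt in to with AsParallel(), not something the runtime applies on its own to an ordinary LINQ query. The sketch below assumes .NET 4 or later. The module and function names are made up for this example, and the input is a plain integer range rather than the DataTable from the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module PlinqSketch
    ' Hypothetical helper: doubles every id and allows PLINQ to use several cores.
    Function DoubleInParallel(ids As IEnumerable(Of Integer)) As List(Of Integer)
        ' AsParallel() opts in to PLINQ; AsOrdered() keeps the results in source order.
        Return ids.AsParallel().AsOrdered().Select(Function(id) id * 2).ToList()
    End Function

    Sub Main()
        Dim doubled = DoubleInParallel(Enumerable.Range(1, 1000))
        Console.WriteLine(doubled.Count)
    End Sub
End Module
```

For work this cheap the partitioning overhead will likely outweigh any gain, so this only shows the shape of the call, not a speed-up.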

Darin Dimitrov
Is that really so? I remember a blog article I read some time ago. You can find it here: http://ox.no/posts/linq-vs-loop-a-performance-test. The basic conclusion goes like this: "Naturally, this will not perform as good as a traditional imperative loop, and less optimization is possible". That conclusion comes from a runtime comparison in which a LINQ query performed 50 times worse than the imperative approach.
Mephisztoe
I don't find the performance argument a very good one, and I don't think you should ever expect the performance of a LINQ query to improve drastically without any code changes, because the system has to stay backwards compatible. However, I favor the declarative way of programming because of readability and maintainability. LINQ queries are much better at expressing intent.
Steven
I second this. I am also of the opinion that the declarative LINQ query is much more readable and thus more maintainable. However, Darin brought in the argument of code optimization, and I was wondering whether the compiler is really able to optimize code better based on a declarative syntax than on imperative code.
Mephisztoe
@Darin: Have you ever opened up a Linq query in Reflector? LINQ is not actually emitted in the resulting binary for Linq to Objects.
Billy ONeal
@Billy, yes I have opened a LINQ query in Reflector. It is based on extension methods being called so I don't understand your argument of LINQ not being emitted in the resulting binary.
Darin Dimitrov
As far as performance is concerned, I think that currently a LINQ query probably won't perform better than a `for` loop, but a moment will come when the extension methods improve and are able to optimize this query. In my opinion, as a programmer it is more important to focus on the WHAT (declarative) rather than the HOW (imperative). Leave the HOW to the gurus who know the internals of the CLR, microprocessors, multithreading, memory and assembly code. Don't try to be smart, as you cannot be smart for all the possible machines that your program might run on.
Darin Dimitrov
@Darin: Extension methods are just methods. There's nothing special about them as far as the CLR is concerned. They are syntactic sugar. Remember: .NET 3.5 still runs on 2.x CLR, which existed long before LINQ.
Billy ONeal
@Billy, I happen to know what extension methods are :-) What is special about them is that the ones that come in the BCL are written by experts, that was my point and this is what is special about them in contrast to everyday loops we are writing in our applications.
Darin Dimitrov
A: 

I recently found myself wondering whether I've been totally spoiled by LINQ. Yes, I now use it all the time to pick all sorts of things out of all sorts of collections.

GSerg
+1  A: 

I avoid linq unless it helps readability a lot, because it completely destroys edit-and-continue.

When they fix that, I will probably start using it more, because I do like the syntax a lot for some things.

Ch00k
You make a fair statement about using the debugger with LINQ. However, I like to turn it around: I use LINQ unless I have to debug that code. In that situation I (temporarily) rewrite that statement. Normally however, unit tests save my day.
Steven
A: 

I started to, but found that in some cases I saved time by using this approach:

for (var i = 0, len = list.Count; i < len; i++) { .. }

Not necessarily in all cases, but some. Most extension methods use the foreach approach of querying.

Brian
+1  A: 

For almost everything I've done I've come to the conclusion that LINQ is optimized enough. If I handcrafted a for loop it would have better performance, but in the grand scheme of things we are usually talking milliseconds. Since I rarely have a situation where those milliseconds will make any kind of impact, I find it's much more important to have readable code with clear intentions. I would much rather have a call that is 50ms slower than have someone come along and break it altogether!

Mike M.
+1  A: 

ReSharper has a cool feature that will flag loops and convert them into LINQ expressions. I will flip a loop to the LINQ version and see whether that hurts or helps readability. If the LINQ expression communicates the intent of the code more clearly, I will go with that. If the LINQ expression is unreadable, I will flip back to the foreach version.

Most of the performance issues don't really compare with readability for me.

Clarity trumps cleverness.

In the above example, I would go with the LINQ version, since it clearly explains the intent and also keeps people from accidentally adding side effects in the loop.

Kenoyer130
That's exactly what I do as well. However, since we already have a "million" places that use the loop and that's what everyone is used to, I leave it the way it is.
Greg
A: 

I try to follow these rules:

  • Whenever I'm just querying (filtering, projecting, ...) collections, use LINQ.
  • As soon as I'm actually 'doing' something with the result (i.e., introducing side effects), I'll use a for loop.

So in this example, I'll use LINQ.

Also, I always try to split up the 'query definition' from the 'query evaluation':

Dim query = From r In table.AsEnumerable() 
            Select r.Field(Of Integer)("Id")

Dim result = query.ToList()

This makes it clear when that (in this case in-memory) query will be evaluated.
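
To make the timing visible, here is a minimal sketch with a plain list instead of the DataTable. The module and function names are invented for the example. It relies on LINQ to Objects deferring execution until the query is enumerated, so an element added between definition and evaluation still shows up in the result.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module DeferredSketch
    ' Hypothetical helper: returns how many ids the query yields after a late addition.
    Function CountAfterLateAdd() As Integer
        Dim ids As New List(Of Integer)(New Integer() {1, 2, 3})

        ' Query definition: nothing is enumerated yet.
        Dim query = From id In ids _
                    Where id > 1 _
                    Select id

        ' Added after the query was defined, but before it is evaluated.
        ids.Add(4)

        ' Query evaluation: ToList() enumerates ids now, so 4 is included.
        Return query.ToList().Count
    End Function

    Sub Main()
        Console.WriteLine(CountAfterLateAdd()) ' prints 3 (the ids 2, 3 and 4)
    End Sub
End Module
```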

jeroenh