views:

68

answers:

2

In SQL, one should always strive for set-based operations versus iteration-based (i.e. looping). In .NET, we frequently loop collections and objects. Are there any commands in .NET that allow set-based processing or is everything iteration-based? (I'm reminded of how DataAdapter.Fill calls DataReader which iterates through each record in the result set). I'm not terribly familiar with LINQ, but my guess would be that its implementation merely masks the iterations happening behind the scenes.


UPDATE:

To clarify: I'm not claiming to be any sort of genius here and I'm not second guessing any of the brilliant people who make my life programming better. I am simply asking if there are commands that perform set-based operations, like SQL does when you UPDATE, versus foreach(var item in obj) { ... } which is clearly iterating through the object. SQL developers are chastised at every turn if ever they use a loop, yet in .NET, we use them all the time. Being a develper who works heavily in both SQL and .NET, I'm asking if there are any alternatives in .NET that avoid looping altogether.

A: 

LINQ-to-SQL queries are genuinely set-based operations. The only time iteration occurs is when you invoke GetEnumerator, either directly or via a foreach statement.

This works because LINQ-to-SQL starts with IQueryable<T> (where T is some class that acts as a surrogate for a table), and each LINQ operation you apply to an IQueryable<> returns another IQueryable<>. Behind the scenes, LINQ isn't accessing the database at all, but is instead building up a more and more complex LINQ expression. When you finally ask to enumerate the result, LINQ takes the entire expression and converts it into an SQL query that goes to database as a single request.

Marcelo Cantos
@Marcelo, so if you execute a select statement via LINQ-to-SQL it will build the resulting collection object without iterating? What about using LINQ to filter a `List<T>`?
Brad
@Brad: LINQ-to-SQL will issue a single query to the server, which will return a result set that the .Net code iterates over, so yes, there is iteration, but only once on the final result. Of course, you have to iterate over something eventually, since the consuming environment (.Net) is procedural in nature.
Marcelo Cantos
@Brad: When operating on a List<T>, you are using LINQ-to-objects, which uses iteration through and through.
Marcelo Cantos
@Marcelo, This is misleading. The thrust of your point is that the iterative processing is not happening in client-side code but on the server. There is no such thing as a "Genuine set-based operation" everything must at some point, be processed iteratively. Your point, that LINQ2SQL sends a constructed SQL stateement to the DB, however, is important, as the DB is much better at performing these set-based processing tasks (in it's variety of iterative approaches) than can be done quickly simply in code. (I did not know this, thx!)
Charles Bretana
@Charles: You are confusing semantics with implementation. SQL statements are genuinely set-based operations, in that they express the logic at the level of sets, with no consideration for what is happening at the algorithmic level. The engine could be a quantum computer that evaluates every tuple in parallel, or an analog computer that computes a sum by firing electron streams at a thermistor, which produces an answer in a single measurement. The point is that you don't care how the back-end does its thing, as long as your code isn't required to process tuples one-at-a-time.
Marcelo Cantos
@Marcelo, Parallel prcoessing, (which is real) is not the same thing as "Set-based" processing, or Set-Based Operations. The former can be accomplished by performing multiple operations concurrently, in a variety of ways, which you enumerated.
Charles Bretana
"Set-Based" operations, or processing, otoh, is just an abstraction representing a type of representation of an operation and has nothing to do with how it is implemented. It's equivilent to asking if the computer will "process" a linear algebra problem using "matrix" operations instead of iterative operations if you "represent" it as a matrix instead of as a system of linear equations. They are simply two different ways of represnting the exact same process in symbols.
Charles Bretana
Your statement that "SQL statements are genuinely set-based operations, in that they express the logic at the level of sets" betrays our miscommunication problem. To me, the word "genuinely" implies that you are describing what is physically happeneing, NOT simply how it is being "expressed". My argument is that SQL has nothing to do with what is physically happening, and is ONLY a different representation, or way to Express it. Your use of the word "genuine" made me assume (hopefully incorrectly) that you think this also reflects what is actually heppening.
Charles Bretana
@Charles: Suffice to say, I meant no such thing.
Marcelo Cantos
@Marcelo, In that case, my apologies! As is often the case, clarity should come first. Our business is such that precise use of language can avoid many misunderstandings and miscommunications. Might I respectfully suggest, however, that you reread your post "The only time iteration occurs is when you ..." and think about how your phrasing might have contributed to this misunderstanding...
Charles Bretana
+5  A: 

I'm not terribly familiar with LINQ, but my guess would be that its implementation merely masks the iterations happening behind the scenes.

How do you think SQL does it? It's not that iteration doesn't happen. It's a matter of how you express your intentions in code. Set-based and declarative operations tell the platform what you want, and then leave it up to the platform for figure out how best to do it. It works because the platforms that allow this kind of code are expert systems in their area, and so are much better at it than a human could hope to be. On the other hand, imperative or procedural code tells the platform exactly what to do and how to do it. This leaves less room for machine optimizations, usually requires more code, and is more prone to bugs.

Joel Coehoorn
@Brad, Joel is 100% correct. SQL Just expresses the intent is a set-based language structure. No computer (or person) can process things in a "Set-based" way. A "Set" is just a mathematical abstraction that accurately represents a collection of data. Mathematical "Set-based" operators like Joins, and cartesion products and Unions etc. are abstractions that represent a complex set of of individual operations in a consistent higher-level manner so that we can think about them without dealing with the complex details that they represent. They do not eliminate those details.
Charles Bretana
@Charles: While it is true that in conventional DBMSs, iteration over records and indexes is the norm, it is simply not true that there is no other way to process information. For an intriguing example of how data sets can be processed _en-masse_, with no per-element operations at all, check out the [one-pixel camera](http://dsp.rice.edu/cscamera).
Marcelo Cantos