I'm familiar with .NET and with SQL. Now I'm looking at the new LINQ and it looks to me just like a cursor. I understand the ease of use, etc., but if I do a LINQ-to-SQL query with a foreach loop, am I just using a DB cursor? Or is there some sort of magic behind the scenes where LINQ collects all the data at once and feeds it to my program one row at a time?
LINQ has many incarnations. Since you mention LINQ-to-SQL, I'll focus on that: it is neither a cursor nor an up-front fetch of the entire result set; it is most comparable to IDataReader - i.e. it executes a constructed TSQL command, and then yields the records row by row (see below) as it receives them from the SQL Server. As with IDataReader, you only get one chance per row, unless you buffer the data using .ToArray(), .ToList(), etc.
The TSQL command executed is constructed from the LINQ query you build, and is parameterized to prevent SQL injection - so if you use "where", the TSQL executed includes "WHERE", etc.
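As a rough sketch of both points (the Customer mapping, table name and connection string below are placeholders, not anything from the question): iterating the query streams rows much like IDataReader, .ToList() buffers them, and setting DataContext.Log lets you see the parameterized TSQL that is actually executed.

    using System;
    using System.Data.Linq;
    using System.Data.Linq.Mapping;
    using System.Linq;

    // Hypothetical entity mapping - table and column names are made up for illustration.
    [Table(Name = "Customers")]
    public class Customer
    {
        [Column(IsPrimaryKey = true)] public int Id;
        [Column] public string Name;
        [Column] public string City;
    }

    class Demo
    {
        static void Main()
        {
            // Placeholder connection string.
            using (var db = new DataContext(@"Server=.;Database=MyDb;Integrated Security=true"))
            {
                db.Log = Console.Out; // dumps the generated, parameterized TSQL

                var query = db.GetTable<Customer>().Where(c => c.City == "London");

                // Streaming: rows are consumed from the TDS stream as you iterate;
                // like IDataReader, you get one forward-only pass.
                foreach (var c in query)
                {
                    Console.WriteLine(c.Name);
                }

                // Buffering: pulls the whole result set into memory so it can be
                // re-read as often as you like.
                var buffered = query.ToList();
                Console.WriteLine(buffered.Count);
            }
        }
    }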
In reality, the client is consuming records on demand via IEnumerable<T>, so there needs to be a small buffer to handle the incoming TDS stream - but it isn't the same as a database cursor.
Note that most operations (sorting, grouping, etc.) can be sent down to the server and translated into TSQL - but it is possible that some things have to be done at the client, which might again force it to buffer more data in memory. This is especially true if you force the switch yourself via .AsEnumerable(), which moves to LINQ-to-Objects (as shown below). If LINQ-to-Objects performs a buffering operation (sort, group, etc.), then it will need to load the data into memory.
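A minimal sketch of that split, reusing the hypothetical DataContext and Customer mapping from above: everything before AsEnumerable() is translated to TSQL on the server; everything after it runs as LINQ-to-Objects on the client, so a client-side group (or sort) has to buffer the rows.

    var grouped = db.GetTable<Customer>()
                    .Where(c => c.City == "London")   // server-side: becomes a WHERE clause
                    .AsEnumerable()                    // switch to LINQ-to-Objects from here on
                    .GroupBy(c => c.Name.Length);      // client-side: must buffer rows to group them

    foreach (var g in grouped)
    {
        Console.WriteLine("{0}: {1}", g.Key, g.Count());
    }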
There are a few things of note here; for example, if you want to compute several completely separate aggregates over one query, you must (normally) either execute it multiple times, or buffer it all in memory (not necessarily an option for large data) - see the sketch after this paragraph. For this type of scenario (and others), Jon Skeet and I wrote an alternative LINQ implementation that works as a push rather than a pull, allowing you to feed data through multiple parallel aggregations at once. Jon explains it better here. This type of thing is mainly of use if you have a very large, once-only data stream that you want to do complex processing on. Interesting, though.
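A sketch of that trade-off, again assuming the hypothetical Customer query from above: each aggregate is either its own round-trip translated to TSQL, or you buffer once and aggregate in memory.

    var query = db.GetTable<Customer>().Where(c => c.City == "London");

    // Option 1: two separate executions, each translated to its own TSQL aggregate.
    var min = query.Min(c => c.Id);
    var max = query.Max(c => c.Id);

    // Option 2: one execution, but the entire result set is buffered in memory.
    var list = query.ToList();
    var min2 = list.Min(c => c.Id);
    var max2 = list.Max(c => c.Id);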
No, it's not a cursor in the SQL sense. It doesn't iterate over the rows in the database one at a time. Rather, it dynamically constructs a query from the conditions you specify, delays execution until the first data is actually needed, and then runs that query against the database. Depending on how you consume the results, the data may be streamed to you so that you iterate over it row by row, or the entire rowset may be returned as a collection if you use the extension methods to obtain a list, array, etc. (see the sketch below). As far as the database is concerned, however, it is simply executing a SQL statement.
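For example (hypothetical Customer mapping again), nothing touches the database until the query is enumerated, and conditions added beforehand all end up in the single SQL statement:

    var query = db.GetTable<Customer>().Where(c => c.City == "London");
    // No SQL has been executed yet - 'query' is just an expression tree.

    query = query.Where(c => c.Name.StartsWith("A"));   // still nothing sent to the server

    foreach (var c in query)        // the SQL is built and executed here, on first enumeration
    {
        Console.WriteLine(c.Name);
    }

    var all = query.ToArray();      // or force execution and buffer the whole rowset at once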
Anytime you return rows from a database across the wire to a client, the database server uses a cursor to return the results of the query.
It's an implicit cursor controlled by the database server, but it's still a cursor.
The reason people recommend against explicit cursors, and the reason they are slower than bulk queries, is that with an implicit cursor the database server knows exactly how it's being used. It can optimize access to the disk and can be smart with the locks it uses to control concurrent access to the database. With an explicit cursor, the database server has to assume the worst case, because it doesn't know when or how the locks held by the cursor will be released. Also, the database can use indexes to improve the performance of bulk queries in ways you won't get if you use a cursor. However, even in the simple "return the whole table" case, you can get better performance using a bulk query rather than a cursor.
When you run a DLINQ (LINQ-to-SQL) query, the LINQ code gets converted into a bulk SQL query, which is submitted to the database using ADO.NET in a similar fashion to what you would write if you were generating query strings and submitting them to the database yourself.
This results in a cursor being created, but it's the same implicit cursor you would get if you were submitting bulk queries directly, and dealing with an IDataReader.
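For comparison, here is a sketch of the roughly equivalent hand-written ADO.NET (table, column and connection string are placeholders): a parameterized bulk query read forward-only through that same implicit cursor via IDataReader.

    using System;
    using System.Data.SqlClient;

    class AdoNetSketch
    {
        static void Main()
        {
            using (var conn = new SqlConnection(@"Server=.;Database=MyDb;Integrated Security=true"))
            using (var cmd = new SqlCommand(
                "SELECT Id, Name FROM Customers WHERE City = @city", conn))
            {
                cmd.Parameters.AddWithValue("@city", "London");
                conn.Open();
                using (var reader = cmd.ExecuteReader())   // implicit, forward-only server cursor
                {
                    while (reader.Read())                  // one row at a time, one pass
                    {
                        Console.WriteLine(reader.GetString(1));
                    }
                }
            }
        }
    }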