tags:

views:

3412

answers:

14

I can understand wanting to avoid having to use a cursor due to the overhead and inconvenience, but it looks like there's some serious cursor-phobia-mania going on where people are going to great lengths to avoid having to use one

for example, one question asked how to do something obviously trivial with a cursor and the accepted answer proposed using a common table expression (CTE) recursive query with a recursive custom function, even though this limits the number of rows that could be processed to 32 (due to recursive call limit in sql server). This strikes me as a terrible solution for system longevity, not to mention a tremendous effort just to avoid using a simple cursor.

what is the reason for this level of insane hatred? has some 'noted authority' issued a fatwa against cursors? does some unspeakable evil lurk in the heart of cursors that corrupts the morals of the children or something?

wiki question, more interested in the answer than the rep

thanks in advance!

Related Info:

http://stackoverflow.com/questions/37029/sql-server-fast-forward-cursors

EDIT: let me be more precise: I understand that cursors should not be used instead of normal relational operations, that is a no-brainer. What I don't understand is people going waaaaay out of their way to avoid cursors like they have cooties or something, even when a cursor is a simpler and/or more efficient solution. It's the irrational hatred that baffles me, not the obvious technical efficiencies.

+5  A: 

In general, because on a relational database, the performance of code using cursors is an order of magnitude worse than set-based operations.

Charles Bretana
do you have a benchmark or reference for this? i have not noticed any such drastic performance degradation... but maybe my tables don't have enough rows for it to matter (a million or less, usually)?
Steven A. Lowe
oh wait i see what you mean - but i would never advocate using cursors intead of set operations, only not going to extremes to avoid cursors
Steven A. Lowe
I remember the first time I did SQL, We had to import a 50k daily data file from a mainframe into a SQL Server database... I used a cursor, and discovered, that the import was taking about 26 hours using the cursor... When I changed to set-based operations, the process took 20 minutes.
Charles Bretana
+21  A: 

Cursors make people overly apply a procedural mindset to a set-based environment.

And they are SLOW!!!

From SQLTeam:

Please note that cursors are the SLOWEST way to access data inside SQL Server. The should only be used when you truly need to access one row at a time. The only reason I can think of for that is to call a stored procedure on each row. In the Cursor Performance article I discovered that cursors are over thirty times slower than set based alternatives.

EDIT: @Steven, see more up-to-date article from SQL Server Performance: Performance Tuning SQL Server Cursors for more info.

Galwegian
that article is 7 years old, do you think that perhaps things might have changed in the meantime?
Steven A. Lowe
I also think cursors are really slow and to be avoided, generally. However, if the OP was referring to the question I think he was, then a cursor was the correct solution there (streaming records one at a time due to memory constraints).
rmeador
the updated article does not correct the relative speed measurements, but it does provide some good optimizations and alternatives. Note that the original article says that cursors are 50 times faster than while loops, which is interesting
Steven A. Lowe
I personally think that if you need a cursor you haven't designed your database properly in the first place.
BoltBait
@BoltBait: I personally think that if you make blanket assertions like that you can't really be 45 years old :-P
Steven A. Lowe
Cursors are not that slow in Oracle, where you can use clauses like BULK-COLLECT for performance-wise code. Would be nice to do a benchmark though.
Camilo Díaz
@Steven: Yeah, I'm old... and very opinionated!
BoltBait
@BoltBait: You kids get off my lawn!
Steven A. Lowe
+5  A: 

Cursors tend to be used by beginning SQL developers in places where set-based operations would be better. Particularly when people learn SQL after learning a traditional programming language, the "iterate over these records" mentality tends to lead people to use cursors inappropriately.

Most serious SQL books include a chapter enjoining the use of cursors; well-written ones make it clear that cursors have their place but shouldn't be used for set-based operations.

There are obviously situations where cursors are the correct choice, or at least A correct choice.

davidcl
+5  A: 

The optimizer often cannot use the relational algebra to transform the problem when a cursor method is used. Often a cursor is a great way to solve a problem, but SQL is a declarative language, and there is a lot of information in the database, from constraints, to statistics and indexes which mean that the optimizer has a lot of options to solve the problem, whereas a cursor pretty much explicitly directs the solution.

Cade Roux
+1  A: 

Can you post that cursor example or link to the question? There's probably an even better way than a recursive CTE.

In addition to other comments, cursors when used improperly (which is often) cause unnecessary page/row locks.

Gordon Bell
there is a better way - a freakin' cursor ;-)
Steven A. Lowe
+5  A: 

I don't have enough reputation to comment yet, but there's an answer above which says "cursors are the SLOWEST way to access data inside SQL Server... cursors are over thirty times slower than set based alternatives."

This statement may be true under certain circumstances, but as a blanket statement it's false. For example, we've made good use of cursors in situations where we want to perform a complex update operation on a large table which is receiving constant production reads. Running a stored procedure which does these updates one row at a time ends up being faster than set-based operations, because the set-based operation conflicts with the read operation and ends up causing horrific locking problems.

In the absence of other database activity, set-based operations are universally faster. In production systems, it depends.

davidcl
Sounds like the exception that proves the rule.
Joel Coehoorn
@[Joel Coehoorn]: I've never understood that saying.
Steven A. Lowe
@[Steven A. Lowe] http://www.phrases.org.uk/meanings/exception-that-proves-the-rule.html understand exception as "what is left out" and note that the rule here is something like "in most situation cursors are bad".
David Lay
@delm: thanks for the link, now i understand the phrase even less!
Steven A. Lowe
@[Steven A. Lowe] Basically it's saying that if you "break a rule" with a subcase, there must be a general rule to break, ergo a rule exists. e.g. From Link: ("If we have a statement like 'entry is free of charge on Sundays', we can reasonably assume that, as a general rule, entry is charged for.")
Fry
@Fry: ok that makes sense - so how does it apply here?
Steven A. Lowe
I think hes saying that when there are locking problems / memory constraints use cursors. This implies that you shouldn't use cursors otherwise.
SapphireSun
+13  A: 

The "overhead" with cursors is merely part of the API. Cursors are how parts of the RDBMS work under the hood. Often CREATE TABLE and INSERT have SELECT statements, and the implementation is the obvious internal cursor implementation.

Using higher-level "set-based operators" bundles the cursor results into a single result set, meaning less API back-and-forth.

Cursors predate modern languages that provide first-class collections. Old C, COBOL, Fortran, etc., had to process rows one at a time because there was no notion of "collection" that could be used widely. Java, C#, Python, etc., have first-class list structures to contain result sets.

The Slow Issue

In some circles, the relational joins are a mystery, and folks will write nested cursors rather than a simple join. I've seen truly epic nested loop operations written out as lots and lots of cursors. Defeating an RDBMS optimization. And running really slowly.

Simple SQL rewrites to replace nested cursor loops with joins and a single, flat cursor loop can make programs run in 100th the time. [They thought I was the god of optimization. All I did was replace nested loops with joins. Still used cursors.]

This confusion often leads to an indictment of cursors. However, it isn't the cursor, it's the misuse of the cursor that's the problem.

The Size Issue

For really epic result sets (i.e., dumping a table to a file), cursors are essential. The set-based operations can't materialize really large result sets as a single collection in memory.

Alternatives

I try to use an ORM layer as much as possible. But that has two purposes. First, the cursors are managed by the ORM component. Second, the SQL is separated from the application into a configuration file. It's not that the cursors are bad. It's that coding all those opens, closes and fetches is not value-add programming.

S.Lott
"Cursors are how the RDBMS works under the hood." If you mean specifically SQL Server, OK, fine, I'm ignorant of that. But I have worked on the internals of multiple RDBMS (and ORDBMS) (under Stonebraker) and none of them did that. Eg: Ingres uses what amounts to "result sets" of tuples internally.
Richard T
@Richard T: I am working off second-hand information about RDBMS source; I'll amend the statement.
S.Lott
"I've seen truly epic nested loop operations written out as lots and lots of cursors." I keep seeing them too. It is hard to believe.
RussellH
A: 

basicaly 2 blocks of code that do the same thing. maybe it's a bit weird example but it proves the point. SQL Server 2005:

SELECT * INTO #temp FROM master..spt_values
DECLARE @startTime DATETIME

BEGIN TRAN 

SELECT @startTime = GETDATE()
UPDATE #temp
SET number = 0
select DATEDIFF(ms, @startTime, GETDATE())

ROLLBACK 

BEGIN TRAN 
DECLARE @name VARCHAR

DECLARE tempCursor CURSOR
    FOR SELECT name FROM #temp

OPEN tempCursor

FETCH NEXT FROM tempCursor 
INTO @name

SELECT @startTime = GETDATE()
WHILE @@FETCH_STATUS = 0
BEGIN

    UPDATE #temp SET number = 0 WHERE NAME = @name
    FETCH NEXT FROM tempCursor 
    INTO @name

END 
select DATEDIFF(ms, @startTime, GETDATE())
CLOSE tempCursor
DEALLOCATE tempCursor

ROLLBACK 
DROP TABLE #temp

the single update takes 156 ms while the cursor takes 2016 ms.

Mladen Prajdic
well yes, it proves the point that this is a really dumb way to use a cursor! but what if the update of each row depended on the value of the prior row in date order?
Steven A. Lowe
BEGIN TRANSELECT TOP 1 baseval FROM tableORDER BY timestamp DESCINSERT table (fields)VALUES (vals, including derived value from prior record)COMMIT TRAN
le dorfier
@doofledorfer: that would insert one row based on the last row by date, not update every row by a value from its prior row in date order
Steven A. Lowe
To truly use the cursor you should use WHERE CURRENT OF in the update
erikkallen
+2  A: 

You could have probably concluded your question after the second paragraph, rather than calling people "insane" simply because they have a different viewpoint than you do and otherwise trying to mock professionals who may have a very good reason for feeling the way that they do.

As to your question, while there are certainly situations where a cursor may be called for, in my experience developers decide that a cursor "must" be used FAR more often than is actually the case. The chance of someone erring on the side of too much use of cursors vs. not using them when they should is MUCH higher in my opinion.

Tom H.
please read more carefully, Tom - the exact phrase was "insane hatred"; "hated" was the object of the adjective "insane", not "people". English can be a bit difficult sometimes ;-)
Steven A. Lowe
+2  A: 

For what it's worth I have read that the "one" place a cursor will out perform its set-based counterpart is in a running total. Over a small table the speed of summing up the rows over the order by columns favors the set-based operation but as the table increases in row size the cursor will become faster because it can simply carry the running total value to the next pass of the loop. Now where you should do a running total is a different argument...

esabine
If you mean by "running total" an aggregation of some kind (min, max, sum), any competent DBMS will beat the pants off of a client-side, cursor based solution, if only because the function is performed in the engine and there's no client <--> server overhead. Maybe SQL Server isn't competent?
Richard T
@[Richard T]: we're discussing server-side cursors, as within a stored procedure, not client-side cursors; sorry for the confusion!
Steven A. Lowe
+4  A: 

The answers above have not emphasized enough the importance of locking. I'm not a big fan of cursors because they often result in table level locks.

Richard T
yes, thank you! Without options to prevent it (read only, forward only, etc) they certainly will, as will any (sql server) operation that proceeds to occupy several rows and then several pages of rows.
Steven A. Lowe
+3  A: 

In Oracle PL/SQL cursors will not result in table locks and it is possible to use bulk-collecting/bulk-fetching.

In Oracle 10 the often used implicit cursor

  for x in (select ....) loop
    --do something 
  end loop;

fetches implicitly 100 rows at a time. Explicit bulk-collecting/bulk-fetching is also possible.

However PL/SQL cursors are something of a last resort, use them when you are unable to solve a problem with set-based SQL.

Another reason is parallelization, it is easier for the database to parallelize big set-based statements than row-by-row imperative code. It is the same reason why functional programming becomes more and more popular (Haskell, F#, Lisp, C# LINQ, MapReduce ...), functional programming makes parallelization easier. The number CPUs per computer is rising so parallelization becomes more and more an issue.

tuinstoel
+1  A: 

I agree with article on this page:

http://weblogs.sqlteam.com/jeffs/archive/2008/06/05/sql-server-cursor-removal.aspx

Edin
good article; the author should also have mentioned that while-loops are far less efficient than cursors
Steven A. Lowe
+1  A: 

Outside of the performance (non)issues, I think the biggest failing of cursors is they are painful to debug. Especially compared to code in most client applications where debugging tends to be comparatively easy and language features tend to be much easier. In fact, I contend that nearly anything one is doing in SQL with a cursor should probably be happening in the client app in the first place.

Wyatt Barnett
SQL is painful to debug, even without cursors. MS SQL step-through tools in Visual Studio don't seem to like me (they hang a lot, or don't trip breakpoints at all), so I'm usually reduced to PRINT statements ;-)
Steven A. Lowe