views:

130

answers:

2

Hi all,

being new to c# I've run into this 'conundrum' when passing around a SqlDataReader between different threads. Without going into too much detail, the idea is to have a main thread fetching data from the database (a large recordset) and then have a helper-task run through this record by record and doing some stuff based upon the contents of this. There is no feedback to the recordset, it simply wades through until no records are left. This works fine, but given the nature of the job at hand it should be possible to have this job spread over different threads (CPUs) to maximize throughput (the order of execution is of no significance). The question then becomes, when I pass this recordset in a SqlDataReader, do I have to use ref or not ? It kind of boils down to the question : if I pass the object around without specifying ref, won't it create new copies in memory and have records processed n times ? Or, don't I risk having the record-position being moved forward while not all fields have been fully read yet ? The latter seems more like a 'data racing' issue and probably is covered by the lock()ing mechanism (or not?).

My initial take on the problem was that it doesn't really hurt passing the variable using ref, yet as a colleague put it : "you only need ref when you're doing something wrong" =) Additionally using ref restricts me from applying a Using() construction too which isn't very nice either. I thus create a "basic" project that tackles the same approach but without the ref notation. Tests so far show that it works flawlessly on a Core2Duo (2cpu) using any number of threads, yet I'm still a bit wary... What do you experts think about this ? Use ref or not ?

You can find the test-project here as it seems I can't upload it to this question directly ?!?

ps: it's just a test-project and I'm new to c#, so please be gentle on me when breaking down the code =P

+5  A: 

Don't pass SqlDataReader around to multiple threads. The documentation says this class is not thread safe.

Even if it were, you would have all sorts of timing issues arising from this, especially when it comes time to dispose of your SqlDataReader. Add to that the fact that SqlDataReader already streams data in on another thread, and I think you're prematurely optimizing.

Dave Markle
+1 definitely premature optimisation.
Mitch Wheat
Looking at the OP's code this isn't even an optimisation: They're using a lock to synchronise access to the reader which just means that the threads are queuing up waiting for access. Add to that the overhead of managing multiple threads and this will almost certainly cause *decreased* performance (not to mention fragile, less readable, less maintainable code).
LukeH
deroby
@deroby: If you're synchronising access to the reader by using a lock then *you're getting no benefit* from accessing that reader on multiple threads. If you're not synchronising access to the reader then you're running the risk of race conditions, or worse. Why not access the reader on your main thread and just farm the data-shifting etc out to different worker threads?
LukeH
Mainly because it /might/ be required to read more than 1 record at a time. The original multi-threaded approach had something along the lines of a main thread running through the records, copying the data in DataTables and starting x new threads with given data-tables.Then it waits until a thread has finished and starts it anew with a fresh DataTable. This worked decently too, yet it requires to actively keep checking for 'finished' threads which didn't feel optimal as said loop consumed quite a bit of CPU too, and adding Sleep(100) mitigated that but also slowed starting a new blocks down.
deroby
@deroby: You *cannot* read more than one record at a time from a `SqlDataReader`. As I said, if you lock then you'll always be reading one record at a time; if you don't lock then you'll hit race conditions, or worse.
LukeH
I agree that it's impossible to read more than one record at a time from a SqlDataReader (simply because there is only THE currrent record available). The reason to pass it around isn't that much because I want to spread out the reading from said object but rather the processing that happens on the data found in there.The logical answer then indeed seems to be "If that's the case, then maybe you should *not* spread the reading around, but *only* the processing". I agree, but sadly I have not come up with an efficient way for that either yet... but I most certainly will revisit that approach.
deroby
+1  A: 

Thanks for all the comments, I certainly didn't expect so many reactions in such short time ! (don't you guys have anything better to do ? =) Sadly most of them point out my approach is flawed =/

Anyway, assuming there is at least some learning-merit to my current approach I still was hoping to find an answer to the 'use ref or not' question. Having waded through plenty of Q&A, forums & blogs for pretty much all day now I finally stumbled upon this one (AFTER I posted my question off course, typical) here . From what is said in that answer, I think it's safe to conclude there is no need to use ref because although it passes 'by value', I now understand it actually passes the pointer on the stack hence any changes made to the object (e.g.. moving to the next record) are done on the same instance of the object which is what I needed reassurance off.

As long as I don't do something silly like believing that another thread can replace the object with eg. the result of another SqlCommand.ExecuteReader(), things should be OK.

Rereading the comments above I now also better understand what Mitch Wheat meant with his remark

the use of ref (essentially a reference to a reference)

sorry I didn't "get" it earlier...

deroby
Here's another good overview of parameter passing etc: http://www.yoda.arachsys.com/csharp/parameters.html
LukeH