I have been given the task of re-writing some libraries written in C# so that there are no allocations once startup is completed.

I just got to one project that does some DB queries over an OdbcConnection every 30 seconds. I've always just used .ExecuteReader() which creates an OdbcDataReader. Is there any pattern (like the SocketAsyncEventArgs socket pattern) that lets you re-use your own OdbcDataReader? Or some other clever way to avoid allocations?

I haven't bothered to learn LINQ since all the databases at work are Oracle-based and, the last I checked, there was no official LINQ to Oracle provider. But if there's a way to do this in LINQ, I could use one of the third-party ones.

Update:

I don't think I clearly specified the reasons for the no-alloc requirement. We have one critical thread running and it is very important that it not freeze. This is for a near realtime trading application, and we do see up to a 100 ms freeze for some Gen 2 collections. (I've also heard of games being written the same way in C#). There is one background thread that does some compliance checking and runs every 30 seconds. It does a db query right now. The query is quite slow (approx 500 ms to return with all the data), but that is okay because it doesn't interfere with the critical thread. Except if the worker thread is allocating memory, it will cause GCs which freeze all threads.

I've been told that all the libraries (including this one) cannot allocate memory after startup. Whether I agree with that or not, that's the requirement from the people who sign the checks :).

Now, clearly there are ways that I could get the data into this process without allocations. I could set up another process and connect it to this one using a socket. The new .NET 3.5 sockets were specifically optimized not to allocate at all, using the new SocketAsyncEventArgs pattern. (In fact, we are using them to connect to several systems and never see any GCs from them.) Then I could have a pre-allocated byte array that reads from the socket and go through the data, allocating no strings along the way. (I'm not familiar with other forms of IPC in .NET, so I'm not sure whether memory-mapped files and named pipes allocate or not.)
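As a rough illustration of the "no strings along the way" idea, here is a minimal sketch of pulling an integer field straight out of a pre-allocated byte buffer. `AsciiParser` and `ParseInt32` are hypothetical names, and the sketch assumes the other process sends plain ASCII digits; none of this comes from the actual system.

```csharp
using System;

// Hypothetical helper: parses ASCII decimal digits directly from a
// pre-allocated receive buffer, so no intermediate string is created.
public static class AsciiParser
{
    // Reads an optionally negative integer from buffer[offset .. offset + count).
    // Assumption: the field contains only '-' and '0'..'9' (no whitespace,
    // no overflow checking).
    public static int ParseInt32(byte[] buffer, int offset, int count)
    {
        int i = offset;
        int end = offset + count;
        bool negative = false;

        if (i < end && buffer[i] == (byte)'-')
        {
            negative = true;
            i++;
        }

        int value = 0;
        for (; i < end; i++)
        {
            // Convert the ASCII digit to its numeric value and accumulate.
            value = value * 10 + (buffer[i] - (byte)'0');
        }

        return negative ? -value : value;
    }
}
```

Since the method only reads from the caller's buffer and returns an `int`, repeated calls should not create garbage, though that is worth confirming with a memory profiler.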

But if there's a faster way to get this no-alloc query done without going through all that hassle, I'd prefer it.

+3  A: 

You cannot reuse IDataReader (or OdbcDataReader or SqlDataReader or any equivalent class). They are designed to be used with a single query only. These objects encapsulate a single record set, so once you've obtained and iterated it, it has no meaning anymore.

Creating a data reader is an incredibly cheap operation anyway, vanishingly small in contrast to the cost of actually executing the query. I cannot see a logical reason for this "no allocations" requirement.

I'd go so far as to say that it's very nearly impossible to rewrite a library so as to allocate no memory. Even something as simple as boxing an integer or using a string variable is going to allocate some memory. Even if it were somehow possible to reuse the reader (which it isn't, as I explained), it would still have to issue the query to the database again, which would require memory allocations in the form of preparing the query, sending it over the network, retrieving the results again, etc.
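To make the boxing point concrete, here is a small sketch (the `BoxingDemo` class is purely illustrative). Each assignment of an `int` to an `object` creates a new heap object, which is why two boxings of the same value are distinct references.

```csharp
using System;

public static class BoxingDemo
{
    // Returns true when two boxings of the same int yield distinct heap objects.
    public static bool BoxesAreDistinct(int value)
    {
        object first = value;   // boxing: a new object is allocated on the heap
        object second = value;  // boxing again: a second, separate allocation
        return !object.ReferenceEquals(first, second);
    }
}
```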

Avoiding memory allocations is simply not a practical goal. Better to perhaps avoid specific types of memory allocations if and when you determine that some specific operation is using up too much memory.

Aaronaught
Michael Covelli
In other words, the goal isn't no allocations. But no allocations during the continuous operations phase of the app (after startup is finished).
Michael Covelli
@Michael - but as you said, this query is *part* of your "continuous operation" and runs every 30 seconds. The extremely tight performance requirements (12,000 TPS!) of that app may have necessitated memory allocation restrictions, but as soon as you shove a database query in there, you blow that right out of the water; the query will cost way more than a GC pass.
Aaronaught
@Aaronaught You're right, if I tried to query the db on the same thread doing critical things, it would never meet the benchmark. But here, we have say one very critical thread and one worker thread in the background. The worker thread's small allocations will, over time, cause GCs that can freeze the critical thread because they freeze all threads. We are, in fact, seeing some freezes in the critical thread correlated with GCs. Is this one particular allocation for db querying causing all our issues? No, I don't think so. But I was told that after startup, my code should have no allocs.
Michael Covelli
@Michael: Sounds like a heavy-handed overreaction to what's probably a small handful of inefficient classes or modules. I realize that it may not be your choice, that you've just been given an *instruction*, but perhaps if you profiled the application and were able to identify the *real* pain points, you would be able to make a convincing argument to drop this rather absurd requirement and simply fix/improve the inefficient code.
Aaronaught
@Michael: I second what Aaronaught says: it sounds like you really need to do some heap profiling to see what objects are making it into the last generation, who allocated them, and why they were retained.
SamB
+2  A: 

For such a requirement, are you sure that a high-level language like C# is the right choice?
You cannot tell whether the .NET library functions you are using allocate memory internally. The standard doesn't guarantee that they don't, so even if they do not allocate in the current version of the .NET Framework, they may start doing so later.

Vlad
Michael Covelli
@Michael, that was for a project that needed to process 12,000 transactions per second on a single instance. If you put even one database query in there, no matter how fast it runs, you're not going to make that benchmark. Database queries are **way** more expensive than memory allocations and GC passes.
Aaronaught
@Michael: Well, the document you referenced says that the approach involved close cooperation with MS specialists. For my part, I really doubt that the solution described in the article will remain allocation-free; to me it looks more like a hack. Well, you can try to avoid using library functions _at all_... But your question is about database-related functionality, and I personally doubt that such complicated library functions won't allocate internally.
Vlad
@Aaronaught You're right, if I tried to query the db on the same thread doing critical things, it would never meet the benchmark. But here, we have say one very critical thread and one worker thread in the background. The worker thread's small allocations will, over time, cause GCs that can freeze the critical thread because they freeze all threads. We are, in fact, seeing some freezes in the critical thread correlated with GCs. Is this one particular allocation for db querying causing all our issues? No, I don't think so. But I was told that after startup, my code should have no allocs.
Michael Covelli
@Vlad I don't see why one needs cooperation with MS specialists. I agree that you might not know whether a library function allocates just from reading the MSDN spec. But you can always just test it and either watch GCs or look at a memory profiler (Ants, DotTrace, etc).
Michael Covelli
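One way to "just test it and watch GCs" could look like the sketch below. `AllocationProbe` is a hypothetical helper, not part of any library; it relies only on `GC.Collect`, `GC.WaitForPendingFinalizers`, and `GC.CollectionCount`. Note that the counter is process-wide, so allocations on other threads are counted too, and a result of zero only suggests (it does not prove) that the call is allocation-free.

```csharp
using System;

// Hypothetical helper for spotting allocations by watching the GC counters.
public static class AllocationProbe
{
    // Runs the action repeatedly and returns how many gen-0 collections
    // occurred in the process while it was running.
    public static int Gen0CollectionsDuring(Action action, int iterations)
    {
        action(); // warm-up call so one-time JIT and setup costs are excluded

        // Start from a clean heap so earlier garbage does not skew the count.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        int before = GC.CollectionCount(0);
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        return GC.CollectionCount(0) - before;
    }
}
```

A call such as `AllocationProbe.Gen0CollectionsDuring(SomeLibraryCall, 1000000)` (where `SomeLibraryCall` stands for whatever method is under suspicion) that comes back non-zero is a hint to look at that method in a memory profiler.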
@Michael: only people from MS can guarantee that the library function is not going to allocate. Without such a guarantee, the only resort is to test and hope that your test covers all the possible code paths in the library functions (or reverse-engineer the said functions using some tool like Reflector). And of course the problem of having different .NET versions on your and your client's computer is still an issue.
Vlad
@Vlad You're right. Even billions of calls to a library function can never prove that we've hit all code paths. But I think that we can gain some probabilistic confidence that P(Alloc) is low for a library function with this type of testing. And I agree that different versions are also a potential issue. So let me say that I'm looking for a way to query a db with low P(Alloc) in .NET 3.5.
Michael Covelli
@Michael: how about preallocating some `OdbcDataReader`s? If you know in advance how long your thread will be running, you know the expected number of readers needed, too.
Vlad
@Michael the "no allocations period" requirement sounds pretty voodoo to me...
SamB
@Vlad That's pretty much what I've done in all the other libraries: I just have a resource pool<T> class, create everything up front, and then return objects to the pool when they're finished. But I don't see how that can work here. When I run ExecuteReader(), that call itself creates an object and passes it back to me. The only idea that I have right now is to create a different process that does the queries and then use IPC to get the values back. So far as I know, if that process freezes for a GC, it won't affect my other process. Do you know if that's right?
Michael Covelli
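For reference, the "create everything up front" pool described above might look roughly like this. It is a minimal sketch with hypothetical names (`ResourcePool<T>`, `Rent`, `Return`), it is not thread-safe, and it is not the actual class from the other libraries. It also shows why the approach does not carry over to `OdbcDataReader`: the pool has to construct its own instances, whereas a reader is only ever handed back by `ExecuteReader()`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical fixed-size pool: every instance is created during startup.
public sealed class ResourcePool<T> where T : class, new()
{
    private readonly Stack<T> _free;

    public ResourcePool(int capacity)
    {
        // Size the stack once so later Push calls do not regrow its array,
        // provided callers only return items that were rented from this pool.
        _free = new Stack<T>(capacity);
        for (int i = 0; i < capacity; i++)
        {
            _free.Push(new T());
        }
    }

    public int Available
    {
        get { return _free.Count; }
    }

    // Hands out a pre-allocated instance; fails loudly instead of allocating.
    public T Rent()
    {
        if (_free.Count == 0)
        {
            throw new InvalidOperationException("Pool exhausted.");
        }
        return _free.Pop();
    }

    // Puts an instance back so it can be reused by a later Rent call.
    public void Return(T item)
    {
        _free.Push(item);
    }
}
```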
@Michael: AFAIK garbage collection in other processes doesn't affect your process. So a solution might indeed be to have process #1 communicate with the DB and allocate more freely, and process #2 communicate with the client and not allocate at all. Maybe you'll need to explain to your clients that this single allocation is unavoidable, and so you are offloading it to a separate process. Of course, this means quite a lot of code refactoring for you, but this is what you are paid for. :)
Vlad
+1  A: 

I suggest you profile the application to determine where the time and/or memory are being spent. Don't guess - you will only guess wrong.

John Saunders
Isn't that, like, rules 3-7 of optimization? (The first two being: "Don't!" and "No, really, don't!")
SamB
@SamB: that's actually 4-8. 3 is "You really didn't want to do that, so go undo it before you make things worse".
John Saunders