views: 302

answers: 5

Hey guys,

I have an importer process which runs as a Windows service (or as an application in debug mode). It processes various XML documents and CSVs and imports them into a SQL database. All has been well until I had to process a large amount of data (120k rows) from another table (in the same way I process the XML documents).

I am now finding that SQL Server's memory usage hits a point where everything just hangs. My application never receives a timeout from the server; everything simply stops.

I am still able to make calls to the database server separately, but that application thread is just stuck, with no obvious corresponding process in SQL Activity Monitor and no activity in Profiler.

Any ideas on where to begin solving this problem would be greatly appreciated as we have been struggling with it for over a week now.

The basic architecture is C# 2.0 using NHibernate as the ORM: data is pulled into the C# logic, processed, and then written back into the same database, along with logs into other tables.


The only other problem, which sometimes happens instead, is that for some reason a cursor is being opened on this massive table. I can only assume it is being generated by ADO.NET: a statement like exec sp_cursorfetch 180153005,16,113602,100 is being called thousands of times according to Profiler.

A: 

Are you loading this into SQL using BCP? If not, the transaction log may not be able to keep up with your input. On a test machine, try switching the recovery model to Simple, or use the BCP methods to get the data in (bulk loads are only minimally logged).
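
If you want to make the recovery model change from code rather than Management Studio, here's a minimal sketch (the connection string and the ImportTest database name are placeholders, not your actual setup):

    using System.Data.SqlClient;

    // Placeholder connection string pointing at the test instance.
    string connectionString = "Data Source=TESTBOX;Initial Catalog=master;Integrated Security=True";

    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // Switch the test database to the Simple recovery model so the
        // transaction log is truncated at each checkpoint during the import.
        using (SqlCommand cmd = new SqlCommand(
            "ALTER DATABASE [ImportTest] SET RECOVERY SIMPLE", conn))
        {
            cmd.ExecuteNonQuery();
        }
    }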

StingyJack
We are not using BCP, as the validation of the data is too complex and I need to do some lookups etc. on it. I have switched my test rig to Simple mode, but it doesn't appear to have improved things.
tigermain
A: 

Adding on to StingyJack's answer ...

If you're unable to use straight BCP due to processing requirements, have you considered performing the import against a separate SQL Server (separate box), using your tool, then running BCP?

The key to making this work would be keeping the staging machine clean -- that is, no data except the current working set. This should keep the RAM usage down enough to make the imports work, as you're not hitting tables with -- I presume -- millions of records. The end result would be a single view or table in this second database that could be easily BCP'ed over to the real one when all the processing is complete.

The downside is, of course, having another box ... And a much more complicated architecture. And it's all dependent on your schema, and whether or not that sort of thing could be supported easily ...

I've had to do this with some extremely large and complex imports of my own, and it's worked well in the past. Expensive, but effective.

John Rudy
Unfortunately I am already running the service on a separate box, and sadly the actual processing I need to do is what is causing the problems, so BCP really isn't an answer in any case.
tigermain
So you're not maxing out the RAM on the SQL Server; it's the box where the process runs that's maxed out? Or is it maxing out on a secondary import-only SQL Server already? (Sorry, just woke up, groggy and want to help any way I can ... but need more info.)
John Rudy
+1  A: 

When are you COMMITting the data? Are there any locks or deadlocks (check sp_who)? If 120,000 rows is considered large, how much RAM is SQL Server using? When the application hangs, is there anything notable about the point where it hangs (is it an INSERT, a lookup SELECT, or something else)?

It seems to me that that commit size is way too small. Usually in SSIS ETL tasks, I will use a batch size of 100,000 for narrow rows with sources over 1,000,000 in cardinality, but I never go below 10,000 even for very wide rows.

I would not use an ORM for large ETL, unless the transformations are extremely complex with a lot of business rules. Even then, with a large number of relatively simple business transforms, I would consider loading the data into simple staging tables and using T-SQL to do all the inserts, lookups, etc., as sketched below.
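
To give a feel for that, here's a rough sketch of driving a set-based staging insert plus lookup from C# (all table, column, and connection names here are made up for illustration, not your schema):

    using System.Data.SqlClient;

    // Sketch only: table names, columns and the connection string are placeholders.
    string connectionString = "Data Source=SERVER;Initial Catalog=Import;Integrated Security=True";

    const string insertFromStaging = @"
        INSERT INTO dbo.Customers (SourceId, Name, RegionId)
        SELECT  s.SourceId,
                s.Name,
                r.RegionId              -- the 'lookup' becomes a set-based join
        FROM    staging.Customers s
        JOIN    dbo.Regions r ON r.Code = s.RegionCode;";

    using (SqlConnection conn = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand(insertFromStaging, conn))
    {
        conn.Open();
        cmd.CommandTimeout = 0;   // one set-based statement instead of 120k row-by-row round trips
        cmd.ExecuteNonQuery();
    }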

Cade Roux
In other words, are you trying to load all the records in a single TRANSACTION?
le dorfier
I am performing a COMMIT at the end of each batch of 10 in NHibernate, so it's flushing the session at that point. The test box I am running it on only has 1GB of RAM in it, but the server it will be running on is a quad Xeon with 8GB. I'm guessing there is something more significant behind the problem, though.
tigermain
A: 

I found out that it was NHibernate creating the cursor on the large table. I have yet to understand why, but in the meantime I have replaced the data access for the large table with straightforward ADO.NET calls.
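
In case it helps anyone else, the replacement is essentially just a plain forward-only data reader; a minimal sketch (the table, columns and connection string below are illustrative, not the real schema):

    using System.Data.SqlClient;

    // Placeholder connection string, table and column names.
    string connectionString = "Data Source=SERVER;Initial Catalog=Import;Integrated Security=True";

    using (SqlConnection conn = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand(
        "SELECT Id, SourceXml FROM dbo.LargeSourceTable", conn))
    {
        conn.Open();

        // A plain ExecuteReader uses a default (firehose) result set, so rows
        // stream back without per-block sp_cursorfetch round trips.
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                int id = reader.GetInt32(0);
                string xml = reader.GetString(1);
                // ... process the row ...
            }
        }
    }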

tigermain
A: 

Since you are rewriting it anyway, you may not be aware that you can do the equivalent of BCP directly from .NET via the System.Data.SqlClient.SqlBulkCopy class. See this article for some interesting performance info.
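
A minimal sketch of using it, assuming the validated rows end up in a DataTable and that the destination table and connection string below are placeholders:

    using System.Data;
    using System.Data.SqlClient;

    // validatedRows is assumed to be a DataTable filled by the existing
    // validation/lookup logic; the names below are placeholders.
    string connectionString = "Data Source=SERVER;Initial Catalog=Import;Integrated Security=True";

    using (SqlBulkCopy bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "dbo.ImportedRows";
        bulk.BatchSize = 10000;      // send rows to the server in batches of 10k
        bulk.BulkCopyTimeout = 0;    // no timeout for a long-running load

        bulk.WriteToServer(validatedRows);
    }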

RedFilter