views:

2639

answers:

2

Hi, I'm working on a project where we need to process several XML files once a day and populate a database with the information contained in those files.

Each file is roughly 1 MB and contains about 1000 records; we usually need to process between 12 and 25 of these files per run. I've seen some information about bulk inserts using NHibernate, but our problem is somewhat trickier since the XML files contain new records mixed with updated records.

In the XML there is a flag that tells us whether a specific record is new or an update to an existing record, but not what information has changed. The XML records do not contain our DB identifier, but we can use an identifier from the XML record to uniquely locate a record in our DB.

Our strategy so far has been to check whether the current record is an insert or an update. For an insert we simply save a new object to the DB; for an update we first search for the existing record, copy the information from the XML record onto the object, and finally save the updated object back to the DB.
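
For reference, the per-record logic described above looks roughly like this. The entity and helper names (`XmlRecord`, `Product`, `MapToEntity`, `MapOntoEntity`) are illustrative, not from the question:

```csharp
// Sketch of the current one-record-at-a-time approach, assuming a
// hypothetical Product entity located via the external id from the XML.
foreach (XmlRecord rec in records)
{
    if (rec.IsNew)
    {
        // New record: straight insert.
        session.Save(MapToEntity(rec));
    }
    else
    {
        // Update: find the existing row by the identifier from the XML...
        var existing = session
            .CreateQuery("from Product p where p.ExternalId = :id")
            .SetParameter("id", rec.ExternalId)
            .UniqueResult<Product>();

        // ...copy the XML fields onto it, then write it back.
        MapOntoEntity(rec, existing);
        session.Update(existing);
    }
}
session.Flush();
```

Interleaving a read query with every update like this is part of what keeps locks held across the whole batch.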

The problem with our current approach is that we are running into DB locks and our performance degrades very quickly. We have considered alternatives such as separate tables for the distinct operations, or even separate DBs, but such a move would be a big effort, so before deciding anything I'd like to ask for the community's opinion on this matter. Thanks in advance.

+9  A: 

A couple of ideas:

  • Always try to use IStatelessSession for bulk operations.
  • If you're still not happy with the performance, just skip NHibernate and use a stored procedure or parameterized query specific to this, or use IQuery.ExecuteUpdate()
  • If you're using SQL Server, you could convert your xml format to BCPFORMAT xml then run BULK INSERT on it (only for insertions)
  • If you're having too many DB locks, try grouping the operations (i.e. first find out what needs to be inserted and what updated, then get PKs for the updates, then run BULK INSERT for insertions, then run updates)
  • If parsing the source files is a performance issue (i.e. it maxes out a CPU core), try doing it in parallel (you could use Parallel Extensions)
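
A rough sketch of the grouping idea combined with `IStatelessSession` (entity and helper names are illustrative assumptions, not from the answer):

```csharp
// Sketch: split the batch into inserts and updates up front, then run each
// phase separately to reduce lock contention. Product, XmlRecord and the
// mapping helpers are hypothetical.
var newRecords     = records.Where(r => r.IsNew).ToList();
var updatedRecords = records.Where(r => !r.IsNew).ToList();

using (IStatelessSession session = sessionFactory.OpenStatelessSession())
using (ITransaction tx = session.BeginTransaction())
{
    // Phase 1: insert all new records. IStatelessSession bypasses the
    // first-level cache and event listeners, so inserts stay cheap.
    foreach (var rec in newRecords)
        session.Insert(MapToEntity(rec));

    // Phase 2: fetch the existing rows for all updates in one query,
    // keyed by the external identifier, then apply the updates.
    var ids = updatedRecords.Select(r => r.ExternalId).ToList();
    var existing = session
        .CreateQuery("from Product p where p.ExternalId in (:ids)")
        .SetParameterList("ids", ids)
        .List<Product>()
        .ToDictionary(p => p.ExternalId);

    foreach (var rec in updatedRecords)
    {
        var entity = existing[rec.ExternalId];
        MapOntoEntity(rec, entity);
        session.Update(entity);
    }

    tx.Commit();
}
```

For very large batches you might also chunk the `in (:ids)` lookup, since most databases cap the number of parameters per query.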
Mauricio Scheffer