The application I work on generates several hundred CSV files in a 15-minute period, and the back end of the application picks these files up and processes them (updating the database with their values). One problem is database locks.

What are the best practices for working with several thousand files so that locking is avoided and the files are processed efficiently?

Would it be more efficient to create a single file and process it, or to process one file at a time?

What are some common best practices?

Edit: the database is not a relational DBMS. It is a NoSQL, object-oriented DBMS that works in memory.

A: 

A lock will keep other files from being processed until the first one is finished:

class ThreadSafe
{
  // All threads share this one object as the lock; only one thread at a time
  // can hold it, so the work inside Go() is serialized.
  static readonly object _locker = new object();
  static int _val1, _val2;

  static void Go()
  {
    lock (_locker)
    {
      // Any other thread calling Go() blocks here until the lock is released.
      if (_val2 != 0) Console.WriteLine (_val1 / _val2);
      _val2 = 0;
    }
  }
}
mbcrump
Thanks, but I'm not asking about code.
+1  A: 

With limited knowledge of your exact scenario...

Performance-wise, closing the file is probably the most expensive operation you will be performing in terms of time, so if you can go the single-file route, that would be the most performant approach.
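
For illustration, a minimal sketch of the single-file idea, assuming you can batch the incoming rows yourself (the class and method names here are just illustrative): open the output file once, append everything to it, and close it only when the batch is done.

using System;
using System.Collections.Generic;
using System.IO;

class SingleFileWriter : IDisposable
{
  private readonly StreamWriter _writer;

  public SingleFileWriter(string path)
  {
    // Open the file once per batch instead of once per small CSV.
    _writer = new StreamWriter(path, append: true);
  }

  public void AppendRows(IEnumerable<string> csvRows)
  {
    foreach (var row in csvRows)
      _writer.WriteLine(row);
  }

  public void Dispose()
  {
    // Closing (the expensive part) happens a single time, here.
    _writer.Dispose();
  }
}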

Mark Pearl
What would you do in this scheme?
+3  A: 

So, assuming that you have N machines creating files and each file is similar in the sense that it generally gets consumed into the same tables in the database...

I'd set up a Queue, have all of the machines write their files to the queue and then have something on the other side picking stuff off of the queue and then processing it into the database. So, one file at a time. You could probably even optimize out the file operations by writing to the Queue directly.
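
For illustration, a minimal sketch of that single-consumer queue using a BlockingCollection; ProcessIntoDatabase is a placeholder for whatever import code you already have.

using System.Collections.Concurrent;
using System.Threading.Tasks;

class FileQueueProcessor
{
  // Paths written by the generating machines go into this queue.
  readonly BlockingCollection<string> _queue = new BlockingCollection<string>();

  public void Enqueue(string filePath)
  {
    _queue.Add(filePath);
  }

  public Task StartConsumer()
  {
    return Task.Run(() =>
    {
      // One consumer: files hit the database one at a time,
      // so imports never compete with each other for locks.
      foreach (var filePath in _queue.GetConsumingEnumerable())
        ProcessIntoDatabase(filePath);
    });
  }

  public void CompleteAdding()
  {
    _queue.CompleteAdding();
  }

  void ProcessIntoDatabase(string filePath)
  {
    // Placeholder: parse the CSV and apply the updates here.
  }
}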

Jacob G
I already have such a scheme, but I'm afraid that there will be some contention.
@user177883: What type of contention? This scheme should mitigate any database locks. Is there a performance issue or somesuch?
Jacob G
What if the number of files outruns the ability to process them? Assume you have a billion page views a day, and for each page view you need to process some data. I imagine your answer would be to add more servers to pick more off the queue.
An alternative, then, would be to refactor your database or the generated files to avoid the locks so that you could do more concurrent processing. For example, if you have N actions to execute on a table, where x of them are inserts and y are updates, you could probably process the inserts in parallel and then process the updates serially. But if you're talking about billions of page views, then we really need more information about your files and your database to mitigate locking and ensure performance.
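
As a rough sketch of that split (DbAction, ActionKind, and Apply are made-up placeholders for however your parsed rows are represented):

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

enum ActionKind { Insert, Update }

class DbAction
{
  public ActionKind Kind { get; set; }

  public void Apply()
  {
    // Placeholder: perform the actual insert or update here.
  }
}

class BatchProcessor
{
  public void Process(IEnumerable<DbAction> actions)
  {
    var inserts = actions.Where(a => a.Kind == ActionKind.Insert).ToList();
    var updates = actions.Where(a => a.Kind == ActionKind.Update).ToList();

    // Inserts create new records, so they can usually run side by side.
    Parallel.ForEach(inserts, a => a.Apply());

    // Updates contend for existing records, so run them one after another.
    foreach (var update in updates)
      update.Apply();
  }
}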
Jacob G
There are no tables in my database; it is object-oriented and works entirely in memory.
I'm going to proclaim ignorance here... What product are you using? Cassandra or some such? I thought nosql solutions minimized things like locks and transactions.
Jacob G
+2  A: 

If you are experiencing problems with locks, it's likely that the database tables being updated do not have proper indexes on them. Get the SQL code that does the updating and look at its execution plan; if you are using MSSQL, you can do this in SSMS. If the UPDATE is causing a table scan, you need to add an index that helps isolate the records being updated (unless you are updating every single record in the table; that could be a problem).

SqlACID
What if I'm using a NoSQL, object-oriented database that works entirely in memory?
A: 

Sounds like you'll either want a single-file mechanism, or to have all of the files consumed from a shared directory by a process that continuously checks for the oldest CSV file and runs it through your code. That might be the "cheapest" solution, anyway. If you are actually generating more files than you can process, then I'd rethink the overall system architecture instead of taking the 'band-aid' approach.
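
A minimal sketch of that shared-directory poller, assuming the consumer owns the folder and can delete files once they are imported (ProcessFile is a placeholder for your existing code):

using System;
using System.IO;
using System.Linq;
using System.Threading;

class DirectoryPoller
{
  // Repeatedly picks the oldest CSV out of a shared drop folder.
  public void Run(string dropFolder)
  {
    while (true)
    {
      var oldest = new DirectoryInfo(dropFolder)
        .EnumerateFiles("*.csv")
        .OrderBy(f => f.CreationTimeUtc)
        .FirstOrDefault();

      if (oldest == null)
      {
        Thread.Sleep(TimeSpan.FromSeconds(5)); // nothing to do yet
        continue;
      }

      ProcessFile(oldest.FullName);  // placeholder for the existing import code
      File.Delete(oldest.FullName);  // or move it to an archive folder
    }
  }

  void ProcessFile(string path)
  {
    // Placeholder: parse the CSV and update the database here.
  }
}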

Jason M
A: 

You may try to take care of concurrency issues at the level of your app code and force the DBMS not to lock objects during updates.

(In an RDBMS you would set the lowest transaction isolation level possible: read uncommitted.)

Provided you can do that, another option is to truncate all old objects and bulk-insert new values.
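
For the RDBMS case in the parenthetical above, a read-uncommitted transaction might look like this in ADO.NET; the connection string, table, and column names are illustrative, and how this maps onto your in-memory object database depends on that product's own API.

using System.Data;
using System.Data.SqlClient;

class LowIsolationUpdate
{
  public void Run(string connectionString)
  {
    using (var connection = new SqlConnection(connectionString))
    {
      connection.Open();

      // Read uncommitted is the weakest isolation level the engine offers.
      using (var transaction = connection.BeginTransaction(IsolationLevel.ReadUncommitted))
      using (var command = connection.CreateCommand())
      {
        command.Transaction = transaction;

        // Table and column names here are made up for the example.
        command.CommandText = "UPDATE Prices SET Value = @value WHERE Id = @id";
        command.Parameters.AddWithValue("@value", 42.0);
        command.Parameters.AddWithValue("@id", 1);
        command.ExecuteNonQuery();

        transaction.Commit();
      }
    }
  }
}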

Paul