Using C#, I want to generate 1,000,000 files from a DB, with each record in a separate file. What is the best way to generate these files in the minimum time?

Here is my code without threading:

    // Used to measure the execution time when threading is involved
    AppDomain.CurrentDomain.ProcessExit += new EventHandler(CurrentDomain_ProcessExit);
    SqlCommand cmd = new SqlCommand(@"select top 1000000 p1+','+p2+','+p3+','+p4 as line from lines", con);

    con.Open();
    var rdr = cmd.ExecuteReader();
    int i = 0;
    while (rdr.Read())
    {
        string line = rdr.GetString(0);
        string filename = String.Format("file{0}.txt", ++i);
        File.WriteAllText(filename, line);
    }
    rdr.Close();
    con.Close();
+3  A: 

Since your operations are I/O-bound rather than CPU-bound, the best way is to have two threads: one reads the records from the DB and puts them into a queue, and the other reads from the queue and generates the files.

Alternatively, you can use the CLR thread pool for that, something like:

    while (rdr.Read())
    {
        string line = rdr.GetString(0);
        // Hand each line to a thread-pool worker
        ThreadPool.QueueUserWorkItem(new WaitCallback(writeData), line);
    }

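and writeData would look like:

    static int i = 0; // file counter shared by all pool threads

    static void writeData(Object line)
    {
        // Interlocked.Increment keeps the counter correct when several threads run at once
        string filename = String.Format("file{0}.txt", Interlocked.Increment(ref i));
        File.WriteAllText(filename, (string)line);
    }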

The disadvantage of using the ThreadPool is that you could end up with more threads than you want: since your threads will be blocked on I/O most of the time, the thread pool will create new threads to service your requests.
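
Another thing to watch with the thread pool: its threads are background threads, so the process can exit as soon as the main thread finishes, even if queued items have not been written yet. Below is a minimal sketch of one way to wait for them, assuming .NET 4 or later (for CountdownEvent). The names PoolWriter, WriteAll and directory are made up for illustration, and the lines would come from your reader in practice.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical helper, not part of the original code.
static class PoolWriter
{
    // Queues one thread-pool work item per line and blocks until every file is written.
    // Returns the number of lines that were queued.
    public static int WriteAll(IEnumerable<string> lines, string directory)
    {
        int i = 0;
        // Start at 1 so the count cannot reach zero while items are still being queued.
        using (var pending = new CountdownEvent(1))
        {
            foreach (string line in lines)
            {
                string text = line;   // per-iteration copies, safe on older compilers too
                int index = ++i;      // numbered on the queuing thread, so no race here
                pending.AddCount();
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        string name = Path.Combine(directory, "file" + index + ".txt");
                        File.WriteAllText(name, text);
                    }
                    finally
                    {
                        pending.Signal(); // one work item finished
                    }
                });
            }
            pending.Signal(); // release the initial count held by the queuing thread
            pending.Wait();   // block until all work items have signalled
        }
        return i;
    }
}
```

Note that an unhandled exception inside a work item still takes the process down; the finally block only makes sure the counter is decremented.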

You can try the thread pool first and measure the performance; if you are not satisfied, you can try the two-thread, one-queue approach, well known as the Producer/Consumer problem.
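
For the queue approach, here is a rough sketch using BlockingCollection, assuming .NET 4 or later. QueueWriter, WriteAll, writerCount and the queue bound of 10000 are illustrative choices, not anything from the question; the calling thread plays the producer and a configurable number of dedicated threads play the consumers.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical helper, not part of the original code.
static class QueueWriter
{
    // The caller produces (number, line) pairs; writerCount threads consume them and write files.
    // Returns the number of lines produced.
    public static int WriteAll(IEnumerable<string> lines, string directory, int writerCount)
    {
        int produced = 0;
        // The bound (an arbitrary 10000 here) stops the producer from racing far ahead of the disk.
        using (var queue = new BlockingCollection<KeyValuePair<int, string>>(10000))
        {
            var writers = new Thread[writerCount];
            for (int w = 0; w < writerCount; w++)
            {
                writers[w] = new Thread(() =>
                {
                    // Blocks while the queue is empty; ends once adding is complete and the queue is drained.
                    foreach (var item in queue.GetConsumingEnumerable())
                    {
                        string name = Path.Combine(directory, "file" + item.Key + ".txt");
                        File.WriteAllText(name, item.Value);
                    }
                });
                writers[w].Start();
            }

            try
            {
                foreach (string line in lines)
                    queue.Add(new KeyValuePair<int, string>(++produced, line));
            }
            finally
            {
                queue.CompleteAdding(); // lets the consumers finish even if the producer fails
            }

            foreach (Thread t in writers)
                t.Join(); // wait until every queued line is on disk
        }
        return produced;
    }
}
```

With the reader from the question, the lines could be fed from a small iterator that yields rdr.GetString(0) for each row. Start with writerCount at 1 or 2 and measure before raising it.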

mfawzymkh
Using the thread pool generates only 2713 files and then my application exits. Unexpected behavior.
Ammroff
A: 

You would benefit from having more threads; the best way to figure out the exact number is empirically, but don't limit yourself to one per CPU core the way you might with CPU-bound tasks. The very easiest way is to use a ThreadPool, but a Producer/Consumer queuing system would be more flexible and tunable.

Steven Sudit
A: 

Why not use an SSIS package? Isn't it supposed to do this kind of thing?

danish
Do you have an article about using an SSIS package to generate files?
Ammroff
A: 

This might help.

danish
This works great, but only if your database is SQL Server :) I hope that is what the majority of people are using :)
mfawzymkh
Although he has not shown the connection string, the use of SqlCommand suggests he is using SQL Server as the database. Hence I guess this can be used.
danish
Yes, I use SQL Server 2008, and I can't find a way to write each record to its own file using SSIS. Can anyone help?
Ammroff
If I use only the Script Task in SSIS, is that equivalent to writing the code in C#, or will it perform better?
Ammroff