I have a text file that is being written to as part of a very large data extract. The first line of the text file is the number of "accounts" extracted.

Because of the nature of this extract, that number is not known until the very end of the process, but the file can be large (a few hundred megs).

What is the BEST way in C# / .NET to open a file (in this case a simple text file), and replace the data that is in the first "line" of text?

IMPORTANT NOTE: - I do not need to replace a "fixed amount of bytes" - that would be easy. The problem here is that the data that needs to be inserted at the top of the file is variable.

IMPORTANT NOTE 2: - A few people have asked about / mentioned simply keeping the data in memory and then replacing it... however that's completely out of the question. The reason this process is being updated is that it sometimes crashes when loading a few gigs into memory.

+1  A: 

If the extracted file is only a few hundred megabytes, then you can easily keep all of the text in-memory until the extraction is complete. Then, you can write your output file as the last operation, starting with the record count.

Tim Long
"only a few hundred megabytes" ??? Are you serious ?
Cerebrus
I have only 2 Gigs on my machine -- most of the others in our office have between 4 and 8. What's 200MB. Maybe 10% of the total memory...
Jack Bolding
And what happens in a years time when the file is "only a few gigabytes", you going to keep it all in memory then too?
Binary Worrier
Should I be using time now to worry about what happens in two years? In a couple of years, I would expect to be running a quadproc x64 machine with a minimum of 8GB of RAM. Why couldn't I keep it in memory?
Jack Bolding
Spending time on unnecessary optimization is a waste of time. Do the simple thing now, then if the situation changes "in two years" upgrade the computer's memory. Have you seen the cost of memory lately? They're giving it away free (well, almost).
Tim Long
The amount of physical memory is not relevant. What matters with very large allocations is the size of the process's address space. In a 32-bit process this is 2 GB by default. So a 200 MB file is 10% of the entire address space. It's a very large amount to allocate without thinking seriously about it. In the CLR it will come from the large object heap, which is not compacted, which means fragmentation. If you write a 64-bit program, it's another matter, but you may find performance suffers in other ways as a result of pointers doubling in size.
Daniel Earwicker
+4  A: 

If you can, you should insert a placeholder which you overwrite at the end with the actual number, padded with spaces.

If that is not an option, write your data to a cache file first. When you know the actual number, create the output file, write the count, and append the data from the cache.
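The cache-file route could be sketched roughly as follows. This is only an illustration under assumptions: the class and method names (`HeaderThenBody`, `Combine`) are made up, the header is taken to be the bare count followed by `\r\n`, and `Stream.CopyTo` requires .NET 4.0 or later.

```csharp
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper, not from the original answer.
static class HeaderThenBody
{
    // Writes "<count>\r\n" followed by the contents of bodyPath into outputPath.
    // The body is streamed through a buffer, so memory use does not grow with file size.
    public static void Combine(string bodyPath, string outputPath, long accountCount)
    {
        using (FileStream output = File.Create(outputPath))
        {
            // Header line first: exact digits, no padding (assumed format).
            string headerText = accountCount.ToString(CultureInfo.InvariantCulture) + "\r\n";
            byte[] header = Encoding.ASCII.GetBytes(headerText);
            output.Write(header, 0, header.Length);

            // Then append the cached records byte for byte.
            using (FileStream body = File.OpenRead(bodyPath))
            {
                body.CopyTo(output);
            }
        }
    }
}
```

The data is still written twice, but the header can be any length and nothing larger than the copy buffer is ever held in memory.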

chris
Yes, the only way to avoid writing the data twice. If it's text based there should be no problem, just reserve a decent amount of spaces first.
Henk Holterman
This is what I would *like* to do (reserve some blank space) - the only problem is that the file-format that I'm writing to requires exact #####\r\n (meaning no padding). - Good answer though.
Timothy Khouri
@Timothy: does it allow leading zeros?
Henk Holterman
@Henk - No on the leading zeros - "Binary Worrier" suggested a good solution if that was acceptable though.
Timothy Khouri
+1  A: 

I do not need to replace a "fixed amount of bytes"

Are you sure? If you write a big number to the first line of the file (UInt32.MaxValue or UInt64.MaxValue), then when you find the correct actual number, you can replace that number of bytes with the correct number, but left padded with zeros, so it's still a valid integer. e.g.

Replace  999999 - your "large number placeholder"
With     000100 - the actual number of accounts
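A minimal sketch of that idea, assuming the file format tolerates leading zeros. The names (`PaddedCount`, `WritePlaceholder`, `OverwriteCount`) and the 10-digit width are illustrative choices, not part of the original answer.

```csharp
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper illustrating a fixed-width, zero-padded count field.
static class PaddedCount
{
    // Width of the reserved field; 10 digits is enough for UInt32.MaxValue.
    const int Width = 10;

    // Call first, before any records are written.
    public static void WritePlaceholder(Stream file)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(new string('9', Width) + "\r\n");
        file.Write(bytes, 0, bytes.Length);
    }

    // Call last: overwrites the placeholder digits in place, then restores the position.
    public static void OverwriteCount(Stream file, uint count)
    {
        long end = file.Position;
        // "D10" left-pads with zeros, so the field keeps exactly the same byte length.
        byte[] bytes = Encoding.ASCII.GetBytes(
            count.ToString("D" + Width, CultureInfo.InvariantCulture));
        file.Position = 0;
        file.Write(bytes, 0, bytes.Length);
        file.Position = end;
    }
}
```

Because the replacement has exactly the same number of bytes as the placeholder, the rest of the file never has to move.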
Binary Worrier
Clever workaround! - However the file spec that I'm working with won't accept that... very good thought though :)
Timothy Khouri
Do you mind me asking why not?
Binary Worrier
It's a file spec, it didn't answer my question :P
Timothy Khouri
"It's a file spec" doesn't tell me anything. Can you include a snippet of the spec that defines what that count should be? I'm sorry but I'm having a hard time imagining something that can't work with leading zeros. It doesn't matter, it's purely for my own edification. Thanks mate.
Binary Worrier
+3  A: 

BEST is very subjective. For any smallish file, you can easily open the entire file in memory and replace what you want using a string replace and then re-write the file.

Even for largish files, it would not be that hard to load into memory. In the days of multi-gigs of memory, I would consider hundreds of megabytes to still be easily done in memory.

Have you tested this naive approach? Have you seen a real issue with it?

If this is a really large file (gigabytes in size), I would consider writing all of the data first to a temp file and then write the correct file with the header line going in first and then appending the rest of the data. Since it is only text, I would probably just shell out to DOS:

 TYPE temp.txt >> outfile.txt
Jack Bolding
+1  A: 

If I understand the question correctly:

What is the BEST way in C# / .NET to open a file (in this case a simple text file), and replace the data that is in the first "line" of text?

How about placing a token such as {UserCount} at the top of the file when it is first created?

Then use TextReader to read the file line by line. If it is the first line, look for {UserCount} and replace it with your value. Write out each line you read using TextWriter.

Example:

    int lineNumber = 1;
    int userCount = 1234;
    string line = null;

    using (TextReader tr = File.OpenText("OriginalFile"))
    using (TextWriter tw = File.CreateText("ResultFile"))
    {
        // Copy the file line by line so only one line is in memory at a time.
        while ((line = tr.ReadLine()) != null)
        {
            // Only the first line carries the {UserCount} token.
            if (lineNumber == 1)
            {
                line = line.Replace("{UserCount}", userCount.ToString());
            }

            tw.WriteLine(line);
            lineNumber++;
        }
    }
Jim Scott
This is essentially what I had to do, but my goal was to *not* have to create 2 files.
Timothy Khouri
I have one more solution that I have seen but have not verified or tried yet. Basically, you use something like a StreamWriter to write your first file and keep it open. Also write the placeholder as I suggested, and keep the start and end positions of the token. Once you are at the end of the file and have the UserCount, you just need to go back and replace the token with your value. To do that you use the underlying stream, which I believe you can get through StreamWriter.BaseStream, to write bytes at a specific location. I will try to test it out and post.
Jim Scott
A: 

OK, earlier I suggested an approach that would be better if dealing with existing files.

However in your situation you want to create the file and during the create process go back to the top and write out the user count. This will do just that.

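Here is one way to do it that avoids having to write a temporary file. Note that writing to a stream overwrites bytes rather than inserting them, so the first line has to be reserved at its final width up front, and the writer has to be flushed before seeking.

    // Requires: using System.IO; using System.Text;
    private void WriteUsers()
    {
        // Width reserved for the first line; must be at least as long as the final text.
        const int headerWidth = 32;
        int userCounter = 0;

        using (StreamWriter sw = File.CreateText("myfile.txt"))
        {
            // Reserve a fixed-width line of spaces.
            // Note this line will later contain our user count.
            sw.WriteLine(new string(' ', headerWidth));

            // Write out the records and keep track of the count.
            for (int i = 1; i < 100; i++)
            {
                sw.WriteLine("User" + i);
                userCounter++;
            }

            // Flush buffered text before touching the base stream; otherwise it
            // would be written after the seek and corrupt the file.
            sw.Flush();

            // Get the base stream and set the position to 0.
            sw.BaseStream.Position = 0;

            // Pad to the reserved width so no record bytes are overwritten.
            string userCountString = ("User Count: " + userCounter).PadRight(headerWidth);
            byte[] userCountBytes = Encoding.ASCII.GetBytes(userCountString);

            sw.BaseStream.Write(userCountBytes, 0, userCountBytes.Length);
        }
    }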
Jim Scott