Imagine that I have a C# application that edits text files. The technique employed for each file can be either:
1) Read the whole file into a string at once, make the changes, and write the string back over the existing file:
string fileContents = File.ReadAllText(fileName);
// make changes to fileContents here...
using (StreamWriter writer = new StreamWriter(fileName))
{
    writer.Write(fileContents);
}
2) Read the file line by line, write the changes to a temp file, then delete the source and rename the temp file:
using (StreamReader reader = new StreamReader(fileName))
{
    string line;
    using (StreamWriter writer = new StreamWriter(fileName + ".tmp"))
    {
        while (!reader.EndOfStream)
        {
            line = reader.ReadLine();
            // make changes to line here
            writer.WriteLine(line);
        }
    }
}
File.Delete(fileName);
File.Move(fileName + ".tmp", fileName);
What are the performance considerations with these options?
It seems to me that whether I read line by line or read the entire file at once, the same quantity of data gets read, and disk time will dominate memory allocation time. That said, once a file is in memory, the OS is free to page it back out, and when it does the benefit of that large read is lost. On the other hand, when working with a temporary file, once the handles are closed I need to delete the old file and rename the temp file, which incurs a cost.
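One alternative I'm aware of (though I haven't measured whether it actually changes the cost profile) is to collapse the delete-and-rename pair into a single File.Replace call, which can also keep the old contents as a backup:
// Possible replacement for the Delete + Move pair above; the third argument
// is a backup path for the original contents (it can also be null).
File.Replace(fileName + ".tmp", fileName, fileName + ".bak");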
Then there are questions around caching, prefetching, and disk buffer sizes...
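For what it's worth, both streams can be given an explicit buffer size, and FileStream accepts a FileOptions.SequentialScan hint; something like the sketch below is what I have in mind (the 64 KB figure is just a placeholder, not a recommendation):
// Hypothetical buffer size for illustration only; the defaults may be fine.
const int bufferSize = 64 * 1024;

using (var input = new FileStream(fileName, FileMode.Open, FileAccess.Read,
    FileShare.Read, bufferSize, FileOptions.SequentialScan))
using (var reader = new StreamReader(input))
using (var output = new FileStream(fileName + ".tmp", FileMode.Create,
    FileAccess.Write, FileShare.None, bufferSize, FileOptions.None))
using (var writer = new StreamWriter(output))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // make changes to line here
        writer.WriteLine(line);
    }
}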
I am assuming that in some cases slurping the file is better, and in others operating line by line is better. My question is: what are the conditions for each case?
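If it helps frame an answer, my fallback plan is to time both approaches on files representative of my workload, along these lines (RunSlurp and RunByLine are hypothetical wrappers around the two snippets above):
using System;
using System.Diagnostics;

static class EditTimer
{
    // Rough harness: 'edit' stands in for either editing approach above,
    // and 'iterations' is arbitrary.
    public static TimeSpan Time(Action<string> edit, string fileName, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            edit(fileName);
        }
        sw.Stop();
        return sw.Elapsed;
    }
}

// Usage (RunSlurp and RunByLine are placeholders, not real methods):
// Console.WriteLine(EditTimer.Time(RunSlurp, fileName, 10));
// Console.WriteLine(EditTimer.Time(RunByLine, fileName, 10));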