views:

6926

answers:

6

I need to delete a certain line from a text file. What is the most efficient way of doing this? The file can potentially be large (over a million records).

UPDATE: Below is the code I'm currently using, but I'm not sure whether it is any good.

internal void DeleteMarkedEntries() {
    string tempPath=Path.GetTempFileName();
    using (var reader = new StreamReader(logPath)) {
        using (var writer = new StreamWriter(File.OpenWrite(tempPath))) {
            int counter = 0;
            while (!reader.EndOfStream) {
                if (!_deletedLines.Contains(counter)) {
                    writer.WriteLine(reader.ReadLine());
                }
                ++counter;
            }
        }
    }
    if (File.Exists(tempPath)) {
        File.Delete(logPath);
        File.Move(tempPath, logPath);
    }
}
+3  A: 

Text files are sequential, so when you delete a line, you have to move all the following lines up. You can use file mapping (a Win32 API that you can call through P/Invoke) to make this operation a bit less painful, but you really should consider using a non-sequential structure for your file, so that you can mark a line as deleted without actually removing it from the file. This matters especially if deletions happen frequently.

If I remember correctly, a file mapping API should be added in .NET 4.

Think Before Coding
+6  A: 

The most straightforward way of doing this is probably the best: write the entire file out to a new file, writing all lines except the one(s) you don't want.

Alternatively, open the file for random access.

Read to the point where the line you want to "delete" starts. Skip past that line, read a block of bytes from just after it (including the CR + LF, if necessary), write that block over the deleted line, advance both the read and the write position by that number of bytes, and repeat until the end of the file. Finally, truncate the file by the length of the deleted line.
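The random-access steps above can be sketched roughly as follows. This is only a sketch under stated assumptions: `InPlaceLineRemover` and `RemoveLine` are hypothetical names, lines are assumed to end with `\n` (optionally preceded by `\r`), and the encoding is assumed to be ASCII-compatible (e.g. UTF-8) so that scanning for the newline byte is safe.

```csharp
using System;
using System.IO;

static class InPlaceLineRemover
{
    // Removes the line with the given zero-based index by shifting the rest of
    // the file left over it and then truncating the file.
    public static void RemoveLine(string path, long lineIndex)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            var buffer = new byte[64 * 1024];
            long start = -1, end = -1;          // byte range [start, end) of the line
            long currentLine = 0, position = 0;
            if (lineIndex == 0) start = 0;

            // Pass 1: find the newline bytes that bracket the line.
            int read;
            while (end < 0 && (read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read && end < 0; i++)
                {
                    if (buffer[i] != (byte)'\n') continue;
                    currentLine++;
                    if (currentLine == lineIndex) start = position + i + 1;
                    else if (currentLine == lineIndex + 1) end = position + i + 1;
                }
                position += read;
            }
            if (start < 0) return;              // the file has fewer lines than lineIndex
            if (end < 0) end = fs.Length;       // last line has no trailing newline

            // Pass 2: copy everything after the line down over it, chunk by chunk.
            long readPos = end, writePos = start;
            while (true)
            {
                fs.Position = readPos;
                read = fs.Read(buffer, 0, buffer.Length);
                if (read <= 0) break;
                fs.Position = writePos;
                fs.Write(buffer, 0, read);
                readPos += read;
                writePos += read;
            }

            // Drop the now-duplicated tail.
            fs.SetLength(writePos);
        }
    }
}
```

This avoids the temporary file, but if the process dies halfway through the second pass the file is left in an inconsistent state, which is one reason the copy-to-a-new-file approach is usually the safer choice.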

Hope this helps.

EDIT - Now that I can see your code

if (!_deletedLines.Contains(counter)) 
{                            
    writer.WriteLine(reader.ReadLine());                        
}

This will not work. If it's a line you don't want, you still need to read it, just not write it. The code above neither reads it nor writes it, so the new file will be exactly the same as the old one.

You want something like

string line = reader.ReadLine();
if (!_deletedLines.Contains(counter)) 
{                            
    writer.WriteLine(line);                        
}
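
Putting the fix back into the whole method, a minimal sketch could look like the following. The names `LineDeleter` and `DeleteLines` and the `.tmp` suffix are illustrative only, and the set of deleted line numbers is assumed to be zero-based, as in the question's code.

```csharp
using System.Collections.Generic;
using System.IO;

static class LineDeleter
{
    // Copies every line whose zero-based index is NOT in deletedLines to a
    // temporary file, then replaces the original file with the temporary one.
    public static void DeleteLines(string path, ISet<int> deletedLines)
    {
        string tempPath = path + ".tmp";   // hypothetical temp file name

        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(tempPath, false))
        {
            int counter = 0;
            string line;
            // ReadLine() returns null at end of file, so each line is consumed
            // exactly once whether or not it is written out.
            while ((line = reader.ReadLine()) != null)
            {
                if (!deletedLines.Contains(counter))
                {
                    writer.WriteLine(line);
                }
                ++counter;
            }
        }

        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

Note that `WriteLine` uses `Environment.NewLine`, so line endings are normalized and the last line always gets a terminator, even if the original file did not end with one.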
Binary Worrier
Thank you for pointing out this bug.
Valentin Vasiliev
A: 

If you absolutely have to use a text file and cannot switch to a database, you may want to designate a weird symbol at the beginning of a line to mean "line deleted". Just have your parser ignore those lines, like comment lines in config files.

Then have a periodic "compact" routine, like Outlook and most database systems do, which rewrites the entire file excluding the deleted lines.
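
A rough sketch of the marker-plus-compaction idea, under assumptions that are not in the original post: every line reserves its first character as a status column, `~` is the (hypothetical) "deleted" marker, the encoding is single-byte for that column, and the caller already knows the byte offset at which a line starts. `TombstoneFile` and its methods are made-up names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class TombstoneFile
{
    const char DeletedMarker = '~';   // hypothetical marker character

    // Marks a line as deleted by overwriting its reserved first byte.
    // Nothing is shifted, so the cost is independent of the file size.
    public static void MarkDeleted(string path, long lineStartOffset)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            fs.Seek(lineStartOffset, SeekOrigin.Begin);
            fs.WriteByte((byte)DeletedMarker);
        }
    }

    // Lazily yields only the lines that are not marked as deleted.
    public static IEnumerable<string> ReadLiveLines(string path)
    {
        return File.ReadLines(path)
                   .Where(line => line.Length == 0 || line[0] != DeletedMarker);
    }

    // Periodic "compact" step: rewrite the file without the marked lines.
    public static void Compact(string path)
    {
        string tempPath = path + ".compact";   // hypothetical temp file name
        File.WriteAllLines(tempPath, ReadLiveLines(path));
        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

The file stays human readable, and the expensive full rewrite only happens when `Compact` is run rather than on every delete.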

I would strongly go with Think Before Coding's answer recommending a database or other structured file.

Bork Blatt
Yes, the requirement is to have a human-readable file (though I'm not sure how any human could possibly skim through a million lines!). I can't do anything about this requirement.
Valentin Vasiliev
A: 

Map your file into memory using file mapping, as Think Before Coding suggested, make the deletions in memory, and then write the result to disk.
Read these: File Read Benchmarks - C#
C# accessing memory map file
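
As a hedged illustration of the file-mapping route, the sketch below uses `System.IO.MemoryMappedFiles` (available from .NET 4) to close a gap in the file. `MappedRangeRemover` and `RemoveRange` are hypothetical names, and the caller is assumed to already know the byte offset and byte length (including the line terminator) of the line to remove.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class MappedRangeRemover
{
    // Removes `count` bytes starting at `start` by shifting the rest of the
    // file down through a memory-mapped view, then truncating the file.
    public static void RemoveRange(string path, long start, long count)
    {
        long length = new FileInfo(path).Length;
        if (start < 0 || count <= 0 || start + count > length)
            throw new ArgumentOutOfRangeException();

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewAccessor(0, length))
        {
            var buffer = new byte[64 * 1024];
            long readPos = start + count, writePos = start;
            while (readPos < length)
            {
                // Copy one chunk from after the gap down into the gap.
                int chunk = (int)Math.Min(buffer.Length, length - readPos);
                view.ReadArray(readPos, buffer, 0, chunk);
                view.WriteArray(writePos, buffer, 0, chunk);
                readPos += chunk;
                writePos += chunk;
            }
        }

        // The file cannot be shortened while it is mapped, so truncate afterwards.
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            fs.SetLength(length - count);
        }
    }
}
```

The explicit view size is passed because a view's `Capacity` may be rounded up to the page size and so is not a reliable file length.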

lsalamon
A: 

Depending on what exactly counts as "deleting", your best solution may be to overwrite the offending line with spaces. For many purposes (including human consumption), this is equivalent to deleting the line outright. If the resulting blank line is a problem, and you are sure you'll never delete the first line, you can append the spaces to the previous line by also overwriting the CRLF with two spaces.
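
A minimal sketch of the overwrite-with-spaces idea, keeping the line terminator in place. `LineBlanker` and `BlankLine` are made-up names, and the code assumes an ASCII-compatible encoding; with multi-byte UTF-8 characters it blanks by byte count, so the blanked line would be longer in characters than the original.

```csharp
using System.IO;

static class LineBlanker
{
    // Overwrites the content of the line with the given zero-based index with
    // spaces, leaving its line terminator and the file length unchanged.
    public static void BlankLine(string path, long lineIndex)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            // Skip lineIndex newline bytes to reach the start of the target line.
            long currentLine = 0;
            int b = 0;
            while (currentLine < lineIndex && (b = fs.ReadByte()) != -1)
            {
                if (b == '\n') currentLine++;
            }
            if (currentLine < lineIndex) return;   // the file has fewer lines

            long start = fs.Position;

            // Measure the line up to (not including) its CR/LF terminator.
            long length = 0;
            while ((b = fs.ReadByte()) != -1 && b != '\n' && b != '\r')
            {
                length++;
            }

            // Overwrite exactly that many bytes with spaces.
            fs.Position = start;
            for (long i = 0; i < length; i++)
            {
                fs.WriteByte((byte)' ');
            }
        }
    }
}
```

Because nothing moves, the cost is one scan to find the line plus a write the size of that line.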

(Based on the comment to Bork Blatt's answer)

MSalters
A: 

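Read your file into a Dictionary keyed by line. For lines you want to keep, set the int value to 0; for lines you need to mark as deleted, set it to 1. Then iterate over the KeyValuePairs to extract the lines that don't need to be deleted and write them to a new file. Note that this only works if every line is unique, since the line text is the dictionary key, and that Dictionary does not guarantee enumeration order.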

Dictionary<string, int> output = new Dictionary<string, int>();

// read line from file

...

// if need to delete line then set int value to 1

// otherwise set int value to 0
if (deleteLine)
{
    output[line] = 1;
}
else
{
    output[line] = 0;
}

// define the no delete List
List<string> nonDeleteList = new List<string>();

// use foreach to loop through each item in output and add each key
// whose value is equal to zero (0) to the nonDeleteList.
foreach (KeyValuePair<string, int> kvp in output)
{
    if (kvp.Value == 0)
    {
        nonDeleteList.Add(kvp.Key);
    }
}

// write the nondeletelist to the output file
File.WriteAllLines("OUTPUT_FILE_NAME", nonDeleteList.ToArray());

That's it.

willjr20