I have a text file that contains about 100,000 articles. The structure of the file is:

.Document ID 42944-YEAR:5
.Date  03\08\11
.Cat  political
Article Content 1

.Document ID 42945-YEAR:5
.Date  03\08\11
.Cat  political
Article Content 2

I want to open this file in C# and process it line by line. I tried this code:

String[] FileLines = File.ReadAllText(
                  TB_SourceFile.Text).Split(Environment.NewLine.ToCharArray()); 

But it says:

Exception of type 'System.OutOfMemoryException' was thrown.

How can I open this file and read it line by line?

  • File Size: 564 MB (591,886,626 bytes)
  • File Encoding: UTF-8
  • File contains Unicode characters.
+5  A: 

You can open the file and read it as a stream rather than loading everything into memory all at once.

From MSDN:

using System;
using System.IO;

class Test 
{
    public static void Main() 
    {
        try 
        {
            // Create an instance of StreamReader to read from a file.
            // The using statement also closes the StreamReader.
            using (StreamReader sr = new StreamReader("TestFile.txt")) 
            {
                String line;
                // Read and display lines from the file until the end of 
                // the file is reached.
                while ((line = sr.ReadLine()) != null) 
                {
                    Console.WriteLine(line);
                }
            }
        }
        catch (Exception e) 
        {
            // Let the user know what went wrong.
            Console.WriteLine("The file could not be read:");
            Console.WriteLine(e.Message);
        }
    }
}
Eric J.
+8  A: 

Your file is too large to be read into memory in one go, as File.ReadAllText is trying to do. You should instead read the file line by line.

Adapted from MSDN:

string line;
// Read the file and display it line by line.
using (StreamReader file = new StreamReader(@"c:\yourfile.txt"))
{
    while ((line = file.ReadLine()) != null)
    {    
        Console.WriteLine(line);
        // do your processing on each line here
    }
}

This way, only the current line (plus the reader's small internal buffer) is held in memory at any one time.
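
Since each article in the question's file begins with a ".Document ID" line, the same ReadLine loop can also group lines into one article at a time. The following is only a sketch under assumptions: the names ArticleReader, ReadArticles and PrintArticleIds are hypothetical, blank lines are skipped (which would also drop blank lines inside article content), and the file is read as UTF-8 as stated in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of the original answer.
public static class ArticleReader
{
    // Lazily yields one article at a time as a list of its non-blank lines.
    // Assumption: every article starts with a line beginning ".Document ID".
    public static IEnumerable<List<string>> ReadArticles(TextReader reader)
    {
        var current = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Assumption: blank lines only separate articles, so skip them.
            if (string.IsNullOrWhiteSpace(line))
                continue;

            // A new header line closes the article collected so far.
            if (line.StartsWith(".Document ID", StringComparison.Ordinal) && current.Count > 0)
            {
                yield return current;
                current = new List<string>();
            }
            current.Add(line);
        }

        // Emit the last article, if any.
        if (current.Count > 0)
            yield return current;
    }

    // Example usage: print the header line of every article in the file.
    public static void PrintArticleIds(string path)
    {
        using (var reader = new StreamReader(path, Encoding.UTF8))
        {
            foreach (var article in ReadArticles(reader))
            {
                Console.WriteLine(article[0]);
            }
        }
    }
}
```

Only the article currently being collected is kept in memory, so the approach should scale with the size of the largest article rather than the size of the file.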

Michael Petrotta
+2  A: 

Something like this:

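using (var reader = File.OpenText(@"path to file"))
{
    string fileLine;
    // ReadLine returns null at the end of the file, so an empty file
    // never reaches the processing code with a null line.
    while ((fileLine = reader.ReadLine()) != null)
    {
        // process fileLine here
    }
}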

Jens Granlund
+2  A: 

If you are using .NET Framework 4, there is a new static method on System.IO.File called ReadLines that returns an IEnumerable<string>, so the lines can be enumerated lazily instead of being loaded all at once. I believe it was added to the framework for this exact scenario; however, I have yet to use it myself.
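
As a rough illustration of how ReadLines might be used here, the sketch below counts the articles by counting lines that start with ".Document ID". The class and method names (ReadLinesExample, CountDocuments) are hypothetical, and the UTF-8 encoding is taken from the question.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration only.
public static class ReadLinesExample
{
    // Counts header lines while enumerating the file lazily, so the whole
    // file is never held in memory at once.
    public static int CountDocuments(string path)
    {
        return File.ReadLines(path, Encoding.UTF8)
                   .Count(line => line.StartsWith(".Document ID", StringComparison.Ordinal));
    }
}
```

Calling ReadLinesExample.CountDocuments with the path of the source file should return the number of articles, provided every article has exactly one such header line.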

MSDN Documentation - File.ReadLines Method (String)

Related Stack Overflow Question - Bug in the File.ReadLines(..) method of the .net framework 4.0

Dan Terry