I have to read a log file line by line. It's about 6 MB in size, with 40,000 lines in total. But after testing my program, I discovered that the log file is delimited by LF characters only, so I can't use the ReadLine method of the StreamReader class.

How can I fix this problem?

Edit: I tried to use TextReader, but my program still doesn't work:

using (TextReader sr = new StreamReader(strPath, Encoding.Unicode))
{
    sr.ReadLine(); // ignore the first three lines of the log file
    sr.ReadLine();
    sr.ReadLine();

    int count = 0; // number of lines read
    string strLine;
    while (sr.Peek() != 0)
    {
        strLine = sr.ReadLine();
        if (strLine.Trim() != "")
        {
            InsertData(strLine);
            count++;
        }
    }

    return count;
}
+2  A: 

Doesn't File.ReadAllLines(fileName) load files with LF line endings correctly? Use it if you need the whole file at once. I saw a site claiming it's slower than another method, but it isn't if you pass it the correct Encoding (the default is UTF-8), and it's as clean as you can get.

Edit: It does. And if you need streaming, TextReader.ReadLine() handles Unix line endings correctly as well.

Edit again: So does StreamReader. Did you just read the documentation and assume it wouldn't handle LF line endings? Looking at it in Reflector, it certainly seems to handle them properly.
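A minimal sketch of both options mentioned here, assuming the log is UTF-8 or plain ASCII (the question does not say which); `LogLoading`, `LoadAllLines` and `StreamLines` are hypothetical names. `File.ReadAllLines` holds the whole file in memory, while `File.ReadLines` (.NET 4 and later, so newer than this thread) enumerates lazily; both treat a bare LF as a line break.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LogLoading
{
    // Loads every line at once. Acceptable for a ~6 MB file, but the whole
    // file ends up in memory as a string array.
    public static string[] LoadAllLines(string path)
    {
        // Assumption: the log is UTF-8 (which also covers plain ASCII).
        return File.ReadAllLines(path, Encoding.UTF8);
    }

    // Streams the file one line at a time (.NET 4+), so only the current
    // line needs to be held in memory.
    public static IEnumerable<string> StreamLines(string path)
    {
        return File.ReadLines(path, Encoding.UTF8);
    }
}
```

Passing the encoding explicitly matters: decoding the file with the wrong encoding is a common reason for lines not splitting where you expect.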

280Z28
The file is big, actually, and I have to read it line by line for post-processing.
Vimvq1987
+1  A: 

TextReader.ReadLine already handles lines terminated just by \n.

From the docs:

A line is defined as a sequence of characters followed by a carriage return (0x000d), a line feed (0x000a), a carriage return followed by a line feed, Environment.NewLine, or the end of stream marker. The string that is returned does not contain the terminating carriage return and/or line feed. The returned value is a null reference (Nothing in Visual Basic) if the end of the input stream has been reached.

So basically, you should be fine. (I've talked about TextReader rather than StreamReader because that's where the method is declared - obviously it will still work with a StreamReader.)
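To check the documented behaviour directly, here is a small sketch (the `LineEndingDemo` class and `SplitWithReadLine` method are hypothetical) that feeds text with mixed terminators through a `StreamReader` and uses the usual `while ((line = reader.ReadLine()) != null)` pattern to detect the end of the stream:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineEndingDemo
{
    // Encodes the text as UTF-8, reads it back through a StreamReader, and
    // collects whatever ReadLine() returns until it signals end of stream with null.
    public static List<string> SplitWithReadLine(string text)
    {
        var lines = new List<string>();
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        using (var reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

`SplitWithReadLine("a\nb\r\nc\rd")` yields four lines, `a`, `b`, `c` and `d`, because LF, CRLF and a lone CR all count as terminators; a single trailing LF does not produce an extra empty line.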

If you want to iterate through lines easily (and potentially use LINQ against the log file) you may find my LineReader class in MiscUtil useful. It basically wraps calls to ReadLine() in an iterator. So for instance, you can do:

var query = from file in Directory.GetFiles("logs")
            from line in new LineReader(file)
            where !line.StartsWith("DEBUG")
            select line;

foreach (string line in query)
{
    // ...
}

All streaming :)
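If MiscUtil isn't available, the same idea can be sketched with a C# iterator. This is a hypothetical stand-in (`LazyLines.ReadLinesLazily`), not MiscUtil's actual `LineReader` source, and it assumes UTF-8 input:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

public static class LazyLines
{
    // Hypothetical iterator-based line reader. The file is opened only when
    // enumeration starts and closed when the foreach completes or is abandoned.
    public static IEnumerable<string> ReadLinesLazily(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    // The same shape of query as above, over every file in a directory.
    public static IEnumerable<string> NonDebugLines(string directory)
    {
        return from file in Directory.GetFiles(directory)
               from line in ReadLinesLazily(file, Encoding.UTF8)
               where !line.StartsWith("DEBUG")
               select line;
    }
}
```

Because of `yield return`, no file is read until the query is enumerated, and only one line per file is in memory at a time.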

Jon Skeet
My program still doesn't work. I don't know what's wrong :(
Vimvq1987
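The loop in the question has a concrete bug: `TextReader.Peek()` returns -1 (not 0) at the end of the stream, so `while (sr.Peek() != 0)` never exits; at end of file `ReadLine()` returns null and `strLine.Trim()` throws a NullReferenceException. Separately, `Encoding.Unicode` means UTF-16; if the log is actually ASCII or UTF-8 without a byte order mark, the bytes are decoded in pairs and the LF bytes generally do not come out as line-feed characters, so the lines won't split. A sketch with both points addressed, assuming the caller knows the real encoding; `ImportLog` is a hypothetical name and `InsertData` is passed in as a delegate:

```csharp
using System;
using System.IO;
using System.Text;

public static class LogImport
{
    // Skips the first three lines, passes every non-blank line to insertData,
    // and returns how many lines were passed on.
    public static int ImportLog(string path, Encoding encoding, Action<string> insertData)
    {
        using (TextReader sr = new StreamReader(path, encoding))
        {
            // Ignore the first three lines of the log file.
            for (int i = 0; i < 3; i++)
            {
                if (sr.ReadLine() == null)
                {
                    return 0; // file is shorter than the header
                }
            }

            int count = 0;
            string strLine;
            // ReadLine() returns null at end of stream, which ends the loop.
            while ((strLine = sr.ReadLine()) != null)
            {
                if (strLine.Trim().Length != 0)
                {
                    insertData(strLine);
                    count++;
                }
            }
            return count;
        }
    }
}
```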
A: 

I'd have guessed that LF-only (\n) line endings would be fine (whereas CR-only (\r) endings might cause problems).

You could read the file a character at a time and process each line when you reach its terminator.

If profiling shows this is too slow, you could do application-side buffering with Read(char[], int, int). But try the simple character-at-a-time approach first!
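A sketch of the character-at-a-time approach, assuming LF (optionally preceded by CR) is the only terminator that matters; `CharReader.ReadLinesByChar` is a hypothetical name. When the reader is a `StreamReader`, it already buffers internally, so each `Read()` call does not hit the disk:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CharReader
{
    // Reads one character at a time and yields a line each time an LF is seen.
    // A CR directly before the LF is dropped, so CRLF files also work.
    public static IEnumerable<string> ReadLinesByChar(TextReader reader)
    {
        var current = new StringBuilder();
        int ch;
        // Read() returns -1 at end of input.
        while ((ch = reader.Read()) != -1)
        {
            if (ch == '\n')
            {
                if (current.Length > 0 && current[current.Length - 1] == '\r')
                {
                    current.Length--;
                }
                yield return current.ToString();
                current.Clear();
            }
            else
            {
                current.Append((char)ch);
            }
        }
        // Last line without a trailing terminator.
        if (current.Length > 0)
        {
            yield return current.ToString();
        }
    }
}
```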

Will
There are fast functions that implement this functionality. Definitely try those first, since they are fast, short, expressive, and standardized.
280Z28
A: 

Or you can use the ReadBlock method and parse the lines yourself.
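A sketch of that approach, with `BlockReader.ReadLinesByBlock` as a hypothetical name and an arbitrarily chosen default buffer size. The main subtlety is that a line can straddle two blocks, so the unterminated tail of each block has to be carried over:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class BlockReader
{
    // Reads the text in fixed-size blocks with TextReader.ReadBlock and splits
    // on LF by hand. A line that straddles two blocks is kept in 'pending'
    // until its terminator arrives. A trailing CR is stripped from each line.
    public static IEnumerable<string> ReadLinesByBlock(TextReader reader, int bufferSize = 4096)
    {
        var buffer = new char[bufferSize];
        var pending = new StringBuilder();
        int read;
        // ReadBlock returns 0 once the end of input is reached.
        while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
        {
            int start = 0;
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == '\n')
                {
                    pending.Append(buffer, start, i - start);
                    yield return TrimCr(pending);
                    pending.Clear();
                    start = i + 1;
                }
            }
            // Carry the unterminated tail of this block into the next round.
            pending.Append(buffer, start, read - start);
        }
        if (pending.Length > 0)
        {
            yield return TrimCr(pending);
        }
    }

    private static string TrimCr(StringBuilder sb)
    {
        if (sb.Length > 0 && sb[sb.Length - 1] == '\r')
        {
            sb.Length--;
        }
        return sb.ToString();
    }
}
```

With a deliberately tiny buffer, `ReadLinesByBlock(new StringReader("alpha\nbe\r\ngamma"), 3)` yields `alpha`, `be` and `gamma`, including the CRLF that is split across two blocks.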

Marcom