Given a text file, how would I go about reading an arbitrary line and nothing else in the file?

Say, I have a file test.txt. How would I go about reading line number 15 in the file?

All I've seen is stuff involving storing the entire text file as a String array and then using the line number as the index into the array... but there are some complications: the text file is enormous, and the machine that the application I'm coding will run on isn't exactly a top-notch system. Speed isn't the top priority, but it is definitely a major issue.

Are there any ways to ONLY read a specific line of a text file and store the result as a string?

Thanks for your responses: The file is KINDA structured. It's got 25 lines of info and then X lines of numbers but line 17 of the first 25 has the value of X.

But then, there's 1 blank line and it repeats itself all over as a second record in the file and X can have a different value for each record.

What I want to do is read and store the first 25 lines as independent values and then store the next X (usually around 250) lines as an array. Then I'm going to store it in an SQL database and repeat with the NEXT record until I reach the Yth record (the number of records in the file is in line 3).

EDIT 2: Alright, I think I've gotten to a solution based on a combination of your alls' responses.

I'm going to read the first 25 lines and store it as an array. I'll copy the pertinent contents of the array to local variables then I'll delete the first 25 lines. Then, I can use the info to store the next X lines (the value of item 13 in the array) as an array, serialize it, store it in a database then delete the lines that I just read.

I could then repeat the process for each subsequent record.

Of course, this relies on one assumption I'm making, which to be honest, I'm not sure is true. Is it possible to delete the first n lines from a text file from within C# without having to read the entire thing and re-write it without the first n lines?

+7  A: 

Since you can't predict the location (can you?) of the i-th line in the file, you'll have to read all previous lines too. If the line number is small, this can be more efficient than the ReadAllLines method.

string GetLine(string fileName, int line)
{
   using (var sr = new StreamReader(fileName)) {
       for (int i = 1; i < line; i++)
          sr.ReadLine();
       return sr.ReadLine();
   }
}
Mehrdad Afshari
Unless your text file is structured with a fixed line length.
Eric J.
... fixed in *bytes* rather than just characters...
Jon Skeet
Not necessarily fixed, also works if it's *predictable*.
Mehrdad Afshari
Mehrdad, could you explain? Reading line 1234 with UTF-8 ?
Henk Holterman
@Mehrdad: Very true - predictable without looking at any data in the file though.
Jon Skeet
Henk: ASCII is not the only fixed-length encoding. There are plenty of them, say, UTF16.
Mehrdad Afshari
@Henk: Supposing each line didn't have the same length, but used one more byte than the line before (with the first line taking 3 bytes, say). Then you could easily predict where line 1234 started, even though each line had a different length.
Jon Skeet
@Mehrdad: UTF16 is not fixed-length, but its predecessor UCS-2 is (I only got to know this difference recently myself). See http://en.wikipedia.org/wiki/UTF-16/UCS-2
0xA3
Jon, you are right but such a format is very rare. But I thought Mehrdad had some fancy way to predict escape sequences.
Henk Holterman
@divo: Really? Actually, I was thinking UCS2 and UTF16 are identical. Heh. I was nearly sure since a professor also said that in college. One more reason not to trust what people at academia say (esp. when they are known not to be talented enough) ;) Thanks for mentioning that. That doesn't change the point that ASCII is not the only fixed-length encoding though.
Mehrdad Afshari
@Henk: what I had in mind by predictable is, say (assuming a fixed-length encoding), line `i` always has `i` characters. It'll be pretty easy to calculate the start position of line `k`.
Mehrdad Afshari
@divo: By the way, do you know which one .NET `System.String` uses? I'd expected it to be a fixed-length encoding... so is it UCS-2 or really UTF16 (documentation says UTF16, of course)
Mehrdad Afshari
@Mehrdad: The C# spec says: "Character and string processing in C# uses Unicode encoding. The char type represents a UTF-16 code unit, and the string type represents a sequence of UTF-16 code units." although it might be surprising that there is no fixed-length encoding used.
0xA3
...(continued) This article explains why: http://www.unicode.org/notes/tn12/
0xA3
@divo: That article explains it: "UTF-16 already allows **fixed-width processing** of BMP characters." although it doesn't mention how. It would be really surprising if .NET used an encoding that didn't support fixed-width processing.
Mehrdad Afshari
+2  A: 

Unless you have fixed-size lines, you need to read every line until you reach the line you want. You don't need to store each line, though; just discard it if it's not the line you want.

Edit:

As mentioned, it would also be possible to seek in the file if the line lengths were predictable -- that is to say you could apply some deterministic function to transform a line number into a file position.
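
As an illustration only, here is a minimal sketch of such a deterministic function for the layout Jon Skeet describes in the comments on another answer: the first line takes 3 bytes and every line takes one byte more than the one before it (newline included). The class name `PredictableLines` and the single-byte encoding are assumptions made for the example.

```csharp
using System.IO;
using System.Text;

static class PredictableLines
{
    // Assumed layout: line i (1-based) occupies i + 2 bytes, newline included.
    // The start of line n is the sum of the lengths of lines 1..n-1.
    public static long OffsetOfLine(long n)
    {
        return (n - 1) * n / 2 + 2 * (n - 1);
    }

    public static string ReadLine(string path, long n)
    {
        using (var stream = File.OpenRead(path))
        {
            // Jump straight to the computed byte position; no earlier line is read.
            stream.Seek(OffsetOfLine(n), SeekOrigin.Begin);
            using (var reader = new StreamReader(stream, Encoding.ASCII))
            {
                return reader.ReadLine();
            }
        }
    }
}
```

Any other layout works the same way as long as the offset can be computed from the line number alone, without looking at the data.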

Ron Warholic
+1, but like Jon Skeet remarked you need fixed size in bytes, which implies ASCII encoding.
Henk Holterman
Henk: How come fixed in bytes implies ASCII?
Mehrdad Afshari
Indeed: UTF-32 springs to mind, or even things like ISO-8859-1.
Jon Skeet
With UTF-8, multi-byte sequences are used for special chars, i.e. Ă takes more bytes than A.
Henk Holterman
Ok, you would need a fixed-size encoding (ASCII, UTF-32, ISO-8859-1, ..) combined with a fixed line length.
Henk Holterman
A: 

You could read line by line, so you don't have to read the entire file all at once (or probably at all).

int i=0
while(!stream.eof() && i!=lineNum)
    stream.readLine()
    i++
line = stream.readLine()
Samuel
The question is tagged C#.
Mehrdad Afshari
The problem with reading line by line is you will have a latency and seek with each read. If the file has a LOT of lines the performance will go through the floor. Reading large blocks (say 64k or more) of data and looking for the line breaks in memory will have MUCH better performance.
RB Davidson
RB Davidson: If the stream is buffered, that would be a non-issue.
Mehrdad Afshari
Not all streams are buffered. You can force a stream to be buffered, but nothing in your example implies this is being done. In any event, I have encountered performance issues when reading really big files using buffered streams and reading line by line. I was able to significantly increase performance by forcing the stream to read larger blocks of data than single lines, then splitting out the lines in memory.
RB Davidson
+2  A: 

No, unfortunately there is not. At the raw level, files do not work on a line-number basis. Instead they work on a position / offset basis. The underlying filesystem has no concept of lines. It's a concept added by higher-level components.

So there is no way to tell the operating system, please open file at line blah. Instead you have to open the file and skip around counting new lines until you've passed the specified number. Then store the next set of bytes into an array until you hit the next new line.
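
A rough sketch of that skip-and-collect approach at the byte level, for illustration. The class name `NewlineScanner` is made up for the example, and it assumes `\n` line endings (optionally preceded by `\r`) and an encoding in which the byte 0x0A can only mean a newline, such as ASCII, UTF-8 or ISO-8859-1.

```csharp
using System.IO;
using System.Text;

static class NewlineScanner
{
    // Returns line `lineNumber` (1-based), or null if the file has fewer lines.
    public static string ReadLine(string path, int lineNumber, Encoding encoding)
    {
        // A 64 KB buffer keeps ReadByte from hitting the disk on every call.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 64 * 1024))
        {
            int current = 1;
            int b = 0;

            // Skip bytes until lineNumber - 1 newlines have been passed.
            while (current < lineNumber && (b = stream.ReadByte()) != -1)
            {
                if (b == '\n') current++;
            }
            if (current < lineNumber) return null;

            // Collect the bytes of the wanted line up to the next newline.
            var bytes = new MemoryStream();
            while ((b = stream.ReadByte()) != -1 && b != '\n')
            {
                bytes.WriteByte((byte)b);
            }
            if (b == -1 && bytes.Length == 0) return null; // nothing after the last newline

            byte[] raw = bytes.ToArray();
            int length = raw.Length;
            if (length > 0 && raw[length - 1] == '\r') length--; // drop the CR of a CRLF
            return encoding.GetString(raw, 0, length);
        }
    }
}
```

Only the wanted line is ever held in memory; everything before it is read and thrown away.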

JaredPar
A: 

If each line is a fixed length then you can open a Stream around it, seek (bytes per line) * n into the file and read your line from there.

using( Stream stream = File.Open(fileName, FileMode.Open) )
{
    stream.Seek(bytesPerLine * (myLine - 1), SeekOrigin.Begin);
    using( StreamReader reader = new StreamReader(stream) )
    {
        string line = reader.ReadLine();
    }
}

Alternatively, you could just use the StreamReader to read lines until you reach the one you want. That way is slower, but still an improvement over reading every single line in the file.

using( Stream stream = File.Open(fileName, FileMode.Open) )
{
    using( StreamReader reader = new StreamReader(stream) )
    {
        string line = null;
        for( int i = 0; i < myLineNumber; ++i )
        {
            line = reader.ReadLine();
        }
    }
}
Dave
A: 

As Mehrdad said, you cannot just seek to the n-th line without reading the file. However, you don't need to store the entire file in memory - just discard the data you don't need.

string line = null;
using (StreamReader sr = new StreamReader(path))
    for (int i = 0; i<15; i++)
    {
       line = sr.ReadLine();
       if (line==null) break; // there are fewer than 15 lines in the file
    }
VladV
A: 

If the lines are all of a fixed length you can use the Seek method of a stream to move to the correct starting position.

If the lines are of a variable length your options are more limited.

If this is a file you will only be using once and then discarding, then you are best off reading it in and working with it in memory.

If this is a file you will be keeping and will be reading from more than writing to, you can create a custom index file that contains the starting position of each line. Then use that index to get your Seek position. The process of creating the index file is resource intensive. Every time you add a new line to the file you will need to update the index, so maintenance becomes a non-trivial issue.

RB Davidson
A: 

If your file contains lines of different lengths and you need to read lines often and quickly, you can build an index of the file: read it once, saving the position of each new line. Then, when you need to read a line, look up its position in the index, seek there, and read the line.

If you add new lines to the file you can just add index of new lines and you don't need to reindex it all. Though if your file changes somewhere in a line you have already indexed then you have to reindex.
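
A minimal in-memory sketch of that idea, as an illustration. The class name `LineIndex` is made up for the example; it assumes `\n`-terminated lines in ASCII or UTF-8, and that the file is not modified after the index is built.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

sealed class LineIndex
{
    private readonly string _path;
    private readonly List<long> _starts = new List<long>();

    // One sequential pass records the byte offset at which every line starts.
    public LineIndex(string path)
    {
        _path = path;
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 64 * 1024))
        {
            long position = 0;
            bool atLineStart = true;
            int b;
            while ((b = stream.ReadByte()) != -1)
            {
                if (atLineStart)
                {
                    _starts.Add(position);
                    atLineStart = false;
                }
                if (b == '\n') atLineStart = true;
                position++;
            }
        }
    }

    public int LineCount
    {
        get { return _starts.Count; }
    }

    // Looks up the offset of line `lineNumber` (1-based), seeks there and reads it.
    public string ReadLine(int lineNumber)
    {
        if (lineNumber < 1 || lineNumber > _starts.Count) return null;
        using (var stream = File.OpenRead(_path))
        {
            stream.Seek(_starts[lineNumber - 1], SeekOrigin.Begin);
            using (var reader = new StreamReader(stream, Encoding.UTF8))
            {
                return reader.ReadLine();
            }
        }
    }
}
```

The index costs 8 bytes per line, so even a file with millions of lines stays cheap to index compared with holding the lines themselves in memory. Writing `_starts` out to a separate file would give the persistent index described above.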

Tomáš Klapka