In C#, what is the best way to get a count of the total number of lines in all the files in a directory and all of its subdirectories?

The obvious answer is to make a recursive function to go through all of the directories and use the strategy from this question to count the lines in each file. Is there a better/easier way?

A: 

I think that post sufficiently explains the latter part of your question. As for traversing the directories, see http://dotnetperls.com/recursively-find-files

UPDATE: There is an abstraction over this. I was really hoping you would read the link, but here it is: http://dotnetperls.com/recursive-file-list-1

Evan Carroll
Actually, his question is not how to do it recursively but rather if there is a better alternative.
Jason
There is. That page covers all of the methods for doing this; I linked to the framework way.
Evan Carroll
+1  A: 

There is not really a better way. Walking a directory structure and all of its subdirectories lends itself naturally to recursion. As for counting the lines in a file, you really have no choice but to open the file and count them. Note that a very deep directory tree can overflow the call stack, so you might have to replace the recursion with an explicit work list such as a Queue.

Since it's relatively easy to get that method coded up correctly, clearly and concisely I think that is what you should do and move on to adding value elsewhere.

Jason
-1 That's not an answer to the question, just a rather backhanded insult.
Nick
@Nick: It is absolutely not a backhanded insult and it's unfortunate that you take it that way.
Jason
I agree with Nick in sentiment and point.
Evan Carroll
@Evan Carroll: That's fine. Again, I don't see the insult at all, and I wish someone would be more explicit about how I am being insulting so that I can edit accordingly and avoid such unintentional insults in the future.
Jason
I don't see anything insulting in that answer... +1 for mentioning the risk of stack overflow
Thomas Levesque
+6  A: 

Is there a better/easier way?

No, there is (in general) no better way to get the number of lines in a file than by counting them.

In order to find the total number of lines in all files, you will have to get the total number of lines in each file at some point. There's really no way around that.

Anon.
Why the downvote? If there's a reason you think this didn't deserve the upvote, it would be polite to point that out instead of just doing a hit-and-run. Otherwise, the only thing I can think of is you didn't want this above your answer for some reason.
Anon.
+1 because of dumb down vote.
Hogan
+1 because of another dumb down vote. Who are these people?
Hans Passant
A: 

For finding the files, why not just use something like:

Directory.GetFiles("C:/some/path", "*.txt", SearchOption.AllDirectories);

This will give you the results of a recursive search.

Nathan Parrish
If the user does not have permission to enumerate one of the subdirectories, this throws an exception (UnauthorizedAccessException) and returns no results at all. In the general case, this will work.
Nate Bross
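Following up on the permissions caveat above, here is a minimal sketch of one way around it, assuming you would rather skip unreadable directories than abort the whole search. `SafeFileWalker` and `EnumerateFilesSafe` are hypothetical names made up for illustration, not framework APIs. The walk uses an explicit stack instead of recursion, so directory depth does not affect the call stack.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class SafeFileWalker
{
    // Hypothetical helper (not from the thread): yields every file under root
    // that matches pattern, skipping directories that cannot be listed.
    public static IEnumerable<string> EnumerateFilesSafe(string root, string pattern)
    {
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            string[] files;
            string[] subdirs;
            try
            {
                files = Directory.GetFiles(dir, pattern);
                subdirs = Directory.GetDirectories(dir);
            }
            catch (UnauthorizedAccessException)
            {
                continue; // no permission to list this directory, skip it
            }
            catch (DirectoryNotFoundException)
            {
                continue; // directory disappeared during the walk, skip it
            }

            foreach (string sub in subdirs)
                pending.Push(sub);
            foreach (string file in files)
                yield return file;
        }
    }
}
```

With `using System.Linq;` in scope, the result can be fed to any of the counting approaches in this thread, for example `SafeFileWalker.EnumerateFilesSafe(@"C:\some\path", "*.txt").Sum(f => File.ReadAllLines(f).Length)`. Files that cannot be opened would still throw at the counting step, which this sketch does not handle.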
A: 

The strategy you described works well. An alternative approach instead of a recursive function (basically DFS) is to use BFS. Something like:

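// Requires: using System.Collections.Generic; using System.IO;
int CountLines(string path)
{
    // Breadth-first traversal: the queue holds directories still to be visited.
    var queue = new Queue<string>();
    queue.Enqueue(path);
    int count = 0;
    while (queue.Count > 0) {
        string dir = queue.Dequeue();
        foreach (var subdir in Directory.GetDirectories(dir))
            queue.Enqueue(subdir);
        foreach (var file in Directory.GetFiles(dir))
            count += GetLineCount(file); // per-file line counter, defined elsewhere
    }
    return count;
}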
Mehrdad Afshari
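The BFS answer above calls a `GetLineCount` helper without defining it. One possible sketch, assuming .NET 4 or later (where `File.ReadLines` is available), is below. The `LineCounter` class name is hypothetical; in practice the method could simply live next to `CountLines`.

```csharp
using System.IO;
using System.Linq;

static class LineCounter
{
    // Hypothetical implementation of the GetLineCount helper used above.
    // File.ReadLines streams the file lazily, so the whole file is never
    // held in memory at once (unlike File.ReadAllLines).
    public static int GetLineCount(string file)
    {
        return File.ReadLines(file).Count();
    }
}
```

This counts lines the way `StreamReader.ReadLine` does, so a final line without a trailing newline still counts as a line and an empty file counts as zero.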
+1  A: 

Here's a LINQy way of doing so:

string path = @"C:\TonsOfTextFiles";
int totalLines = (from file in Directory.GetFiles(path, "*.*", SearchOption.AllDirectories)
                    let fileText = File.ReadAllLines(file)
                    select fileText.Length).Sum();
Jesse C. Slicer
I like where you're going with this, but your parens are mismatched.
Dinah
Thanks, Dinah. I fixed that up. Copy/paste snafu.
Jesse C. Slicer
+1. (but you should IMHO convert it to "methods" syntax...)
Lette
Nice. It would have been nicer with lambda syntax: `Directory.GetFiles(path, "*.*", SearchOption.AllDirectories).Sum(f => File.ReadAllLines(f).Length)`
Mehrdad Afshari
Thanks, Lette. To be honest, I like method syntax more myself (only because it seems more C#-like than SQL-like). What are your reasons for preferring method syntax?
Jesse C. Slicer
Agree with you as well, Mehrdad! I'd even go a step further with .NET 4 and parallelize it: Directory.GetFiles(path, "*.*", SearchOption.AllDirectories).AsParallel().Sum(f => File.ReadAllLines(f).Length)
Jesse C. Slicer
@Jesse: Parallelization is very unlikely to improve performance for such a task (it might even negatively impact it by increasing HDD seeks) as it's bound by the hard disk speed.
Mehrdad Afshari
@Jesse: What you said. :-)
Lette
@Mehrdad: What you did! :-)
Lette
A: 

Please God, forgive me:

@echo off
set sum=0
for /r %%f in (*.cs) do find /v /c "$$some nonsense string$$" %%f >> test.dat
for /f "tokens=3 delims=:" %%i in (test.dat) do set /a sum += %%i
echo total lines = %sum%
del test.dat

Isn't C#, but it's fun.

EDIT: This can be more memory efficient, as it doesn't use ReadAllLines but reads one line at a time:

string basePath = @"C:\some\path";
Console.WriteLine(
    Directory.GetFiles(basePath, "*.cs", SearchOption.AllDirectories)
        .Sum(file => 
        {
            int lines = 0;
            using (StreamReader reader = new StreamReader(file))
                while(reader.ReadLine() != null) lines++;
            return lines;
        }));
Rubens Farias