I'm writing a program that does the following:
- Finds all files with the correct extension in a given directory
- For each of those files, finds all occurrences of a given string
- Prints each matching line
I'd like to write this in a functional way, as a series of generator functions (methods that use yield return
and lazily produce one item at a time), so that my code reads like this:
    IEnumerable<string> allFiles = GetAllFiles();
    IEnumerable<string> matchingFiles = GetMatches( "*.txt", allFiles );
    IEnumerable<string> contents = GetFileContents( matchingFiles );
    IEnumerable<string> matchingLines = GetMatchingLines( contents );
    foreach( var lineText in matchingLines )
        Console.WriteLine( "Found: " + lineText );
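For context, the generator functions behind that pipeline could look roughly like the sketch below. The bodies are illustrative assumptions, not the real implementation: the root and searchText parameters, the recursive directory search, and the simplification that the pattern always has the form "*.ext" are all made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LazyPipeline
{
    // Assumption: files are discovered recursively under a root directory.
    public static IEnumerable<string> GetAllFiles(string root)
    {
        foreach (var path in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
            yield return path;
    }

    // Assumption: pattern has the simple form "*.ext", so only the suffix is compared.
    public static IEnumerable<string> GetMatches(string pattern, IEnumerable<string> files)
    {
        string extension = pattern.TrimStart('*');
        foreach (var path in files)
            if (path.EndsWith(extension, StringComparison.OrdinalIgnoreCase))
                yield return path;
    }

    // Yields every line of every file, one line at a time.
    public static IEnumerable<string> GetFileContents(IEnumerable<string> files)
    {
        foreach (var path in files)
            foreach (var line in File.ReadLines(path))
                yield return line;
    }

    // Yields only the lines that contain the search text.
    public static IEnumerable<string> GetMatchingLines(string searchText, IEnumerable<string> lines)
    {
        foreach (var line in lines)
            if (line.Contains(searchText))
                yield return line;
    }

    public static void Main(string[] args)
    {
        // Hypothetical defaults so the sketch runs without arguments.
        string root = args.Length > 0 ? args[0] : ".";
        string searchText = args.Length > 1 ? args[1] : "TODO";

        var matchingLines = GetMatchingLines(
            searchText,
            GetFileContents(GetMatches("*.txt", GetAllFiles(root))));

        // No file is opened until this loop starts pulling items.
        foreach (var lineText in matchingLines)
            Console.WriteLine("Found: " + lineText);
    }
}
```

Each stage pulls from the previous one on demand, so only one file is open and one line is in memory at any moment.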
This is all fine, but what I'd also like to do is print some statistics at the end. Something like this:
Found 233 matches in 150 matching files. Scanned 3,297 total files in 5.72s
The problem is that when the code is written in a 'pure functional' style like the above, each item is lazily loaded.
You don't know how many files match in total until the final foreach loop completes, and because only one item is ever yielded at a time, the code has no place to keep track of how many things it has found previously. If you call LINQ's matchingLines.Count()
method, it will re-enumerate the whole sequence!
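The re-enumeration can be seen in a minimal sketch like the one below. The hard-coded GetMatchingLines body and the Passes counter are hypothetical stand-ins added purely for illustration; the counter records how many times the generator body starts running.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ReEnumerationDemo
{
    // Hypothetical counter: incremented each time the generator body starts.
    public static int Passes;

    // Stand-in for the real GetMatchingLines: lazily yields two fixed lines.
    public static IEnumerable<string> GetMatchingLines()
    {
        Passes++;
        yield return "first match";
        yield return "second match";
    }

    public static void Main()
    {
        // Calling the method runs none of its body; Passes is still 0 here.
        IEnumerable<string> matchingLines = GetMatchingLines();

        // First pass over the sequence.
        foreach (var lineText in matchingLines)
            Console.WriteLine("Found: " + lineText);

        // Count() has no cached result to use, so it walks the sequence again.
        int total = matchingLines.Count();

        // Prints: Found 2 matches; generator body ran 2 times
        Console.WriteLine($"Found {total} matches; generator body ran {Passes} times");
    }
}
```

In the real pipeline that second pass would mean re-scanning the directory and re-reading every file just to get a number.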
I can think of many ways to solve this problem, but all of them seem somewhat ugly. It strikes me as something that people are bound to have done before, and I'm sure there's a nice design pattern that shows a best-practice way of doing this.
Any ideas? Cheers