In my opinion, this is an excellent opportunity to use the StringReader class:
- Read your text line by line.
- Keep your lines in some kind of buffer (e.g., a Queue<string>), dropping lines you don't need after a given number of lines have been read.
- Once your "needle" is found, read one more line (if possible) and then just return what's in your buffer.
In my opinion, this has some advantages over the other approaches suggested:
- Since it doesn't utilize String.Split, it doesn't do more work than you need -- i.e., reading the entire string looking for the characters to split on, and creating an array of the substrings.
- In fact, it doesn't necessarily read the entire string at all, since once it finds the text it's looking for it only goes as far as necessary to get the desired number of padding lines.
- It could even be refactored (very easily) to deal with any textual input via a TextReader -- e.g., a StreamReader -- so it could even work with huge files, without having to load the entire contents of a given file into memory.
Imagine this scenario: you want to find an excerpt of text from a text file that contains the entire text of a novel. (Not that this is your scenario -- I'm just speaking hypothetically.) Using String.Split would require that the entire text of the novel be split according to the delimiter you specified, whereas using a StringReader (well, in this case, a StreamReader) would only require reading until the desired text was found, at which point the excerpt would be returned.
Again, I realize this isn't necessarily your scenario -- just suggesting that this approach provides scalability as one of its strengths.
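To make that contrast concrete, here is a minimal sketch (not part of the implementation that follows). The class EarlyExitDemo, the method LinesReadUntil, and the four-line stand-in "novel" are all made up for illustration; it only compares how much of the input each approach touches.

```csharp
using System;
using System.IO;

public static class EarlyExitDemo
{
    // Hypothetical helper: counts how many lines had to be read
    // before a line containing the needle turned up.
    public static int LinesReadUntil(TextReader reader, string needle)
    {
        int linesRead = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            linesRead++;
            if (line.Contains(needle))
                break; // stop early; the rest of the input is never read
        }
        return linesRead;
    }

    public static void Main()
    {
        // Tiny stand-in for a large text (assumed sample data).
        string novel = string.Join("\n", new[] {
            "Chapter one",
            "The needle is here",
            "Chapter three",
            "Chapter four"
        });

        // Split always walks the whole string and allocates every substring.
        int viaSplit = novel.Split('\n').Length;

        using (var reader = new StringReader(novel))
        {
            int viaReader = LinesReadUntil(reader, "needle");
            Console.WriteLine(
                "Split produced {0} substrings; the reader consumed {1} lines.",
                viaSplit, viaReader);
            // prints: Split produced 4 substrings; the reader consumed 2 lines.
        }
    }
}
```

With a real novel the gap would be correspondingly larger whenever the match occurs early in the text.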
Here's a quick implementation:
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// rearranged code to avoid horizontal scrolling
public static string FindSurroundingLines
(string haystack, string needle, int paddingLines) {
    // note: ArgumentException(message, paramName);
    // ArgumentOutOfRangeException(paramName)
    if (string.IsNullOrEmpty(haystack))
        throw new ArgumentException("Value cannot be null or empty.", "haystack");
    else if (string.IsNullOrEmpty(needle))
        throw new ArgumentException("Value cannot be null or empty.", "needle");
    else if (paddingLines < 0)
        throw new ArgumentOutOfRangeException("paddingLines");

    // buffer needs to accommodate paddingLines on each side
    // plus line containing the needle itself, so:
    // (paddingLines * 2) + 1
    int bufferSize = (paddingLines * 2) + 1;
    var buffer = new Queue<string>(/*capacity*/ bufferSize);

    using (var reader = new StringReader(haystack)) {
        bool needleFound = false;
        while (!needleFound && reader.Peek() != -1) {
            string line = reader.ReadLine();
            // drop the oldest line once the buffer is full
            if (buffer.Count == bufferSize)
                buffer.Dequeue();
            buffer.Enqueue(line);
            needleFound = line.Contains(needle);
        }

        // at this point either the needle has been found,
        // or we've reached the end of the text (haystack);
        // all that's left to do is make sure the string returned
        // includes the specified number of padding lines
        // on either side
        int endingLinesRead = 0;
        // always stop at the end of the text, so that ReadLine()
        // never returns null and pads the buffer with blank lines
        while (
            reader.Peek() != -1 &&
            (endingLinesRead++ < paddingLines || buffer.Count < bufferSize)
        ) {
            if (buffer.Count == bufferSize)
                buffer.Dequeue();
            buffer.Enqueue(reader.ReadLine());
        }

        var resultBuilder = new StringBuilder();
        while (buffer.Count > 0)
            resultBuilder.AppendLine(buffer.Dequeue());
        return resultBuilder.ToString();
    }
}
Some example input/output (with text containing your example input):
Code:
Console.WriteLine(FindSurroundingLines(text, "MOUSE", 1));
Output:
This is the 2nd line of BIRD text in the paragraph
This is the 3rd line of MOUSE text in the paragraph
This is the 4th line of DOG text in the paragraph
Code:
Console.WriteLine(FindSurroundingLines(text, "BIRD", 1));
Output:
This is the 1st line of CAT text in the paragraph
This is the 2nd line of BIRD text in the paragraph
This is the 3rd line of MOUSE text in the paragraph
Code:
Console.WriteLine(FindSurroundingLines(text, "DOG", 0));
Output:
This is the 4th line of DOG text in the paragraph
Code:
Console.WriteLine(FindSurroundingLines(text, "This", 2));
Output:
This is the 1st line of CAT text in the paragraph
This is the 2nd line of BIRD text in the paragraph
This is the 3rd line of MOUSE text in the paragraph
This is the 4th line of DOG text in the paragraph
This is the 5th line of RABBIT text in the paragraph
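As for the TextReader refactoring mentioned earlier, it could look roughly like the sketch below. The class name SurroundingLines and the method name Find are hypothetical, and the sample text is only my reconstruction from the output shown above. One deliberate difference from the quick implementation: it checks ReadLine() for null instead of calling Peek(), on the assumption that Peek() is less dependable on readers that aren't backed by an in-memory string.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SurroundingLines
{
    // Hypothetical TextReader-based variant of FindSurroundingLines.
    public static string Find(TextReader reader, string needle, int paddingLines)
    {
        if (reader == null)
            throw new ArgumentNullException("reader");
        if (string.IsNullOrEmpty(needle))
            throw new ArgumentException("Value cannot be null or empty.", "needle");
        if (paddingLines < 0)
            throw new ArgumentOutOfRangeException("paddingLines");

        int bufferSize = (paddingLines * 2) + 1;
        var buffer = new Queue<string>(bufferSize);
        bool needleFound = false;
        int trailingLinesRead = 0;

        // Keep reading while the needle is still missing, trailing padding
        // is still owed, or the window has not been filled yet.
        while (!needleFound
               || trailingLinesRead < paddingLines
               || buffer.Count < bufferSize)
        {
            string line = reader.ReadLine();
            if (line == null)
                break; // end of input

            // Slide the window: drop the oldest line once the buffer is full.
            if (buffer.Count == bufferSize)
                buffer.Dequeue();
            buffer.Enqueue(line);

            if (needleFound)
                trailingLinesRead++;
            else
                needleFound = line.Contains(needle);
        }

        var result = new StringBuilder();
        foreach (string bufferedLine in buffer)
            result.AppendLine(bufferedLine);
        return result.ToString();
    }

    public static void Main()
    {
        // Sample text reconstructed from the example output (an assumption).
        string text = string.Join(Environment.NewLine, new[] {
            "This is the 1st line of CAT text in the paragraph",
            "This is the 2nd line of BIRD text in the paragraph",
            "This is the 3rd line of MOUSE text in the paragraph",
            "This is the 4th line of DOG text in the paragraph",
            "This is the 5th line of RABBIT text in the paragraph"
        });

        using (var reader = new StringReader(text))
            Console.Write(Find(reader, "MOUSE", 1)); // lines 2 through 4
    }
}
```

For a file, a StreamReader (e.g., the one returned by File.OpenText) could be passed in place of the StringReader. As with the string version, if the needle never appears you simply get back the last few lines that were read.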