Hi,

I am looking for a way to check whether the word "foo" is present in a text file using C#.

I could use a regular expression, but I'm not sure that will work if the word is split across two lines. I have the same issue with a StreamReader that enumerates over the lines.

Any comments?

+3  A: 

What's wrong with a simple search?

If the file is not large and memory is not a problem, simply read the entire file into a string (StreamReader.ReadToEnd() method) and use string.Contains().
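
A minimal sketch of this approach, assuming the file fits comfortably in memory. `File.ReadAllText` is used here as the one-call equivalent of opening a `StreamReader` and calling `ReadToEnd()`. The class and method names are hypothetical. Note that `Contains` on the raw text does not match a word that is broken by a line break, so the second method strips carriage returns and line feeds first. That is an assumption about how the split appears in the file, and it also joins the last word of one line to the first word of the next.

```csharp
using System.IO;

public static class WholeFileSearch
{
    // True if the file's text contains the word exactly as written.
    public static bool ContainsWord(string path, string word)
    {
        string text = File.ReadAllText(path);
        return text.Contains(word);
    }

    // Same check, but CR and LF are removed first so that a word
    // broken across consecutive lines still matches.
    public static bool ContainsWordIgnoringLineBreaks(string path, string word)
    {
        string text = File.ReadAllText(path)
                          .Replace("\r", "")
                          .Replace("\n", "");
        return text.Contains(word);
    }
}
```

For very large files a line-by-line reader would avoid loading everything at once, at the cost of handling the line joins yourself.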

Mitch Wheat
Will this handle the case where the word is split over consecutive lines?
Sam Holder
+2  A: 

Here is a quick example using LINQ:

using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        { //LINQ version
            bool hasFoo = "file.txt".AsLines()
                                    .Any(l => l.Contains("foo"));
        }
        { // No LINQ or Extension Methods needed
            bool hasFoo = false;
            foreach (var line in Tools.AsLines("file.txt"))
                if (line.Contains("foo"))
                {
                    hasFoo = true;
                    break;
                }
        }
    }
}
public static class Tools
{
    public static IEnumerable<string> AsLines(this string filename)
    {
        using (var reader = new StreamReader(filename))
            while (!reader.EndOfStream)
            {
                var line = reader.ReadLine();
                while (line.EndsWith("-") && !reader.EndOfStream)
                    line = line.Substring(0, line.Length - 1)
                                + reader.ReadLine();
                yield return line;
            }
    }
}
Matthew Whited
I didn't see the "splitted" comment at first. You could check whether the last character of a line is a "-", remove it, and join the two lines together before checking for the word.
Matthew Whited
Note: If you're using .NET 4.0 you can use File.ReadLines(filename) instead of having to write the AsLines method.
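
A short sketch of what that could look like, assuming .NET 4.0 or later. The class and method names are hypothetical, and this version does no joining of hyphenated or split lines; it only checks each line on its own.

```csharp
using System.IO;
using System.Linq;

public static class ReadLinesSearch
{
    // File.ReadLines enumerates the file lazily, so Any() stops
    // reading as soon as one line contains the word.
    public static bool AnyLineContains(string path, string word)
    {
        return File.ReadLines(path).Any(line => line.Contains(word));
    }
}
```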
ICR
@ICR: Good point, I was writing and testing this in 2008 against .NET 3.5.
Matthew Whited
A: 

You don't need regular expressions in a case this simple. Simply loop over the lines and check whether each one contains foo.

// File.Open returns a FileStream, not a StreamReader; File.OpenText returns a StreamReader.
using (StreamReader sr = File.OpenText("filename"))
{
    string line = null;
    while (!sr.EndOfStream) {
        line = sr.ReadLine();
        if (line.Contains("foo"))
        {
            // foo was found in the file
        }
    }
}
Aistina
+1  A: 

Here you go. We read the file line by line, keep track of the last word of the previous line and the first word of the current line, and check whether that combination matches your pattern.

string pattern = "foo";
string input = null;
string lastword = string.Empty;
string firstword = string.Empty;
bool result = false;

// Build the regex once instead of on every line.
Regex RegPattern = new Regex(pattern);

using (FileStream FS = new FileStream("File name and path", FileMode.Open, FileAccess.Read, FileShare.Read))
using (StreamReader SR = new StreamReader(FS))
{
    while ((input = SR.ReadLine()) != null)
    {
        string trimmed = input.Trim();

        // First word of this line; the whole line if it contains no space.
        int firstSpace = trimmed.IndexOf(" ");
        firstword = firstSpace < 0 ? trimmed : trimmed.Substring(0, firstSpace);

        // Join the previous line's last word to this line's first word.
        if (lastword != string.Empty) { firstword = lastword + firstword; }

        if (pattern.Trim() == firstword || RegPattern.IsMatch(input)) { result = true; }

        // Last word of this line; LastIndexOf returns -1 when there is no space, so + 1 gives 0.
        lastword = trimmed.Substring(trimmed.LastIndexOf(" ") + 1);
    }
}
Ioxp
I thought about this as well... but you would have a problem if you have something like "barf oogle" in the file.
Matthew Whited
Why would the input file have a break in the word, where "f" is on one line and "oo" is at the beginning of the next?
Ioxp
Well I really don't think that foo is the word he is really searching for. My point was that if you do a .Replace(" ", "") it would join all words together.
Matthew Whited
Updated the code with some checking logic to see whether the end of the previous line plus the beginning of the new line, trimmed, equals the pattern. It also uses the regex, since that's what was asked for in the question. I understand the placement of the regex is not optimal, but when @toto figures out how he wants to use it he can adjust.
Ioxp
+1  A: 

What about if the line contains football? Or fool? If you are going to go down the regular expression route you need to look for word boundaries.

// Verbatim string: in a regular C# string literal, "\b" is a backspace character.
Regex r = new Regex(@"\bfoo\b");

Also make sure you account for case-insensitivity if you need it.
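
As a rough sketch of both points together, the helper below (a hypothetical name, not part of any library) wraps the word in `\b` boundaries and passes `RegexOptions.IgnoreCase`. It assumes the search term starts and ends with a word character, since `\b` behaves differently next to punctuation.

```csharp
using System.Text.RegularExpressions;

public static class WordSearch
{
    // True if `word` occurs in `text` as a whole word, ignoring case.
    public static bool ContainsWholeWord(string text, string word)
    {
        // Regex.Escape keeps characters such as '.' or '+' in the word literal.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
    }
}
```

With this, "Foo" on its own matches a search for "foo", while "football" and "fool" do not.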

rrrr
This is a good point. To use my sample above, you could add a space to the beginning and end of each line and then do a .Contains(" foo ").
Matthew Whited
A: 

You could construct a regex which allows for newlines to be placed between every character.

private static bool IsSubstring(string input, string substring)
{
    string[] letters = new string[substring.Length];
    for (int i = 0; i < substring.Length; i += 1)
    {
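        // Escape each character so regex metacharacters in the search term are matched literally.
        letters[i] = Regex.Escape(substring[i].ToString());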
    }
    string regex = @"\b" + string.Join(@"(\r?\n?)", letters) + @"\b";
    return Regex.IsMatch(input, regex, RegexOptions.ExplicitCapture);
}
ICR