What I have is two files, sourcecolumns.txt and destcolumns.txt. What I need to do is compare source to dest and if the dest doesn't contain the source value, write it out to a new file. The code below works except I have case sensitive issues like this:

source: CPI
dest: Cpi

These don't match because of capital letters, so I get incorrect output. Any help is always welcome!

string[] sourcelinestotal =
    File.ReadAllLines("C:\\testdirectory\\" + "sourcecolumns.txt");
string[] destlinestotal =
    File.ReadAllLines("C:\\testdirectory\\" + "destcolumns.txt");

foreach (string sline in sourcelinestotal)
{
    if (destlinestotal.Contains(sline))
    {
    }
    else
    {
        File.AppendAllText("C:\\testdirectory\\" + "missingcolumns.txt", sline);
    }
}
A: 

If you do not need case sensitivity, convert your lines to upper case using string.ToUpper before comparison.
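A minimal sketch of this approach, assuming the invariant-culture variant string.ToUpperInvariant is acceptable in place of string.ToUpper (that substitution is an assumption, made so the result does not depend on the current culture). The class name UpperCaseCompare and the helper ContainsIgnoreCase are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;

public static class UpperCaseCompare
{
    // Hypothetical helper: true if any entry equals value once both are upper-cased.
    public static bool ContainsIgnoreCase(string[] lines, string value)
    {
        string upperValue = value.ToUpperInvariant();
        // Each ToUpperInvariant call allocates a new string for the comparison.
        return lines.Any(line => line.ToUpperInvariant() == upperValue);
    }

    public static void Main()
    {
        string[] dest = { "Cpi", "Gdp" };
        Console.WriteLine(ContainsIgnoreCase(dest, "CPI"));   // True
        Console.WriteLine(ContainsIgnoreCase(dest, "RATE"));  // False
    }
}
```

Every comparison here creates a temporary upper-cased copy of the line, which is the extra work a StringComparison-based comparison avoids.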

Danvil
No! This doesn't pass the *Turkey Test*. http://www.codinghorror.com/blog/2008/03/whats-wrong-with-turkey.html
dtb
Very inefficient - it's much better to use the comparison flags, as others suggest.
500 - Internal Server Error
+3  A: 

Use an extension method for your Contains. A brilliant example was found here on Stack Overflow. The code isn't mine, but I'll post it below.

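// Extension methods must be declared in a static class.
public static class StringExtensions
{
    public static bool Contains(this string source, string toCheck, StringComparison comp)
    {
        // IndexOf returns -1 when toCheck does not occur in source.
        return source.IndexOf(toCheck, comp) >= 0;
    }
}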

string title = "STRING";
bool contains = title.Contains("string", StringComparison.OrdinalIgnoreCase);
StyxRiver
This doesn't solve the problem -- he wants to see if the **collection** of strings contains a particular string in a case-insensitive manner. This only checks if a **string** contains another string in a case-insensitive manner. You'd need to have an extension method on `IEnumerable<string>`, not `string`.
tvanfosson
I saw your answer, much better than mine. I honestly hadn't considered the performance aspect, and I was unaware of the HashSet's overloaded constructor. This is always a nice extension to have, in either case!
StyxRiver
+3  A: 

You could do this using an extension method for IEnumerable<string> like:

public static class EnumerableExtensions
{
    public static bool Contains( this IEnumerable<string> source, string value, StringComparison comparison )
    {
         if (source == null)
         {
             return false; // nothing is a member of the empty set
         }
         return source.Any( s => string.Equals( s, value, comparison ) );
    }
}

then change

if (destlinestotal.Contains( sline ))

to

if (destlinestotal.Contains( sline, StringComparison.OrdinalIgnoreCase ))

However, if the sets are large and/or you are going to do this very often, the way you're going about it is very inefficient. Essentially, you're doing an O(n²) operation: for each line in the source you compare it with, potentially, all lines in the destination. It would be better to create a HashSet from the destination columns with a case-insensitive comparer and then iterate through your source columns, checking whether each one exists in the HashSet of the destination columns. This would be an O(n) algorithm on average, since each HashSet lookup takes expected constant time. Note that Contains on the HashSet will use the comparer you provide in the constructor.

string[] sourcelinestotal =
    File.ReadAllLines("C:\\testdirectory\\" + "sourcecolumns.txt");
// Build the set once; Contains then uses the case-insensitive comparer.
HashSet<string> destlinestotal =
    new HashSet<string>(
        File.ReadAllLines("C:\\testdirectory\\" + "destcolumns.txt"),
        StringComparer.OrdinalIgnoreCase
    );

foreach (string sline in sourcelinestotal)
{
    if (!destlinestotal.Contains(sline))
    {
        // Append a line break so each missing column lands on its own line.
        File.AppendAllText("C:\\testdirectory\\" + "missingcolumns.txt",
            sline + Environment.NewLine);
    }
}

In retrospect, I actually prefer this solution over simply writing your own case insensitive contains for IEnumerable<string> unless you need the method for something else. There's actually less code (of your own) to maintain by using the HashSet implementation.
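The same set-difference idea can also be sketched with LINQ's Enumerable.Except, which accepts an IEqualityComparer<string>. This is an alternative offered under assumptions, not the code from the answer above: the class name MissingColumns, the helper FindMissing, and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MissingColumns
{
    // Hypothetical helper: source entries with no case-insensitive match in dest.
    public static List<string> FindMissing(IEnumerable<string> source, IEnumerable<string> dest)
    {
        // Except compares entries with the supplied comparer and returns
        // the distinct source entries that do not appear in dest.
        return source.Except(dest, StringComparer.OrdinalIgnoreCase).ToList();
    }

    public static void Main()
    {
        string[] source = { "CPI", "GDP", "Rate" };
        string[] dest = { "Cpi", "gdp" };
        foreach (string name in FindMissing(source, dest))
        {
            Console.WriteLine(name); // Rate
        }
    }
}
```

Note that Except returns distinct values, so source entries that differ only by case collapse to the first one seen. The resulting list could then be written in a single call with File.WriteAllLines, which puts each entry on its own line.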

tvanfosson
@aba - in the general case, the collection might contain the empty string, though perhaps in this case not.
tvanfosson
I can't get this to compile: Using the generic type 'System.Collections.Generic.HashSet<T>' requires '1' type arguments C
Mike
I left out the type specifier on the HashSet constructor. I've fixed this.
tvanfosson