I need a bit of help. I have two sources of information, and each is exported to its own CSV file by a different program. The files are supposed to contain the same information, but that is exactly what needs to be checked.

Therefore what I would like to do is as follows:

  • Take the information from the two files.
  • Compare them.
  • Output any differences and which file each difference was in (e.g. File A contained this but File B did not, and vice versa).

The files are around 200,000 rows each, so the solution will need to be as efficient as possible.

I tried doing this with Excel, but it proved too complicated, and I'm really struggling to find a way to do it programmatically.

A: 

Assuming that the files are really supposed to be identical, right down to text qualifiers, ordering of rows, and number of rows contained in each file, the simplest approach may be to iterate through both files together and compare each line.

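using System.Collections.Generic;
using System.IO;

// path1 and path2 are the paths of the two CSV files.
// Declared outside the using block so the results are still available afterwards.
var differences = new List<string>();

using (StreamReader f1 = new StreamReader(path1))
using (StreamReader f2 = new StreamReader(path2)) {

    int lineNumber = 0;

    // Walk both files in lockstep, one line at a time.
    while (!f1.EndOfStream) {
        if (f2.EndOfStream) {
           differences.Add("Differing number of lines - f2 has fewer.");
           break;
        }

        lineNumber++;
        var line1 = f1.ReadLine();
        var line2 = f2.ReadLine();

        // Lines are compared as raw text, so quoting and whitespace must match too.
        if (line1 != line2) {
           differences.Add(string.Format("Line {0} differs. File 1: {1}, File 2: {2}", lineNumber, line1, line2));
        }
    }

    // f1 ran out first but f2 still has lines left.
    if (!f2.EndOfStream) {
         differences.Add("Differing number of lines - f1 has fewer.");
    }
}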
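
If the rows are not guaranteed to come out in the same order, a set-based comparison may be a better fit for the "File A contained this, but File B did not" report. The following is only a minimal sketch under some assumptions: each row is compared as a whole line of raw text, duplicate rows are not significant, and both files fit in memory (which 200,000 rows should). The names CsvDiff, OnlyIn and Compare are hypothetical helpers made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration only.
public static class CsvDiff
{
    // Returns the lines that occur in `source` but not in `other`.
    // Rows are compared as raw text, so quoting or whitespace differences count.
    public static HashSet<string> OnlyIn(IEnumerable<string> source, IEnumerable<string> other)
    {
        var result = new HashSet<string>(source);
        result.ExceptWith(other);
        return result;
    }

    // Reads both files and reports which rows are unique to each one.
    public static List<string> Compare(string pathA, string pathB)
    {
        var linesA = File.ReadAllLines(pathA);
        var linesB = File.ReadAllLines(pathB);

        var report = new List<string>();
        foreach (var line in OnlyIn(linesA, linesB))
            report.Add("Only in File A: " + line);
        foreach (var line in OnlyIn(linesB, linesA))
            report.Add("Only in File B: " + line);
        return report;
    }
}
```

Calling CsvDiff.Compare(path1, path2) would then give one message per row that appears in only one of the files. Because the lookups are hash-based, the cost grows roughly linearly with the number of rows rather than quadratically.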
Ryan Brunner
Thanks, but I've just added a bit more detail to the question, so this won't work. Thanks anyway!
Vibralux
+1  A: 

Depending on your answers to the comments on your question, if it doesn't really need to be done with code, you could do worse than download a compare tool, which is likely to be more sophisticated.

(WinMerge, for example)

Benjol
A: 

OK, for anyone else who googles this and finds it, here is what I ended up doing.

I exported the details to CSV, ordering them numerically on export for ease of use. Once I had the two CSV files, I used a program called Beyond Compare, which allows the files to be compared.

At first I used Beyond Compare manually to test that what I was exporting was correct; however, Beyond Compare can also be driven from the command line. This means everything is done programmatically, and all that remains is for a user to view the results in Beyond Compare. You may be able to export the results to another CSV; I haven't looked, as the Beyond Compare GUI is very nice and useful, so it is easier to use that.

Vibralux