
I have a test script that does something to one object and then the same thing to a second object. This continues for quite a while. There's so much predictable repetition that it seems ripe for automation, but I can't figure out how. I wouldn't care so much, except that with this much repetition it's easy to overlook using the wrong variable (e.g., stagingXyz when prodXyz was intended).

The details below are irrelevant. What's important is the pattern.

var stagingDbs = cleanupDbs(stagingServer.Databases);
var prodDbs = cleanupDbs(prodServer.Databases);

printDiff(stagingDbs, prodDbs, "Databases mis-matched");

foreach (var db in stagingDbs.Intersect(prodDbs)) {
    var stagingDb = stagingServer.Databases[db];
    var prodDb = prodServer.Databases[db];

    var stagingTables = cleanupTables(stagingDb.Tables);
    var prodTables = cleanupTables(prodDb.Tables);

    printDiff(stagingTables, prodTables, "Tables mis-matched on " + db);

    foreach (var table in stagingTables.Intersect(prodTables)) {
        var stagingTable = stagingDb.Tables[table];
        var prodTable = prodDb.Tables[table];

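        // Column names on each side (stagingColumns/prodColumns were not defined
        // in the original snippet; this definition is an assumption).
        var stagingColumns = stagingTable.Columns.Cast<Column>().Select(c => c.Name);
        var prodColumns = prodTable.Columns.Cast<Column>().Select(c => c.Name);
        var matchedColumns = stagingColumns.Intersect(prodColumns);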

        var stagingTableColumns = stagingTable.Columns
            .Cast<Column>()
            .Where(c => matchedColumns.Contains(c.Name))
            .Select(c => formatColumn(c));
        var prodTableColumns = prodTable.Columns
            .Cast<Column>()
            .Where(c => matchedColumns.Contains(c.Name))
            .Select(c => formatColumn(c));
        printDiff(stagingTableColumns, prodTableColumns,
            "Columns mis-matched");
    }
}

I don't want to go through, for instance, replacing this

        var stagingTableColumns = stagingTable.Columns
            .Cast<Column>()
            .Where(c => matchedColumns.Contains(c.Name))
            .Select(c => formatColumn(c));
        var prodTableColumns = prodTable.Columns
            .Cast<Column>()
            .Where(c => matchedColumns.Contains(c.Name))
            .Select(c => formatColumn(c));

with this

        var stagingTableColumns = doStuff(stagingTable, matchedColumns);
        var prodTableColumns = doStuff(prodTable, matchedColumns);

because I have to make sure everything in the first line is stagingXyz and everything in the second line is prodXyz. That's not so bad for one line, but the test script is huge and only ever does one of these two things:

  • foo(stagingXyz); foo(prodXyz);
  • bar(stagingXyz, prodXyz);

Similarly, wrapping these items in an array and writing doStuff[0]; doStuff[1]; is prone to the same easy typo, except that a 0 vs. 1 typo is even harder to spot at a glance.

I thought about making 2 container objects (one for staging, one for prod) and putting these 2 objects in a collection but I fear this will lead to a bazillion tiny loops that will be very hard to maintain.
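The container idea does not necessarily need loops if the two objects live in a fixed-size pair rather than a collection. A minimal sketch, assuming a hypothetical Pair<T> type (not from the question or any library) and a hypothetical CleanupNames stand-in for cleanupDbs: Map covers the "foo(stagingXyz); foo(prodXyz);" shape, and Combine/Apply cover the "bar(stagingXyz, prodXyz);" shape, with the staging/prod argument order fixed in one place instead of at every call site.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: the staging and prod versions of one value,
// kept together under fixed names instead of [0]/[1] indices.
public sealed class Pair<T>
{
    public T Staging { get; }
    public T Prod { get; }

    public Pair(T staging, T prod)
    {
        Staging = staging;
        Prod = prod;
    }

    // Shape 1, "foo(stagingXyz); foo(prodXyz);": apply the same function
    // to each side and keep the results paired.
    public Pair<TResult> Map<TResult>(Func<T, TResult> f) =>
        new Pair<TResult>(f(Staging), f(Prod));

    // Shape 2, "bar(stagingXyz, prodXyz);": the argument order is fixed here, once.
    public TResult Combine<TResult>(Func<T, T, TResult> f) => f(Staging, Prod);

    // Shape 2 for void helpers such as printDiff.
    public void Apply(Action<T, T> f) => f(Staging, Prod);
}

public static class PairDemo
{
    // Hypothetical stand-in for the question's cleanupDbs: dedupe and sort names.
    static List<string> CleanupNames(IEnumerable<string> names) =>
        names.Distinct().OrderBy(n => n, StringComparer.Ordinal).ToList();

    public static List<string> CommonDatabases(Pair<string[]> serverDbNames)
    {
        // One line instead of two near-identical stagingXyz/prodXyz lines.
        Pair<List<string>> dbs = serverDbNames.Map(names => CleanupNames(names));

        // Void two-argument step, like printDiff(stagingDbs, prodDbs, ...).
        dbs.Apply((staging, prod) =>
            Console.WriteLine("staging-only: " + string.Join(", ", staging.Except(prod))));

        // Two-argument step with a result, like stagingDbs.Intersect(prodDbs).
        return dbs.Combine((staging, prod) => staging.Intersect(prod).ToList());
    }
}
```

The trade-off is that every helper has to be written as a one-argument or two-argument function, but a typo can no longer send a prod value down the staging path.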

Is there any way to simplify this and still keep it readable and maintainable?

A: 

Edit: After reading your comments, I see the problem a bit more clearly. I think the real issue is the clarity of one big function, rather than the need for some clever way to solve the readability problem. The more you break it up into smaller functions, the clearer it will get.

If the main function was broken up into something like this:

public void mainMethod(DB prodDB, DB stagingDB)
{
    doPart1(prodDB, stagingDB);
    doPart2(prodDB, stagingDB);
}

...and each part had well named inputs like so:

public void doPart1(DB prodDB, DB stagingDB)
{
    // Code...    
}

Things would clear themselves up as you made things work at a more and more granular level. Anyone working in the doPart1 method only has to be concerned with its small amount of code, and anyone working in the main method shouldn't have a million things to look over. I understand this may sound like an oversimplified response, but it sounds like you're trying to solve a problem that shouldn't exist if the code is properly broken up.
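A minimal sketch of that decomposition applied to the question's script, using in-memory dictionaries as an assumed stand-in for the real server objects (database name -> table name -> column names). SchemaCompare and its method names are hypothetical. Each method still receives both sides at every step, but inside any one method they are just `staging` and `prod`, so there are only two names to keep straight.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical decomposition of the question's script into small methods.
// A "server" here is an in-memory stand-in (an assumption, not the real API):
// database name -> table name -> column names.
public class SchemaCompare
{
    // Mismatch messages are collected instead of printed so they can be inspected.
    public List<string> Report { get; } = new List<string>();

    public void CompareServers(
        Dictionary<string, Dictionary<string, string[]>> staging,
        Dictionary<string, Dictionary<string, string[]>> prod)
    {
        PrintDiff(staging.Keys, prod.Keys, "Databases mis-matched");
        foreach (var db in staging.Keys.Intersect(prod.Keys))
            CompareDatabases(db, staging[db], prod[db]);
    }

    void CompareDatabases(string db,
        Dictionary<string, string[]> staging, Dictionary<string, string[]> prod)
    {
        PrintDiff(staging.Keys, prod.Keys, "Tables mis-matched on " + db);
        foreach (var table in staging.Keys.Intersect(prod.Keys))
            CompareTables(db + "." + table, staging[table], prod[table]);
    }

    void CompareTables(string name, string[] staging, string[] prod) =>
        PrintDiff(staging, prod, "Columns mis-matched on " + name);

    // Records the names that exist on only one side.
    void PrintDiff(IEnumerable<string> staging, IEnumerable<string> prod, string message)
    {
        var stagingOnly = staging.Except(prod).ToList();
        var prodOnly = prod.Except(staging).ToList();
        if (stagingOnly.Count > 0 || prodOnly.Count > 0)
            Report.Add($"{message}: staging-only [{string.Join(", ", stagingOnly)}], " +
                       $"prod-only [{string.Join(", ", prodOnly)}]");
    }
}
```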

If there is a method that is so huge and unreadable that another developer couldn't figure out what's going on with only TWO variables, then there is a different problem.

Ocelot20
I think the reason it's not structured this way already is because he needs to perform incremental operations on this PAIR of objects, not do everything to one then the other.
twon33
@Ocelot20: it sounds like your suggestion would run everything for obj1 then everything for obj2. This wouldn't be the same effect as in the original example; most notably inside the loops.
Dinah
Edited to better reflect the question asked.
Ocelot20
A: 

I know you claim that the details are unimportant, but in reality that is what a program is: a large collection of many small details. The issue I can see arising here is a direct violation of LSP (the Liskov Substitution Principle). However, if that is not a concern and you can use an abstract class to represent the parent of these two children, you may proceed to define an inheritance model that will be sufficient.

Woot4Moo
This is gibberish; there's no base class or derived class here. These are parallel instances of exactly the same types, not a base type and two derived types.
twon33
The question at the heart appeared to be an architecture question so I provided an architecture answer.
Woot4Moo
A: 

Could you generate your test script? The input might read something like

var %%AB%%Dbs = cleanupDbs(%%AB%%Server.Databases);
printDiff(%%A%%Dbs, %%B%%Dbs, "Databases mis-matched");
foreach (var db in %%A%%Dbs.Intersect(%%B%%Dbs)) {

    var %%AB%%Db = %%AB%%Server.Databases[db];
    var %%AB%%Tables = cleanupTables(%%AB%%Db.Tables);

    printDiff(%%A%%Tables, %%B%%Tables, "Tables mis-matched on " + db);

    ...
}

A line containing %%AB%% would expand to two copies of the same line, one with the "A" replacement and one with the "B" replacement, while a line containing only %%A%% or %%B%% would just have those tokens replaced in place.
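A minimal sketch of such a generator, assuming exactly the expansion rules just described and assuming "staging" and "prod" as the A and B prefixes. PairTemplate.Expand is a hypothetical name; its output is plain C# source text that would then be saved as the real test script.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical generator for the %%AB%% / %%A%% / %%B%% template idea.
public static class PairTemplate
{
    public static List<string> Expand(IEnumerable<string> templateLines, string a, string b)
    {
        var output = new List<string>();
        foreach (var line in templateLines)
        {
            if (line.Contains("%%AB%%"))
            {
                // One template line becomes two: the A copy first, then the B copy.
                output.Add(Substitute(line.Replace("%%AB%%", a), a, b));
                output.Add(Substitute(line.Replace("%%AB%%", b), a, b));
            }
            else
            {
                // Lines with only %%A%% / %%B%% are rewritten in place.
                output.Add(Substitute(line, a, b));
            }
        }
        return output;
    }

    static string Substitute(string line, string a, string b) =>
        line.Replace("%%A%%", a).Replace("%%B%%", b);
}
```

For example, expanding the first two template lines above with "staging" and "prod" yields the question's `var stagingDbs = ...;`, `var prodDbs = ...;` and `printDiff(stagingDbs, prodDbs, ...);` lines. Each paired line is written once, so those pairs can't drift apart; only the explicit %%A%%/%%B%% tokens still need a careful eye.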

twon33