Hello.

Simple question for all you pragmatic object-oriented fellas. I have read many times that, to follow OO conventions, one should avoid classes named like "Processor" and "xxxxHandler", and I believe that's a good guideline for keeping system code understandable.

Let's assume we have software that scans some file structure, say a bunch of specific CSV files, and an independent module called CsvParser.

class CsvParser {
    public CsvParser(string path) { .. }
    public string GetToken(int position) { .. }
    public bool ReadLine() { .. }
}

class MyCsvFile {
    public string FullPath { get; }

    public void Scan() {
        CsvParser csvp = new CsvParser(FullPath);
        while (csvp.ReadLine())
        {
            /* Parse the file that this class represents */
        }
    }
}

This saves having a "FileScanner" class, which is a Processor-type class: something that collects, say, a bunch of files from a directory and scans each one.

class MyFileScan {
    public string[] Files { get; set; }

    public void GetFiles() { this.Files = Directory.GetFiles(..); }

    public void ScanFiles() {
        foreach (string thisFilePath in Files)
        {
           CsvParser csvp = new CsvParser(thisFilePath);
           /* ... */
        }
    }
}

The OO approach dictates having the MyCsvFile class, and then a method representing the operation on the object.

Any thoughts? What do you programmers think?

+3  A: 

I'd agree with your philosophy, but if it were me I'd probably call the class CsvFile and give it a Parse method in addition to the Scan one. In OO programming it's always desirable to make your classes represent "things" (nouns in English).

That aside, if I were asked to maintain your code, I'd grasp what a CsvParser class is likely to be doing, whereas MyFileScan would send me into fits of rage and force me to read the code to work it out.

sipwiz
Also, naming things "MyX" drives me crazy. It's fine if you're giving a very general explanatory example such as MyObject or MyClass (although in that case I would go with the metasyntactic variables: foo, bar, baz, qux, quux, corge, grault, garply, waldo, fred, plugh, xyzzy, thud). But if you're going to be using a class for anything, I want to see CSV not MyCSV. To whom does this CSV belong and why are they claiming ownership?
Imagist
Indeed correct and I totally agree. I'm using 'My' for example purposes here. I wanted to keep to my domain here without using foo, bar, etc. Good point.
lb
+1  A: 

This is Problem Domain vs. Solution Domain design.

In order to solve a problem, we can design our classes to model real-life objects; that is, program according to the Problem Domain.

Another way of programming is to design according to the Solution Domain.

For instance, when we are designing a flight booking system, a flight management expert will describe a flight in terms such as "route", "time", and "angle" (I can't really recall the exact term). If we design according to this model, we are designing according to the Problem Domain.

We can also design using a coordinate system (x, y, z), because we feel that, as programmers, we can deal with it more efficiently. This is designing for the Solution Domain.

The problem with the Solution Domain is that, in the world of projects, the one constant is CHANGE! The requirements will always change, and when they do, you have to redesign your program.

However, if you model your classes as real-life objects, you are less affected by the changes, because real-life objects seldom change.

"Processor" and "xxxxHandler" <-- this is designing for the Solution Domain.

You could take a look at Domain-Driven Design (DDD for short).

janetsmith
I see, I see.. will definitely look into it - "wikipediaing" as I speak. You see, I think this has a lot to do with thought process.. how we model things. You're saying basically that how we use the objects is more likely to change than the objects themselves? Or am I misreading? Thanks.
lb
I am saying that the requirements from the customer always change. If we design our classes based on a real-life model, they will be immune to changes in requirements.
janetsmith
+1  A: 

I think what you're describing is that objects should take care of operations that only require themselves, which is in general a good rule to follow. There's nothing wrong with a "processor" class, as long as it "processes" a few different (but related) things. But if you have a class that only processes one thing (like a CSV parser only parses CSVs) then really there's no reason for the thing that the processor processes not to do the processing on itself.

However, there is a common reason for breaking this rule: usually you don't want to do things you don't have to do. For example, with your CSV class, if all you want is to find the row in the CSV where the first cell is "Bob" and get the third column in that row (which is, say, Bob's birth date) then you don't want to read in the entire file, parse it, and then search through the nice data structure you just created: it's inefficient, especially if your CSV has 100K lines and Bob's entry was on line 5.
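
A minimal sketch of that early-exit lookup, in Java. The class name CsvLookup and the sample data are made up for illustration, and it assumes plain comma-separated rows with no quoted fields or embedded commas. It reads one line at a time and returns as soon as the matching row is found, so later rows are never parsed:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;

/** Hypothetical helper: streams a CSV and stops at the first matching row. */
final class CsvLookup {
    private CsvLookup() { }

    /**
     * Returns the cell in returnColumn of the first row whose searchColumn
     * equals searchValue. Assumption: simple comma-separated rows, no quoting.
     * Rows too short to contain both columns are skipped.
     */
    static Optional<String> searchRelative(Reader source, String searchValue,
                                           int searchColumn, int returnColumn) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // limit -1 keeps trailing empty cells
                String[] cells = line.split(",", -1);
                if (searchColumn < cells.length
                        && returnColumn < cells.length
                        && cells[searchColumn].equals(searchValue)) {
                    return Optional.of(cells[returnColumn]); // stop here; later rows are never parsed
                }
            }
            return Optional.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data: first name, last name, birth date.
        String csv = "Alice,Smith,1980-02-11\nBob,Jones,1975-07-04\nCarol,Lee,1990-12-30\n";
        // prints 1975-07-04
        System.out.println(searchRelative(new StringReader(csv), "Bob", 0, 2).orElse("not found"));
    }
}
```

Taking a Reader rather than a filename is just a choice to keep the sketch easy to try out; a FileReader would work the same way for a real file.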

You could redesign your CSV class to do small-scale operations on CSVs, like skipping to the next line and getting the first cell. But now you're implementing methods that you wouldn't really speak of a CSV having. CSVs don't read lines, they store them. They don't find cells, they just have them. Furthermore, if you want to do a large-scale operation such as reading in the entire CSV and sorting the lines by the first cell, you'll wish you had your old way of reading in the entire file, parsing it, and going over the whole data structure you created. You could do both in the same class, but now your class is really two classes for two different purposes. Your class has lost cohesion and any instance of the class you create is going to have twice as much baggage, while you're only likely to use half of it.

In this case, it makes sense to have a high-level abstraction of the CSV (for the large-scale operations) and a "processor" class for low-level operations. (The following is written in Java since I know that better than I know C#):

public class CSV
{
    private final String filename;
    private String[][] data;
    private boolean loaded;

    public CSV(String filename) { ... }

    public boolean isLoaded() { ... }
    public void load() { ... }
    public void saveChanges() { ... }
    public void insertRowAt(int rowIndex, String[] row) { ... }
    public void sortRowsByColumn(int columnIndex) { ... }

    ...
}

public class CSVReader
{
    /*
     * This kind of thing is reasonably implemented as a subclassable singleton
     * because it doesn't hold state but you might want to subclass it, perhaps with
     * a processor class for another tabular file format.
     */
    protected CSVReader() { }
    protected static class SingletonHolder
    {
        public static final CSVReader instance = new CSVReader();
    }

    public static CSVReader getInstance()
    {
        return SingletonHolder.instance;
    }

    public String getCell(String filename, int row, int column) { ... }
    public String searchRelative(String filename,
        String searchValue,
        int searchColumn,
        int returnColumn)
    { ... }

    ...
}

A similar well-known example of this is SAX and DOM. SAX is the low-level, fine-grained access, while DOM is the high-level abstraction.
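
To make the comparison concrete, here is a rough sketch that counts elements both ways using the JDK's standard XML parsers. The class name SaxVsDom and the sample XML are made up for illustration. The DOM version builds the whole tree before asking it a question, while the SAX version just reacts to each element as the parser streams past it:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class contrasting the two access styles. */
final class SaxVsDom {
    private SaxVsDom() { }

    private static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    /** DOM: load the entire document into memory, then query the tree. */
    static int countWithDom(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(stream(xml));
            return doc.getElementsByTagName(tag).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    /** SAX: no tree is built; a callback fires for each start tag. */
    static int countWithSax(String xml, String tag) {
        int[] count = {0}; // array so the anonymous handler can update it
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(stream(xml),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes attributes) {
                            if (qName.equals(tag)) {
                                count[0]++;
                            }
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<rows><row/><row/><other/></rows>"; // made-up sample
        System.out.println(countWithDom(xml, "row")); // prints 2
        System.out.println(countWithSax(xml, "row")); // prints 2
    }
}
```

Both give the same answer here; the difference only starts to matter when the document is large or you need just a small part of it.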

Imagist
Had to pick this as it has opened my eyes a bit: I've been wondering how I could abstract the CSVParser. At this point in time, it will only have read-only operations on the file.. but eventually, yes, requirements change. I'll take sipwiz's advice and let the object do what it needs. Thanks guys.
lb