tags:

views:

591

answers:

9

I have a class which you pass in a folder and then it goes off and processes a lot of data within the specified folder.

For instance:

MyClass myClass = new MyClass(@"C:\temp");

Now what it does it goes off and reads say a couple thousand files and populates the class with data.

Should I move this data out from the constructor and have it as a seperate method, such as:

MyClass myClass = new MyClass();
myClass.LoadFromDirectory(@"C:\temp");
+1  A: 

Is this all your class does? If so, then I would say it doesn't really matter. But it is likely that you're class is actually doing more than what you have shown. Does it have any error handling, for example?

The purpose of the constructor is to construct an object. The purpose of a method is to perform an action. So my vote is for this form:

MyClass myClass = new MyClass();
myClass.LoadFromDirectory(@"C:\temp");
Robert Harvey
There will be a lot more to the class and I am in the process of designing it. The guts of the class though require that it is populated via the data passed into the class.
Coding Monkey
+4  A: 

It depends. You should evaluate the basic purpose of the class. What function does it perform?

What I usually prefer is to have a class constructor do the initialization necessary for the functioning of the class. Then I call methods on the class which can safely assume that the necessary initialization has been done.

Typically, the initalization phase should not be too intensive. An alternative way of doing the above may be:

// Instantiate the class and get ready to load data from files.
MyClass myClass = new MyClass(@"C:\temp");

// Parse the file collection and load necessary data.
myClass.PopulateData();
Cerebrus
This is actually a best practice and most refactoring tools will tell you to not call methods from your constructors unless they are initialization methods, such as creating controls, or setting properties.
Tom Anderson
That leads to the class having two very distinct states though - one where it has a filename, and one where it has the real data. After population, it sounds like it won't need the filename at all, but it'll still need a variable for it. That kind of thing always makes me nervous. I prefer the static method approach, where the result is an object which is "ready for action".
Jon Skeet
@Jon Skeet, awesome, not the best place, but your C# in Depth book has helped me a lot. Thanks! -My concern is that this class can take several minutes to create as it processes the data and having a dual state is what I was hoping to avoid.
Coding Monkey
Very true Jon, the stated object creations are a pain to deal with, but another very common way to implement this type of functionality, like the SqlConnection class, you still have to "Open" the connection, even though you passed the ConnectionString along the constructor.
Tom Anderson
Then again, that is a purposefully stated object with an exposed State
Tom Anderson
Adding a state property would be possible.
Coding Monkey
+18  A: 

Maybe you should try it this way with a static method that returns an instance of the object.

var myClass = MyClass.LoadFromDirectory(@"C:\temp");

This will keep the initialization code outside of your constructor, as well as giving you that "one line" declaration you are looking for.


Going on the comment from below from the poster, by adding State an implementation could be like so:

public class MyClass
{

#region Constructors 

    public MyClass(string directory)
    {
     this.Directory = directory;
    }

#endregion

#region Properties

    public MyClassState State {get;private set;}

    private string _directory;

    public string Directory 
    {
     get { return _directory;} 
     private set 
     {
      _directory = value; 
      if (string.IsNullOrEmpty(value)) 
       this.State = MyClassState.Unknown; 
      else 
       this.State = MyClassState.Initialized;
     }
    }

#endregion



    public void LoadFromDirectory()
    {
     if (this.State != MyClassState.Initialized || this.State != MyClassState.Loaded)
      throw new InvalidStateException();

     // Do loading

     this.State = MyClassState.Loaded;
    }

}

public class InvalidStateException : Exception {}


public enum MyClassState
{
    Unknown,
    Initialized,
    Loaded
}
Tom Anderson
Good idea, initialization and use of a class are often quite different. This nicely separates them. For even more separation you can move the initialization logic into a factory or builder class.
Mendelt
A: 

I think you should decide between the two approaches above ("first initialize, then execute" vs "empty init, perform with params") based on whether you plan on reusing the same object to perform the same operation on a differnt input.
if the class is only used to run the task on a fixed param, I'd go with initializing it in the constructor (making it readonly even), and then performing the task on a different method.
If you want to keep performing the the task on different params, I'd put it in the task method itself.

If all the class does is this task, I'd also consider changing it all to a static class/methods - it doesn't need to keep its internal state.

Anyway, I would never put the task itself in the constructor. As Cerebrus said, the initialization should be fast.

Noam Gal
A: 

Unless the main purpose of your class is to perform I/O, you should probably not be performing I/O (potentially throwing an IOException) in the constructor.

Consider splitting the class in two:

interface IMyDataSource
{
   // ...
}

class FileDataSource: IMyDataSource
{
    public FileDataSource(string path)
    {
        // ...
    }
}

class MyClass
{
    public MyClass(IMyDataSource source)
    {
         // ...
    }
}

IMyDataSource myDS = new FileDataSource(@"C:\temp");
MyClass myClass = new MyClass(myDS);

That way the main class can focus on managing its own state, while the data source builds the initial state from the file contents.

finnw
But now the FileDataSource has the potential of throwing exceptions in it's constructor, thus just passing the errors to a separate object.
Tom Anderson
@Tom Anderson, that's the idea. FileDataSource is designed to do I/O so I/O exceptions in the constructor are expected. MyClass is not designed to do I/O, so I/O exceptions in the constructor are avoided.
finnw
A: 

If that is the only resource the class works with, it'd probably be better to pass the path to the constructor. Otherwise, it would be a parameter to your class members.

Sergey
A: 

My personal preference would be to use C# 3.0 initializers.

class MyClass {
    public string directory;
    public void Load() {
      // Load files from the current directory
      }
  }

MyClass myClass = new MyClass{ directory = @"C:\temp" };
myClass.Load();

This has a few advantages:

  • Object instantiation won't have an automatic file system side effect.
  • All arguments are named.
  • All arguments are optional (but, of course, could throw an exception in Load() if not defined)
  • You can initialize as many properties as you want in the instantiation call without having to overload the constructor. For instance, options for whether to recurse directories, or a wildcard for filespec to search for.
  • You could still have some logic in the setter for directory to do some stuff, but again, side effects are usually not a good thing.
  • By performing file operations in a separate procedure call, you avoid the issue of not being able to reference your myClass instance in the exception handler.
richardtallent
While still not a bad approach, it does though have a state, field set or not set.
Tom Anderson
A: 

I'm going to echo the "split them up" folks here. If it helps, try this:

  1. Ask yourself, "What does this method/property/field do?"
  2. Make it do that; no more, no less.

Applying that here, you get this:

  1. The constructor is supposed to create the object.
  2. Your method is supposed to load its data from the filesystem.

That seems to me to be a lot more logical than "The constructor is supposed to create the object and load its data from the filesystem.

Ari Roth
Agreed. If you want to block and only come back when your object back has been populated, use a factory with a static method.
richardtallent
+1  A: 

I agree with Ari and others - split them up.

A constructor should really do the minimum amount of work (simply initialise the object ready for use and leave it at that). By using a separate method to do the work:

  • It is clearer to the caller that the worker function may take a long time.
  • It is easy to provide several constructors to initialise the object with different information (e.g. you might be able to pass in your own class (rather than a string) that can supply the pathname. Or you could pass in an extra parameter that specifies a wildcarded filename to match, or a flag to specify if the search should recurse into subfolders).
  • You avoid any issues with the constructor. In the constructor the object is not fully formed, so it can be dangerous to do work - e.g. calling a virtual function inside a constructor is a very bad idea. The less code you put in the constructor the less likely it is that you'll do something "bad" by accident.
  • It's cleaner coding style to separate different behaviours/functions into separate methods. Keep initialisation and work separated
  • A split class will be easier to maintain and refactor in future.
Jason Williams