views:

152

answers:

4

I want to design a class that will parse a string into tokens that are meaningful to my application.

How do I design it?

  1. Provide a ctor that accepts a string, provide a Parse method and provide methods (let's call them "minor") that return individual tokens, count of tokens etc. OR
  2. Provide a ctor that accepts nothing, provide a Parse method that accepts a string and minor methods as above. OR
  3. Provide a ctor that accepts a string and provide only minor methods but no parse method. The parsing is done by the ctor.

1 and 2 have the disadvantage that the user may call minor methods without calling the Parse method. I'll have to check in every minor method that the Parse method was called.

The problem I see in 3 is that the parse method may potentially do a lot of things. It just doesn't seem right to put it in the ctor.

2 is convenient in that the user may parse any number of strings without instantiating the class again and again.

What's a good approach? What are some of the considerations?

(the language is c#, if someone cares).

Thanks

+4  A: 

I would have a separate class with a Parse method that takes a string and converts it into a separate new object with a property for each value from the string.

ValueObject values = parsingClass.Parse(theString);
Garry Shutler
It would be nice to see also your reasoning. Could you edit that to your answer, please.
hakan
+1  A: 

I think this is a really good question...

In general, I'd go with something that resembles option 3 above. Basically, think about your class and what it does; does it have any effective data other than the data to parse and the parsed tokens? If not, then I would generally say that if you don't have those things, then you don't really have an instance of your class; you have an incomplete instance of your class; something which you'd like to avoid.

One of the considerations that you point out is that the parsing of the tokens may be a relatively computationally complicated process; it may take a while. I agree with you that you may not want to take the hit for doing that in the constructor; in that case, it may make sense to use a Parse() method. The question that comes in, though, is whether or not there's any sensible operations that can be done on your class before the parse() method completes. If not, then you're back to the original point; before the parse() method is complete, you're effectively in an "incomplete instance" state of your class; that is, it's effectively useless. Of course, this all changes if you're willing and able to use some multithreading in your application; if you're willing to offload the computationally complicated operations onto another thread, and maintain some sort of synchronization on your class methods / accessors until you're done, then the whole parse() thing makes more sense, as you can choose to spawn that in a new thread entirely. You still run into issues of attempting to use your class before it's completely parsed everything, though.

I think an even more broad question that comes into this design, though, is what is the larger scope in which this code will be used? What is this code going to be used for, and by that, I mean, not just now, with the intended use, but is there a possibility that this code may need to grow or change as your application does? In terms of the stability of implementation, can you expect for this to be completely stable, or is it likely that something about the set of data you'll want to parse or the size of the data to parse or the tokens into which you will parse will change in the future? If the implementation has a possibility of changing, consider all the ways in which it may change; in my experience, those considerations can strongly lead to one or another implementation. And considering those things is not trivial; not by a long shot.

Lest you think this is just nitpicking, I would say, at a conservative estimate, about 10 - 15 percent of the classes that I've written have needed some level of refactoring even before the project was complete; rarely has a design that I've worked on survived implementation to come out the other side looking the same way that it did before. So considering the possible permutations of the implementation becomes very useful for determining what your implementation should be. If, say, your implementation will never possibly want to vary the size of the string to tokenize, you can make an assumption about the computatinal complexity, that may lead you one way or another on the overall design.

McWafflestix
While not exactly answering the question I really enjoyed your breakdown into the thought process one should be using when considering designs.
Rob Booth
A: 

If the sole purpose of the class is to parse the input string into a group of properties, then I don't see any real downside in option 3. The parse operation may be expensive, but you have to do it at some point if you're going to use it.

You mention that option 2 is convenient because you can parse new values without reinstantiating the object, but if the parse operation is that expensive, I don't think that makes much difference. Compare the following code:

// Using option 3
ParsingClass myClass = new ParsingClass(inputString);

// Parse a new string.
myClass = new ParsingClass(anotherInputString);

// Using option 2
ParsingClass myClass = new ParsingClass();
myClass.Parse(inputString);

// Parse a new string.
myClass.Parse(anotherInputString);

There's not much difference in use, but with Option 2, you have to have all your minor methods and properties check to see if parsing had occurred before they can proceed. (Option 1 requires to you do everything that option 2 does internally, but also allows you to write Option 3-style code when using it.)

Alternatively, you could make the constructor private and the Parse method static, having the Parse method return an instance of the object.

// Option 4
ParsingClass myClass = ParsingClass.Parse(inputString);

// Parse a new string.
myClass = ParsingClass.Parse(anotherInputString);

Options 1 and 2 provide more flexibility, but require more code to implement. Options 3 and 4 are less flexible, but there's also less code to write. Basically, there is no one right answer to the question. It's really a matter of what fits with your existing code best.

Jeromy Irvine
A: 

Two important considerations:

1) Can the parsing fail?

If so, and if you put it in the constructor, then it has to throw an exception. The Parse method could return a value indicating success. So check how your colleagues feel about throwing exceptions in situations which aren't show-stopping: default is to assume they won't like it.

2) The constructor must get your object into a valid state.

If you don't mind "hasn't parsed anything yet" being a valid state of your objects, then the parse method is probably the way to go, and call the class SomethingParser.

If you don't want that, then parse in the constructor (or factory, as Garry suggests), and call the class ParsedSomething.

The difference is probably whether you are planning to pass these things as parameters into other methods. If so, then having a "not ready yet" state is a pain, because you either have to check for it in every callee and handle it gracefully, or else you have to write documentation like "the parameter must already have parsed a string". And then most likely check in every callee with an assert anyway.

You might be able to work it so that the initial state is the same as the state after parsing an empty string (or some other base value), thus avoiding the "not ready yet" problem.

Anyway, if these things are likely to be parameters, personally I'd say that they have to be "ready to go" as soon as they're constructed. If they're just going to be used locally, then you might give users a bit more flexibility if they can create them without doing the heavy lifting. The cost is requiring two lines of code instead of one, which makes your class slightly harder to use.

You could consider giving the thing two constructors and a Parse method: the string constructor is equivalent to calling the no-arg constructor, then calling Parse.

Steve Jessop