Suppose I have an object representing a person, with getter and setter methods for the person's email address. The setter method definition might look something like this:

public void setEmailAddress(String emailAddress)
    {
    // Any string at all is accepted here, valid address or not.
    this.emailAddress = emailAddress;
    }

Calling person.setEmailAddress(0), then, would generate a type error, but calling person.setEmailAddress("asdf") would not - even though "asdf" is in no way a valid email address.

In my experience, so-called strings are almost never arbitrary sequences of characters, with no restriction on length or format. URIs come to mind - as do street addresses, as do phone numbers, as do first names ... you get the idea. Yet these data types are most often stored as "just strings".

Returning to my person object, suppose I modify setEmailAddress() like so

public void setEmailAddress(EmailAddress emailAddress)
    // ...

where EmailAddress is a class ... whose constructor takes a string representation of an email address. Have I gained anything?

OK, so an email address is kind of a bad example. What about a URI class that takes a string representation of a URI as a constructor parameter, and provides methods for managing that URI - setting the path, fetching a query parameter, etc.? There, the validity of the source string becomes important.
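A minimal sketch of the URI idea, assuming Java and the standard java.net.URI parser. The class name WebAddress and its methods are hypothetical, chosen only for illustration, and the query handling is deliberately simplified (no percent-decoding).

```java
import java.net.URI;

// Hypothetical wrapper; WebAddress and its method names are illustrative only.
final class WebAddress {
    private final URI uri;

    // Parsing happens once, here. URI.create throws IllegalArgumentException
    // for a malformed string, so a WebAddress never holds an unparsable value.
    WebAddress(String text) {
        this.uri = URI.create(text);
    }

    String path() {
        return uri.getPath();
    }

    // Returns the first value of the named query parameter, or null if absent.
    // Simplification: values are returned raw, without percent-decoding.
    String queryParameter(String name) {
        String query = uri.getRawQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            if (key.equals(name)) {
                return eq < 0 ? "" : pair.substring(eq + 1);
            }
        }
        return null;
    }
}
```

With this sketch, new WebAddress("http://sub.example.com/stuff/example.html?id=42") exposes path() and queryParameter("id"), while a string containing a space in the host is rejected at construction time.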

So I ask all of you, how do you deal with strings that have structure? And how do you make your structural expectations clear in your interfaces?

Thank you.

+2  A: 

This is a pretty common problem falling under the title 'validation' - there are many ways to validate textual user input, one of the most common being Regular Expressions.

If you are on .NET, you might also consider using the built-in System.Net.Mail.MailAddress class for this, as it provides validation for email addresses.

Erik Forbes
+1  A: 

Strings are strings. If you need your strings to be smarter than average strings then parsing them into a structural object like you describe would be a good idea. I would use a regex to do that.

Aram Verstegen
A: 

Well, if you want to do several different kinds of things with an EmailAddress object, those other actions do not have to check if it is a valid email address since the EmailAddress object is guaranteed to have a valid string. You could throw an exception in the constructor or use a factory method or whatever "One True Methodology" approach you're using.
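As a rough sketch of that guarantee in Java: the constructor below is the only place that checks the string, and everything else simply trusts the type. The validation rule (a non-empty part on each side of the last '@') is an assumption made for brevity, not a real email grammar, and the Person class is a hypothetical stand-in for the one in the question.

```java
import java.util.Objects;

final class EmailAddress {
    private final String value;

    // The only validation point. The rule is intentionally simplistic:
    // something before the last '@' and something after it.
    EmailAddress(String text) {
        Objects.requireNonNull(text, "text");
        int at = text.lastIndexOf('@');
        if (at <= 0 || at == text.length() - 1) {
            throw new IllegalArgumentException("Not an email address: " + text);
        }
        this.value = text;
    }

    // Safe without re-checking: the constructor guarantees an '@' exists.
    String domain() {
        return value.substring(value.lastIndexOf('@') + 1);
    }

    @Override
    public String toString() {
        return value;
    }
}

// Hypothetical Person: the setter's signature now documents the expectation.
final class Person {
    private EmailAddress emailAddress;

    void setEmailAddress(EmailAddress emailAddress) {
        this.emailAddress = Objects.requireNonNull(emailAddress, "emailAddress");
    }

    EmailAddress getEmailAddress() {
        return emailAddress;
    }
}
```

Here person.setEmailAddress("asdf") no longer compiles, and new EmailAddress("asdf") fails before the value can reach the setter.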

BobbyShaftoe
I would recommend against throwing an exception on validation failure - Exceptions are for exceptional cases, and bad user input is definitely *not* exceptional. Also I typically shy away from throwing exceptions in constructors as well, but that's for my own reasons.
Erik Forbes
Right. :) As always, it all depends on your particular brand of propaganda.
BobbyShaftoe
*shrugs* Best practices, propaganda; whatever you want to call it. ;) Lol
Erik Forbes
+9  A: 

"Strings with structure" are a symptom of the common code smell "Primitive Obsession".

The remedy is to watch closely for duplication in code that validates or manipulates parts of these structures. At the first hint of duplication - but not before - extract a class that encapsulates the structure and locate validations and queries there.

Morendil
+1 - very sage advice.
Erik Forbes
ahhhhh not the "code smell" phrase ... <-- runs and hides
BobbyShaftoe
Yes, you can almost always improve the structure of your code by wrapping the most common primitive-typed variables in new types.
Jay Bazuzi
A: 

Personally, I like the idea of strong typing, so if I were still working in such languages I'd go with the style of your second example. The only thing I'd change might be to use a more "cast-like" structure, like EmailAddressFromString(String), that generated a new EmailAddress object (or pitched a fit if the string wasn't right), as I'm a bit of a fan of application Hungarian notation.

This whole problem, incidentally, is covered pretty well by Joel in http://www.joelonsoftware.com/articles/Wrong.html if you're interested.

womble
+1  A: 

Regular expressions are your friend when it comes to checking the format of strings. You could also store each part separately in a struct, to avoid going through the trouble of applying a regular expression every time you want to use one of the parts, e.g.

struct EMail
{
    String BeforeAt = "johndoe123";
    String AfterAt = "gmail.com";
}

struct URL
{
    String Protocol = "http";
    String Domain = "sub.example.com";
    String Path = "stuff/example.html";
}
A: 

I agree with the calls to strongly type the object, but for those cases where you're parsing from a string to an object, the answer is simple: error handling.

There are two general ways to handle errors: exceptions and return conditions. Generally, if you expect to receive badly formed data, you should return an error message; if the input is something that should never occur, I would throw an exception. For example, a caller might pass in an ill-formed email address, such as 'bob' instead of '[email protected]', and that deserves an error result. For null values, however, you might throw an exception, as you shouldn't try to form an email address out of null.

Returning to your question, I do think you gain something by encoding a structure into an object. Specifically, you only need to validate that the string represents a valid email address in one specific place, such as the constructor. Elsewhere, your code is free to assume that an EmailAddress object is valid, and you don't have to rely upon dodgy classes with names like 'EmailHelper' or some such.
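One way this split could look in Java, as a sketch only: the factory method name parse, the use of Optional as the "return condition", and the simplistic one-'@' check are all assumptions made for illustration.

```java
import java.util.Objects;
import java.util.Optional;

final class EmailAddress {
    private final String value;

    // Private: the only way in is through parse, so every instance was checked.
    private EmailAddress(String value) {
        this.value = value;
    }

    // Ill-formed input is an expected outcome, so it is reported through the
    // return value (an empty Optional). A null argument is treated as a
    // programming error and throws NullPointerException.
    static Optional<EmailAddress> parse(String text) {
        Objects.requireNonNull(text, "text must not be null");
        int at = text.indexOf('@');
        // Placeholder rule: exactly one '@' with something on both sides.
        boolean plausible = at > 0
                && at == text.lastIndexOf('@')
                && at < text.length() - 1;
        return plausible ? Optional.of(new EmailAddress(text)) : Optional.empty();
    }

    @Override
    public String toString() {
        return value;
    }
}
```

EmailAddress.parse("bob") yields an empty Optional the caller can turn into an error message, while EmailAddress.parse(null) throws.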

Travis
A: 

I personally do not think strong-typing the email address string as EmailAddress is necessary, in this case.

To create your email address you will, sooner or later, have to do something like:

EmailAddress(String email)

or a setter

SetEmailAddress(String email)

In both cases, you'll have to validate the email string input, which puts you back into your initial validation problem.

I would, as others pointed out, use regular expressions.

Having an EmailAddress class would be useful if you plan on having to perform specific operations on your stored information later on (say get domain name only, stuff like that).

turbovince
+1  A: 

Welcome to the world of programming!

I don't think your question is a symptom of an error on your part. Rather, it is a basic problem which appears in many guises throughout the programming world. Strings that have some structure and meaning are passed around between different subsystems of an application, and each subsystem can only do so much parsing and validation.

The problem of verifying an email address, for example, is quite tricky. The regular expressions various people offer for accepting an email address are generally either "too tight" (they don't accept everything valid) or "too loose" (they accept illegal things). The first Google hit for 'regex "email address"' says:

The regular expression I receive the most feedback, not to mention "bug" reports on, is the one you'll find right on this site's home page: \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b Analyze this regular expression with RegexBuddy. This regular expression, I claim, matches any email address. Most of the feedback I get refutes that claim by showing one email address that this regex doesn't match.

The fact is that what is or isn't a valid email address is a complex problem, one that a given program might or might not want to solve. The problem of URLs is even worse, especially given the possibility of malicious URLs.
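To make "too tight" and "too loose" concrete, here is a small Java sketch that runs the pattern quoted above against three hand-picked strings. It assumes the dot before the top-level domain is escaped, and that the pattern is meant to be matched case-insensitively against the whole string; the class name and sample addresses are made up for the demonstration.

```java
import java.util.regex.Pattern;

final class EmailRegexDemo {
    // The quoted pattern, with the dot before the top-level domain escaped
    // (an assumption about the original) and case-insensitive matching.
    static final Pattern QUOTED = Pattern.compile(
            "\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b",
            Pattern.CASE_INSENSITIVE);

    // True when the whole candidate string matches the pattern.
    static boolean accepts(String candidate) {
        return QUOTED.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        // An ordinary address is accepted.
        System.out.println(accepts("john@example.com"));       // true
        // Too tight: a six-letter top-level domain exceeds {2,4}.
        System.out.println(accepts("curator@example.museum")); // false
        // Too loose: consecutive dots in the domain slip through.
        System.out.println(accepts("john@example..com"));      // true
    }
}
```

Tightening or loosening the pattern fixes one of these cases at the cost of creating others, which is why the choice depends on what the program actually needs.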

Ideally, you can have a library or system call which solves problems of this sort instead of doing anything yourself (Microsoft Windows, for instance, provides a standard dialog box that lets the user select or create a file, since validating file names is another tricky problem). But you can't always count on having an appropriate system call for a given "meaningful string" either.

I would say that there is no generic solution to the problem of strings-with-structure. Rather, it is a basic problem that appears right when you design your application. In the process of gathering requirements for your application, you should determine what data the application will take in and how meaningful that data will be to the application. And this is where things get tricky, since you may notice the possibility that the app may grow in ways that your boss or customer might not have thought of - or the app may in fact grow in ways that none of you thought of. Thus the application needs to be a little more flexible than what seems like the minimum, BUT only a little. It should not be so flexible that you get bogged down.

Now, if you decide that you need to validate/interpret etc a given string, putting that string into an object or a hash can be a good approach - this is one way I know to make sure your interface is clear. But the tricky thing is deciding just how much validation or interpretation you need.

Making these decisions is thus an art - there are no dogmatic answers that work here.

Joe Soul-bringer
I originally accepted Morendil's answer, but after some thought and rereading, I've decided to accept yours. It's not that I disagree with Morendil, but your answer is more general, less dogmatic, and more in line with the type of discussion I was hoping to stimulate. Thank you!
Metaphile