ansaurus

Question

How do you deal with strings that have structure?

Answer 1

+2 A:

This is a pretty common problem falling under the title 'validation' - there are many ways to validate textual user input, one of the most common being Regular Expressions.

You might also consider using the built-in System.Net.Mail.MailAddress class for this, as it validates an email address when you construct one.

Erik Forbes 2009-02-09 23:26:03

Answer 2

+1 A:

Strings are strings. If you need your strings to be smarter than average strings then parsing them into a structural object like you describe would be a good idea. I would use a regex to do that.

Aram Verstegen 2009-02-09 23:26:11

Answer 3

A:

Well, if you want to do several different kinds of things with an EmailAddress object, those other actions do not have to check if it is a valid email address since the EmailAddress object is guaranteed to have a valid string. You could throw an exception in the constructor or use a factory method or whatever "One True Methodology" approach you're using.
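A minimal Python sketch of that guarantee. It assumes a hypothetical EmailAddress class and a deliberately simple check (text on both sides of a single "@") rather than full address validation:

```python
class EmailAddress:
    """Hypothetical wrapper: holds a string that passed one validation step."""

    def __init__(self, raw: str) -> None:
        local, at, domain = raw.partition("@")
        # Deliberately loose check (an assumption for this sketch, not full validation).
        if not (local and at and domain) or "@" in domain:
            raise ValueError(f"not an email address: {raw!r}")
        self._value = raw

    def __str__(self) -> str:
        return self._value


def greeting_for(address: EmailAddress) -> str:
    # No re-validation here: an EmailAddress only exists if its constructor accepted the string.
    return f"Sending mail to {address}"
```

Any function that takes an EmailAddress, like the illustrative greeting_for above, can skip the check because construction is the only way in.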

BobbyShaftoe 2009-02-09 23:28:05

I would recommend against throwing an exception on validation failure. Exceptions are for exceptional cases, and bad user input is definitely *not* exceptional. I also typically shy away from throwing exceptions in constructors, but that's for my own reasons.

Erik Forbes 2009-02-09 23:29:39

Right. :) As always, it all depends on your particular brand of propaganda.

BobbyShaftoe 2009-02-09 23:49:17

*shrugs* Best practices, propaganda; whatever you want to call it. ;) Lol

Erik Forbes 2009-02-09 23:55:07

Answer 4

+9 A:

"Strings with structure" are a symptom of the common code smell "Primitive Obsession".

The remedy is to watch closely for duplication in code that validates or manipulates parts of these structures. At the first hint of duplication - but not before - extract a class that encapsulates the structure and locate validations and queries there.

Morendil 2009-02-09 23:30:54

+1 - very sage advice.

Erik Forbes 2009-02-09 23:32:38

ahhhhh not the "code smell" phrase ... <-- runs and hides

BobbyShaftoe 2009-02-09 23:50:48

Yes, you can almost always improve the structure of your code by wrapping the most common primitive-typed variables in new types.

Jay Bazuzi 2009-02-10 03:50:12

Answer 5

A:

Personally, I like the idea of strong typing, so if I were still working in such languages I'd go with the style of your second example. The only thing I'd change might be to use a more "cast-like" structure, like EmailAddressFromString(String), that generated a new EmailAddress object (or pitched a fit if the string wasn't right), as I'm a bit of a fan of application Hungarian notation.

This whole problem, incidentally, is covered pretty well by Joel in http://www.joelonsoftware.com/articles/Wrong.html if you're interested.

womble 2009-02-09 23:32:56

Answer 6

+1 A:

Regular expressions are your friend when it comes to formatting strings. You could also store each part separately in a struct to avoid going through the trouble of using regular expressions every time you want to use them, e.g.:

struct EMail
{
    public string BeforeAt;   // e.g. "johndoe123"
    public string AfterAt;    // e.g. "gmail.com"
}

struct URL
{
    public string Protocol;   // e.g. "http"
    public string Domain;     // e.g. "sub.example.com"
    public string Path;       // e.g. "stuff/example.html"
}
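As a rough Python sketch of the same idea, the regular expression can run once at parse time and the parts can be kept afterwards. The pattern and the parse_email helper are assumptions made for illustration; the pattern is far simpler than what real addresses allow:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Simplified pattern for illustration only; real addresses allow more than this.
_EMAIL_PARTS = re.compile(r"(?P<before_at>[^@\s]+)@(?P<after_at>[^@\s]+)")


@dataclass(frozen=True)
class EMail:
    before_at: str
    after_at: str


def parse_email(text: str) -> Optional[EMail]:
    """Run the regex once and keep the parts; return None if the text does not match."""
    match = _EMAIL_PARTS.fullmatch(text)
    if match is None:
        return None
    return EMail(match.group("before_at"), match.group("after_at"))
```

Code that later needs the part after the "@" reads the after_at field instead of matching the string again.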

2009-02-09 23:34:32

Answer 7

A:

I agree with the calls to strongly type the object, but for those cases where you're parsing from a string to an object, the answer is simple: error handling.

There are two general ways to handle errors: exceptions and return conditions. Generally, if you expect to receive badly formed data, you should return an error message. For input that you do not expect, I would throw an exception. For example, a caller might pass in an ill-formed email address, such as 'bob' with no '@' or domain; that is ordinary bad input, so return an error. For null values, however, you might throw an exception, as you shouldn't try to form an email out of null.
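One way to sketch that split in Python. It assumes a hypothetical parse_email function that reports malformed input as a returned message and reserves an exception for a missing value; the "@" check is intentionally minimal:

```python
from typing import Optional, Tuple


def parse_email(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (address, None) on success or (None, error_message) on malformed input."""
    if text is None:
        # A missing value is a programming error, not bad user input, so raise.
        raise TypeError("text must be a string, not None")
    local, at, domain = text.strip().partition("@")
    if not (local and at and domain):
        # Malformed input is expected from users, so report it as a value.
        return None, f"{text!r} is not a valid email address"
    return f"{local}@{domain}", None
```

A caller checks the second element of the result to decide whether to show the message to the user, and a None argument surfaces immediately as a bug.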

Returning to your question, I do think you gain something by encoding a structure into an object. Specifically, you only need to validate that the string represents a valid email address in one specific place, such as the constructor. Elsewhere, your code is free to assume that an EmailAddress object is valid, and you don't have to rely upon dodgy classes with names like 'EmailHelper' or some such.

Travis 2009-02-10 03:47:46

Answer 8

A:

I personally do not think strong-typing the email address string as EmailAddress is necessary, in this case.

To create your email address you will, sooner or later, have to do something like:

EmailAddress(String email)

or a setter

SetEmailAddress(String email)

In both cases, you'll have to validate the email string input, which puts you back into your initial validation problem.

I would, as others pointed out, use regular expressions.

Having an EmailAddress class would be useful if you plan on having to perform specific operations on your stored information later on (say get domain name only, stuff like that).
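For instance, a small Python sketch of such operations. It assumes a hypothetical EmailAddress wrapper around a string that has already been validated elsewhere:

```python
class EmailAddress:
    """Hypothetical wrapper that adds queries on top of an already validated string."""

    def __init__(self, value: str) -> None:
        # Validation is assumed to have happened before this point (for example with a regex).
        self._value = value

    @property
    def local_part(self) -> str:
        # Everything before the last "@".
        return self._value.rpartition("@")[0]

    @property
    def domain(self) -> str:
        # Everything after the last "@".
        return self._value.rpartition("@")[2]
```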

turbovince 2009-02-10 03:55:37

Answer 9

+1 A:

Welcome to the world of programming!

I don't think your question is a symptom of an error on your part. Rather, it is a basic problem which appears in many guises throughout the programming world. Strings that have some structure and meaning are passed around between different subsystems of an application, and each subsystem can only do so much parsing and validation.

The problem of verifying an email address, for example, is quite tricky. The regular expressions that various people offer for matching email addresses are generally either "too tight" (they reject some valid addresses) or "too loose" (they accept illegal ones). The first Google hit for 'regex "email address"' says:

The regular expression I receive the most feedback, not to mention "bug" reports on, is the one you'll find right on this site's home page: \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b This regular expression, I claim, matches any email address. Most of the feedback I get refutes that claim by showing one email address that this regex doesn't match.

The fact is that what is or isn't a valid email address is a complex problem, one that a given program might or might not want to solve. The problem of URLs is even worse, especially given the possibility of malicious URLs.
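A short Python sketch of both failure modes, using the pattern quoted above with the dot before the top-level domain escaped. The sample addresses are chosen for illustration:

```python
import re

# The pattern quoted above, with the dot before the top-level domain escaped.
QUOTED = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)


def accepted(address: str) -> bool:
    """True if the whole string matches the quoted pattern."""
    return QUOTED.fullmatch(address) is not None


ordinary = accepted("john.doe@example.com")    # a typical address
long_tld = accepted("curator@example.museum")  # top-level domain longer than 4 letters
double_dot = accepted("a@b..com")              # empty domain label, not a valid address
```

Here ordinary is True, long_tld is False (too tight: the top-level domain part is capped at four letters), and double_dot is True (too loose: consecutive dots in the domain slip through).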

Ideally, you can have a library or system call which solves problems of this sort instead of doing anything yourself (Microsoft Windows provides a common dialog box that lets the user select or create a file, since validating file names is another tricky problem). But you can't always count on having an appropriate system call for a given "meaningful string" either.
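In Python, for instance, the standard library's urllib.parse.urlsplit takes a URL apart without any hand-written parsing. A short sketch, with an example URL chosen for illustration:

```python
from urllib.parse import urlsplit

parts = urlsplit("http://sub.example.com/stuff/example.html")

protocol = parts.scheme    # "http"
domain = parts.hostname    # "sub.example.com"
path = parts.path          # "/stuff/example.html"
```

Note that urlsplit only splits the string into components. Deciding whether the URL is safe or acceptable for the application remains a separate question.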

I would say that there is no generic solution to the problem of strings-with-structure. Rather, it is a basic problem that appears right when you design your application. In the process of gathering requirements for your application, you should determine what data the application will take in and how meaningful that data will be to the application. And this is where things get tricky, since you may notice the possibility that the app may grow in ways that your boss or customer might not have thought of, or the app may in fact grow in ways that none of you thought of. Thus the application needs to be a little more flexible than what seems like the minimum, BUT only a little. It should also not be so flexible that you get bogged down.

Now, if you decide that you need to validate or interpret a given string, putting that string into an object or a hash can be a good approach. This is one way I know to make sure your interface is clear. But the tricky thing is deciding just how much validation or interpretation you need.

Making these decisions is thus an art - there are no dogmatic answers that work here.

Joe Soul-bringer 2009-02-10 18:54:49

I originally accepted Morendil's answer, but after some thought and rereading, I've decided to accept yours. It's not that I disagree with Morendil, but your answer is more general, less dogmatic, and more in line with the type of discussion I was hoping to stimulate. Thank you!

Metaphile 2009-02-12 16:52:15
