answers: 4

I need to be able to validate a string against a list of the possible United States Postal Service state abbreviations, and Google is not offering me any direction.

I know the obvious solution, which is to code a horribly huge if (or switch) statement that compares against all 50 states, but I'm asking StackOverflow because there has to be an easier way. Is there a RegEx or an enum-like object out there that I could use to do this as efficiently as possible?

[C# and .NET 3.5, by the way]

List of USPS State Abbreviations

+1  A: 

Here's a regex. Enjoy!

^(?-i:A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$
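A minimal sketch of how the regex above might be used from C#, assuming System.Text.RegularExpressions; the class and method names are hypothetical. The pattern is built once and reused. One .NET detail worth knowing: without RegexOptions.Multiline, `$` also matches just before a trailing "\n", so "CA\n" passes; swapping `$` for `\z` would reject it.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the regex from the answer above.
public static class UspsRegexValidator
{
    // Built once and reused. (?-i:...) switches case-insensitivity off inside
    // the group, so the pattern stays case-sensitive even if IgnoreCase were set.
    // Note: '$' also matches before a final "\n"; use \z to forbid that.
    private static readonly Regex StatePattern = new Regex(
        @"^(?-i:A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$",
        RegexOptions.Compiled);

    public static bool IsValid(string input)
    {
        // Regex.IsMatch throws ArgumentNullException on null, so guard first.
        return input != null && StatePattern.IsMatch(input);
    }
}
```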
Ben Hoffstein
Props on a cool regex, but that is just plain ridiculous. I don't think regex is the way to go: it is extremely difficult to verify by eye, and any test you write to verify it is likely to be worse than just implementing it more clearly in the first place.
Michael Haren
Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems. -- Jamie Zawinski.
Craig Trader
Just a heads-up: this regex includes a number of abbreviations not commonly considered, like Puerto Rico, Northern Mariana Islands, Palau, and Marshall Islands.
hughdbrown
The regex is matching the USPS state abbreviation list referenced in the question. I match the same list in my answer.
Craig Trader
+13  A: 

I'd populate a hashtable with the valid abbreviations and then check the input against it. It's much cleaner, and probably faster if you do more than one check per table build.
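A minimal sketch of the build-once, check-many idea, assuming a generic HashSet<string> (new in .NET 3.5) in place of the non-generic Hashtable; the class name is hypothetical. The entries are the same abbreviations used in the string-based answer on this page. The static initializer runs once, and every later call is just a hash lookup.

```csharp
using System.Collections.Generic;

// Hypothetical validator: the set is built once, then reused for every check.
public static class UspsHashValidator
{
    private static readonly HashSet<string> Abbreviations = new HashSet<string>
    {
        "AL", "AK", "AS", "AZ", "AR", "CA", "CO", "CT", "DE", "DC",
        "FM", "FL", "GA", "GU", "HI", "ID", "IL", "IN", "IA", "KS",
        "KY", "LA", "ME", "MH", "MD", "MA", "MI", "MN", "MS", "MO",
        "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "MP",
        "OH", "OK", "OR", "PW", "PA", "PR", "RI", "SC", "SD", "TN",
        "TX", "UT", "VT", "VI", "VA", "WA", "WV", "WI", "WY"
    };

    public static bool IsValid(string input)
    {
        // Case-sensitive (default comparer); Contains(null) simply returns false.
        return Abbreviations.Contains(input);
    }
}
```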

Michael Haren
Voted up for a clean and quick solution. Take some design time and make up for it at runtime!
Craig
Thanks for clarifying the specific generics, Jon.
Michael Haren
+5  A: 

A HashSet<string> is the cleanest way I can think of using the built-in types in .NET 3.5. (You could easily make it case-insensitive as well, or change it into a Dictionary<string, string> where the value is the full name. That would also be the most appropriate solution for .NET 2.0/3.0.)
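A sketch of the case-insensitive Dictionary<string, string> variant, assuming StringComparer.OrdinalIgnoreCase for the keys (HashSet<string> also has a constructor that takes the same comparer). The class and method names are hypothetical, and only a few entries are shown; the rest would come from the USPS list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup table: abbreviation -> full name.
// Only a handful of entries are shown; fill in the rest from the USPS list.
public static class UspsStateNames
{
    // OrdinalIgnoreCase makes "ca", "Ca" and "CA" the same key.
    private static readonly Dictionary<string, string> Names =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "AL", "Alabama" },
            { "AK", "Alaska" },
            { "CA", "California" },
            { "DC", "District of Columbia" },
            { "PR", "Puerto Rico" },
            { "WY", "Wyoming" }
        };

    public static bool IsValid(string abbreviation)
    {
        // Dictionary.ContainsKey throws ArgumentNullException for a null key.
        return abbreviation != null && Names.ContainsKey(abbreviation);
    }

    public static bool TryGetFullName(string abbreviation, out string fullName)
    {
        if (abbreviation == null)
        {
            fullName = null;
            return false;
        }
        return Names.TryGetValue(abbreviation, out fullName);
    }
}
```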

As for speed - do you really believe this will be a bottleneck in your code? A HashSet is likely to perform "pretty well" (many millions of lookups a second). I'm sure alternatives would be even faster - but dirtier. I'd stick to the simplest thing that works until you have reason to believe it'll be a bottleneck.

(Edited to explicitly mention Dictionary<,>.)

Jon Skeet
+4  A: 

I like something like this:

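private static readonly string states = "|AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY|";

public static bool IsStateAbbreviation(string state)
{
  // Ordinal search avoids culture-sensitive comparison rules.
  // Index 0 holds '|', so "> 0" still finds every real two-letter entry.
  return state != null
      && state.Length == 2
      && states.IndexOf(state, System.StringComparison.Ordinal) > 0;
}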

This method has the advantage of using an optimized system routine that is probably using a single machine instruction to do the search. If I were dealing with variable-length words, then I'd check for "|" + state + "|" to ensure that I hadn't hit a substring instead of a full match. That would take a wee bit longer, due to the string concatenation, but it would still match in a fixed amount of time. If you want to validate lowercase abbreviations as well as uppercase, then either check for state.ToUpper(), or double the 'states' string to include the lowercase variants.
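A sketch of the delimiter-wrapped search described above, using ToUpperInvariant for lowercase input and an ordinal comparison; the class name is hypothetical. One assumption goes beyond the answer: inputs containing '|' are rejected outright, because wrapping alone would still accept something like "AL|AK", which appears verbatim inside the list.

```csharp
using System;

// Hypothetical variant that does not depend on a fixed input length.
public static class UspsDelimitedValidator
{
    private const string States =
        "|AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY|";

    public static bool IsValid(string state)
    {
        // Reject null/empty input and anything containing the delimiter itself.
        if (string.IsNullOrEmpty(state) || state.IndexOf('|') >= 0)
            return false;

        // Wrapping in delimiters means only a whole entry can match;
        // ToUpperInvariant accepts lowercase input without doubling the list.
        string needle = "|" + state.ToUpperInvariant() + "|";
        return States.IndexOf(needle, StringComparison.Ordinal) >= 0;
    }
}
```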

I'll guarantee that this will beat the Regex or Hashtable lookups every time, no matter how many runs you make, and it will have the least memory usage.

Craig Trader
What happens if the user manages to enter "L|" as their input? I imagine it would validate under this code. It could easily be fixed with an IndexOf("|") line.
Matthew Ruston
Again, if you're worried about it, that's when you concatenate the delimiters around the search string. Thus you would be checking for "|L||", which would fail.
Craig Trader