ansaurus

Question

Validate String against USPS State Abbreviations

Answer 1

+1 A:

Here's a regex. Enjoy!

^(?-i:A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$
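For anyone who wants to try the pattern from .NET, here is a minimal sketch of how it might be wrapped. The class and method names (`StateRegexValidator`, `IsStateAbbreviation`) are made up for illustration, and the pattern itself is copied unchanged from above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative, not from the original answer.
public static class StateRegexValidator
{
    // Pattern copied verbatim from the answer. (?-i:...) switches off
    // case-insensitive matching inside the group, so "al" is rejected.
    private static readonly Regex StatePattern = new Regex(
        @"^(?-i:A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$",
        RegexOptions.Compiled);

    public static bool IsStateAbbreviation(string input)
    {
        // Regex.IsMatch throws on null, so guard first.
        return input != null && StatePattern.IsMatch(input);
    }
}
```

Calling `StateRegexValidator.IsStateAbbreviation("AL")` should return true, while `"al"` and `"XX"` should return false. One thing to keep in mind: in .NET, `$` also matches just before a trailing newline, so swap it for `\z` if that matters for your input.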

Ben Hoffstein 2008-10-06 20:45:50

Props on a cool regex, but that is just plain ridiculous. I don't think regex is the way to go: this is extremely difficult to verify by eye, and any test you write to verify it is likely to be worse than just implementing it in a clearer way in the first place.

Michael Haren 2008-10-06 20:47:46

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems. -- Jamie Zawinski.

Craig Trader 2008-10-06 21:52:09

Just a heads-up: this regex includes a number of abbreviations not commonly considered, like Puerto Rico, Northern Mariana Islands, Palau, and Marshall Islands.

hughdbrown 2008-10-06 22:00:36

The regex is matching the USPS state abbreviation list referenced in the question. I match the same list in my answer.

Craig Trader 2008-10-06 22:09:56

Answer 2

+13 A:

I'd populate a hashtable with the valid abbreviations and then check the input against it. It's much cleaner, and probably faster if you run more than one check per dictionary build.
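A rough sketch of that idea in C#, assuming a `HashSet<string>` as the table. The class name `StateSetValidator` is hypothetical, and the abbreviation list is the one quoted in another answer on this page, so it leaves out the military codes (AA, AE, AP) that the regex answer accepts.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; built once, then reused for every lookup.
public static class StateSetValidator
{
    // Ordinal comparer: exact, case-sensitive matches only.
    private static readonly HashSet<string> States = new HashSet<string>(StringComparer.Ordinal)
    {
        "AL", "AK", "AS", "AZ", "AR", "CA", "CO", "CT", "DE", "DC",
        "FM", "FL", "GA", "GU", "HI", "ID", "IL", "IN", "IA", "KS",
        "KY", "LA", "ME", "MH", "MD", "MA", "MI", "MN", "MS", "MO",
        "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "MP",
        "OH", "OK", "OR", "PW", "PA", "PR", "RI", "SC", "SD", "TN",
        "TX", "UT", "VT", "VI", "VA", "WA", "WV", "WI", "WY"
    };

    public static bool IsStateAbbreviation(string input)
    {
        return input != null && States.Contains(input);
    }
}
```

The set is built once in the static initializer, so each later call is a single hash lookup.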

Michael Haren 2008-10-06 20:45:54

Voted up for a clean and quick solution. Take some design time and make up for it at runtime!

Craig 2008-10-06 20:48:59

Thanks for clarifying the specific generics, Jon.

Michael Haren 2008-10-06 20:58:48

Answer 3

+5 A:

A HashSet<string> is the cleanest way I can think of using the built-in types in .NET 3.5. (You could easily make it case-insensitive as well, or change it into a Dictionary<string, string> where the value is the full name. That would also be the most appropriate solution for .NET 2.0/3.0.)
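As a sketch of the case-insensitive `Dictionary<string, string>` variant, something like the following should work. The class and method names are invented for illustration, and only a handful of entries are shown; a real table would list every abbreviation you want to accept.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper showing the abbreviation -> full name variant.
public static class StateNameLookup
{
    // OrdinalIgnoreCase makes "ny", "Ny" and "NY" all hit the same key.
    // Only a few sample entries are listed here; the table is deliberately incomplete.
    private static readonly Dictionary<string, string> StateNames =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "AL", "Alabama" },
            { "AK", "Alaska" },
            { "CA", "California" },
            { "NY", "New York" },
            { "TX", "Texas" }
        };

    public static bool TryGetStateName(string abbreviation, out string name)
    {
        name = null;
        // TryGetValue throws on a null key, so guard first.
        return abbreviation != null && StateNames.TryGetValue(abbreviation, out name);
    }
}
```

`TryGetStateName` validates and fetches the display name in one call; if you only need validation, a `HashSet<string>` built with the same comparer would do.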

As for speed - do you really believe this will be a bottleneck in your code? A HashSet is likely to perform "pretty well" (many millions of lookups a second). I'm sure alternatives would be even faster - but dirtier. I'd stick to the simplest thing that works until you have reason to believe it'll be a bottleneck.

(Edited to explicitly mention Dictionary<,>.)

Jon Skeet 2008-10-06 20:53:45

Answer 4

+4 A:

I like something like this:

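// Pipe-delimited list of valid abbreviations. The leading "|" means every real entry starts at index 1 or later.
private static String states = "|AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY|";

public static bool isStateAbbreviation (String state)
{
  // The length check keeps a two-character match aligned to an entry, unless the input itself contains "|".
  return state.Length == 2 && states.IndexOf( state ) > 0;
}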

This method has the advantage of using an optimized system routine to do the search. If I were dealing with non-fixed-length words, I'd check for "|" + state + "|" to ensure that I hadn't hit a substring instead of a full match. That would take a wee bit longer, due to the string concatenation, but it would still match in a fixed amount of time. If you want to validate lowercase abbreviations as well as uppercase, then either check for state.ToUpper(), or double the 'states' string to include the lowercase variants.
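Here is a sketch of the delimiter-wrapped variant described above, with uppercasing folded in. The class name `StateStringValidator`, the null check, and the ordinal comparison are additions of mine rather than part of the original snippet.

```csharp
using System;

// Hypothetical variant of the delimited-string check.
public static class StateStringValidator
{
    private const string States =
        "|AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY|";

    public static bool IsStateAbbreviation(string state)
    {
        if (state == null || state.Length != 2)
        {
            return false;
        }

        // Wrapping the input in delimiters means input such as "L|" becomes
        // "|L||", which cannot occur because the list never contains "||".
        string needle = "|" + state.ToUpperInvariant() + "|";

        // Ordinal comparison avoids culture-sensitive matching surprises.
        return States.IndexOf(needle, StringComparison.Ordinal) >= 0;
    }
}
```

With the delimiters in place, the test can be `>= 0` rather than `> 0`, since a match at index 0 is now a legitimate hit on the first entry.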

I'll guarantee that this will beat the Regex or Hashtable lookups every time, no matter how many runs you make, and it will have the least memory usage.

Craig Trader 2008-10-06 21:34:12

What happens if the user manages to enter "L|" as their input? I imagine it would validate under this code. It could easily be fixed with an IndexOf("|") line.

Matthew Ruston 2008-10-06 22:49:22

Again, if you're worried about it, that's when you concatenate the delimiters around the search string. Thus you would be checking for "|L||", which would fail.

Craig Trader 2008-10-07 15:16:28
