The app I am writing deals with utility service addresses, and right now I am forcing the user to know enough to separate the parts of the address and put them in the appropriate fields before the record is added to the database. It has to be done this way for sorting purposes, because a straight alphabetical sort isn't always right when the address has a pre-direction. For example, if the user wanted to add the service address 123 N Main St, they would currently enter it as:

  • Street Number = 123
  • Pre-direction = N
  • Street Name = Main
  • Street Type = St
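Keeping the parts in separate fields is what makes a street-first sort possible. Here is a minimal C# sketch of that kind of ordering. `ServiceAddress`, `AddressSorting` and `SortByStreet` are hypothetical names, and the key order (street name, then street type, then pre-direction, then house number numerically) is an assumption about what the routes need:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: one property per field the user fills in today.
public sealed class ServiceAddress
{
    public int StreetNumber { get; set; }
    public string Predirection { get; set; } = "";
    public string StreetName { get; set; } = "";
    public string StreetType { get; set; } = "";

    // Rebuilds the display form, skipping empty parts.
    public override string ToString() =>
        string.Join(" ", new[] { StreetNumber.ToString(), Predirection, StreetName, StreetType }
            .Where(part => !string.IsNullOrEmpty(part)));
}

public static class AddressSorting
{
    // Assumed order: street name, street type, pre-direction, then the
    // house number compared as an integer (so 9 comes before 123).
    public static List<ServiceAddress> SortByStreet(IEnumerable<ServiceAddress> addresses) =>
        addresses
            .OrderBy(a => a.StreetName, StringComparer.OrdinalIgnoreCase)
            .ThenBy(a => a.StreetType, StringComparer.OrdinalIgnoreCase)
            .ThenBy(a => a.Predirection, StringComparer.OrdinalIgnoreCase)
            .ThenBy(a => a.StreetNumber)
            .ToList();
}
```

With this ordering, 123 N Main St lands among the other Main St entries, after 9 Main St because the empty pre-direction sorts first. Reorder the `ThenBy` calls if a different precedence is needed.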

I've tried to separate this address into its parts by using the Split function and iterating through each part. What I have so far is below:

Public Shared Function ParseServiceAddress(ByVal Address As String) As String()
    'this assumes a valid address - 101 N Main St South
    Dim strResult(4) As String  '0=st_num, 1=predir, 2=st_name, 3=st_type, 4=postdir
    Dim strParts() As String
    Dim i As Integer
    Dim j As Integer = 0
    Address = Address.Trim()
    'split using spaces; RemoveEmptyEntries skips the blanks left by repeated spaces
    strParts = Address.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
    For i = 0 To strParts.GetUpperBound(0)
        If Integer.TryParse(strParts(i), j) Then
            'this is a number, is it the house number?
            If i = 0 Then
                'we know this is the house number
                strResult(0) = strParts(i)
            Else
                'part of the street name
                strResult(2) = strResult(2) & " " & strParts(i)
            End If
        Else
            Select Case strParts(i).ToUpper()
                Case "TH", "ND"
                    'know this is part of the street name
                    strResult(2) = strResult(2) & strParts(i)
                Case "NORTH", "SOUTH", "EAST", "WEST", "N", "S", "E", "W"
                    'is this a predirection?
                    If i = 1 Then
                        strResult(1) = strParts(i)
                    ElseIf i = strParts.GetUpperBound(0) Then
                        'this is the post direction
                        strResult(4) = strParts(i)
                    Else
                        'part of the name
                        strResult(2) = strResult(2) & " " & strParts(i)
                    End If
                Case Else
                    If i = strParts.GetUpperBound(0) Then
                        'street type
                        strResult(3) = strParts(i)
                    Else
                        'part of the street name
                        strResult(2) = strResult(2) & " " & strParts(i)
                    End If
            End Select
        End If
    Next i
    'the street name is built with a leading space, so trim it off
    If strResult(2) IsNot Nothing Then strResult(2) = strResult(2).Trim()
    Return strResult
End Function
I've found this method to be cumbersome, slow, and even totally wrong when given a wonky address. Would this be a good application for a regular expression? Admittedly, I've never used regex for anything before and am a total newbie in that regard.

Thank you in advance for any help. :)

Edit - Seems more and more like I'm going to need a parser and not just regex. Does anyone know of any good address parser libraries in .NET? Writing our own is just not in the cards right now, and would be sent to the back burner if it came to that.

A: 

You could do this in Perl using Geo::StreetAddress::US:

http://search.cpan.org/~sderle/Geo-StreetAddress-US-0.99/US.pm

For example:

  my $hashref = Geo::StreetAddress::US->parse_address(
                "1600 Pennsylvania Ave, Washington, DC" );

NoahD
Too bad my app is in VB.NET, because that is almost exactly what I am looking for. You don't happen to know of any parser libraries in .NET?
Heather
Actually, this might be the better thread: http://stackoverflow.com/questions/16413/parse-usable-street-address-city-state-zip-from-a-string
NoahD
A: 

I don't have a set of addresses to (easily) test against, but here is something to try at least. It may be too permissive in places or too restrictive in others, but you should be able to tweak it. You'll definitely need to adjust the list of predirections, since each one has to be spelled out explicitly in the pattern. Also, be sure to set your regex options to be case-insensitive (RegexOptions.IgnoreCase in .NET).

^(?<StreetNumber>[0-9]+)\s+(?:(?<Predirection>north|south|east|west|n|s|e|w)\s+)?(?<StreetName>[a-z0-9 '.-]+)\s+(?<StreetType>[a-z.]+)$
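Since the predirections have to be listed explicitly anyway, one option is to generate that part of the pattern from a list. The C# sketch below (`AddressPattern`, `Predirections` and `Build` are hypothetical names, and the word list is an assumption) assembles a street-address pattern with RegexOptions.IgnoreCase. The predirection and its trailing whitespace sit in one optional non-capturing group, so no extra numbered groups are created, and the hyphen goes last in the StreetName class, because ` -'` inside brackets is a range from space to apostrophe rather than a literal hyphen:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class AddressPattern
{
    // Assumed list of pre-directions; add whatever your data actually uses.
    public static readonly string[] Predirections =
        { "north", "south", "east", "west", "n", "s", "e", "w" };

    // Builds the street-address pattern, generating the pre-direction
    // alternation from a list instead of typing it out by hand.
    public static Regex Build(string[] predirections)
    {
        // Longer words first, so "north" is tried before "n".
        string alternation = string.Join("|",
            predirections.OrderByDescending(p => p.Length)
                         .Select(p => Regex.Escape(p)));

        string pattern =
            @"^(?<StreetNumber>[0-9]+)\s+" +
            @"(?:(?<Predirection>" + alternation + @")\s+)?" +  // optional, non-capturing wrapper
            @"(?<StreetName>[a-z0-9 '.-]+)\s+" +                // hyphen last = literal hyphen
            @"(?<StreetType>[a-z.]+)$";

        return new Regex(pattern, RegexOptions.IgnoreCase);
    }
}
```

Regex.Escape leaves plain words like these unchanged, but it keeps the pattern valid if an entry ever contains a metacharacter such as a period.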

In reality, though, it would probably be better to delegate this to an address parser if possible, like the one NoahD suggested. You'll probably have to do some digging to find one for .NET, but if you can't find anything, then I would definitely go with a regular expression.

edit: d'oh, \s, not /s

edit: changed regex for more semantic grouping. You can access the group values like so:

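using System.Text.RegularExpressions;

string address = "123 n main st";
Regex regex = new Regex(@"^(?<StreetNumber>[0-9]+)\s+(?:(?<Predirection>north|south|east|west|n|s|e|w)\s+)?(?<StreetName>[a-z0-9 '.-]+)\s+(?<StreetType>[a-z.]+)$", RegexOptions.IgnoreCase);

// the pattern is anchored with ^ and $, so there is at most one match
Match match = regex.Match(address);
if (match.Success)
{
    string streetNumber = match.Groups["StreetNumber"].Value;
    string predirection = match.Groups["Predirection"].Value; // "" when absent
    string streetName = match.Groups["StreetName"].Value;
    string streetType = match.Groups["StreetType"].Value;
}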
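To plug the groups into the question's ParseServiceAddress layout, they can be copied into the same five-slot array (0=st_num, 1=predir, 2=st_name, 3=st_type, 4=postdir). This is a C# sketch with hypothetical names (`ServiceAddressParser`, `Parse`); the optional trailing Postdirection group is an assumption added to cover addresses like 101 N Main St South, and StreetName is made lazy (`+?`) so a trailing direction word is not absorbed into the name. Empty strings mark absent parts, and null means the input did not fit the pattern at all:

```csharp
using System.Text.RegularExpressions;

public static class ServiceAddressParser
{
    // Assumed direction list; extend it to match your data.
    private const string Directions = "north|south|east|west|n|s|e|w";

    // A variant of the answer's pattern plus an optional trailing Postdirection
    // group (an assumption). StreetName is lazy (+?) so a trailing direction
    // word is left for Postdirection instead of being swallowed by the name.
    private static readonly Regex AddressRegex = new Regex(
        @"^(?<StreetNumber>[0-9]+)\s+" +
        @"(?:(?<Predirection>" + Directions + @")\s+)?" +
        @"(?<StreetName>[a-z0-9 '.-]+?)\s+" +
        @"(?<StreetType>[a-z.]+)" +
        @"(?:\s+(?<Postdirection>" + Directions + @"))?$",
        RegexOptions.IgnoreCase);

    // Returns the slots in the question's layout:
    // 0=st_num, 1=predir, 2=st_name, 3=st_type, 4=postdir ("" when a part is absent).
    // Returns null when the input does not fit the pattern.
    public static string[] Parse(string address)
    {
        Match match = AddressRegex.Match(address.Trim());
        if (!match.Success)
        {
            return null;
        }

        return new[]
        {
            match.Groups["StreetNumber"].Value,
            match.Groups["Predirection"].Value,
            match.Groups["StreetName"].Value,
            match.Groups["StreetType"].Value,
            match.Groups["Postdirection"].Value,
        };
    }
}
```

For example, Parse("101 N Main St South") fills all five slots, while Parse("PO Box 12") returns null because it does not start with a house number.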
Stuart Branham
Hmmm... I think I didn't quite understand before what regex could do. As you said, an address parser is probably what I need. Plugging this expression into .NET's Regex object worked very well to validate my input, so +1 on that account. Thank you for your help. :)
Heather
Actually, you can use regex to extract parts of a string. I sort of wrote this one sloppily, so it may be harder to know which groups to pull. Just do a Google search for "C# Regex Groups" or something.
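One way to see which groups a pattern defines is Regex.GetGroupNames, which lists group "0" (the whole match), any unnamed groups by number, and then the named groups. A small C# helper (`GroupInspector` and `Describe` are hypothetical names) that pairs each group name with what it captured:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupInspector
{
    // Returns one "name=value" entry per group the pattern defines.
    // In a successful match, groups that did not participate report "".
    public static List<string> Describe(Regex regex, string input)
    {
        Match match = regex.Match(input);
        var entries = new List<string>();
        foreach (string name in regex.GetGroupNames())
        {
            entries.Add(name + "=" + match.Groups[name].Value);
        }
        return entries;
    }
}
```

With a pattern that wraps each alternative in its own parentheses, such as (n)|(s)|..., those parentheses show up as numbered groups alongside the named ones, which is part of what makes them awkward to pull by position.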
Stuart Branham
As for an address parser, I think geocoder.us has one. I don't know if you have to pay for it or not, though.
Stuart Branham
This actually helps me a lot, but it doesn't fully solve the problem. I think resources are going to be allocated to better endeavors for the time being. Thank you again for taking the time to help me out with this.
Heather
A: 

Would Google's geocoding service be appropriate for your app?

http://code.google.com/apis/maps/documentation/services.html#Geocoding_Structured

NoahD