I have a list of 350 addresses in a single-column Excel file that I need to import into a SQL table, splitting the data into columns.

The content of each Excel cell looks like this:

Courtesy Motors 2520 Cohasset Rd - Chico, CA 95973-1307 530-893-1300  

What strategy should I apply to import this in a clean fashion?

I was thinking of splitting it like this:

NAME <- anything before the first digit

STREET ADDRESS <- from the first digit up to the '-'

STATE <- anything between the last ',' and the '-' immediately before it (the address field can itself contain some '-')

TELEPHONE <- the last 12 characters

ZIP <- the first 10 characters of the last 23 characters
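The positional plan above can be sketched in a few lines. This is a minimal sketch in Python, used only for illustration (the slicing maps onto C# `LastIndexOf`/`Substring`), assuming every cell ends with a 10-character ZIP+4, one space and a 12-character phone number, as in the sample. Counting on the sample, ZIP and phone together occupy the last 23 characters (10 + 1 + 12). The text between the '-' and the last ',' is the city ("Chico") and the state follows the comma. `split_by_position` is a hypothetical helper name.

```python
import re

SAMPLE = "Courtesy Motors 2520 Cohasset Rd - Chico, CA 95973-1307 530-893-1300  "

def split_by_position(line: str) -> dict:
    """Hypothetical helper: apply the positional rules to one cell."""
    line = line.rstrip()                        # cells may carry trailing spaces
    digit = re.search(r"\d", line)
    if digit is None:
        raise ValueError(f"no digit in {line!r}")
    first_digit = digit.start()                 # NAME ends before the first digit
    # "street - City, ST": stops before the space that precedes the ZIP
    middle = line[first_digit:-24]
    last_comma = middle.rfind(",")
    dash = middle.rfind("-", 0, last_comma)     # the '-' immediately before the last ','
    return {
        "name": line[:first_digit].strip(),
        "street": middle[:dash].strip(),
        "city": middle[dash + 1:last_comma].strip(),
        "state": middle[last_comma + 1:].strip(),
        "zip": line[-23:-13],                   # first 10 of the last 23 characters
        "phone": line[-12:],                    # last 12 characters
    }

print(split_by_position(SAMPLE))
```

Searching backwards from the last ',' for the '-' is what lets a hyphen inside the street address survive; the fixed-width assumption is the weak spot, since any cell with a 5-digit ZIP or an extension on the phone number would be sliced incorrectly.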

I work in C#, if that matters.

Is a regex the appropriate approach? I'm not very familiar with regular expressions, so I'm not sure. Can somebody suggest a regex that would do the job (or part of it)?

Thanks!

+1  A: 

The following regex should pull out each part in a capture group:

(\D+) ([^-]+) - ([^,]+, \w+) ([\d-]+) ([\d-]+)

Capture groups, in order:

  1. Name
  2. Street address
  3. City, State
  4. Zip
  5. Phone
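To see the capture groups in action, here is a minimal sketch using Python's `re` module for illustration. In C#, `Regex.Match` and `Match.Groups` should accept the same pattern; add `^` and `$` anchors there, since `Regex.Match` searches anywhere in the string. `parse` is a hypothetical helper; `fullmatch` makes rows that don't fit the pattern come back as `None` instead of half-parsed.

```python
import re

# The pattern above, unchanged; groups: name, street, "City, ST", ZIP, phone
ADDRESS_RE = re.compile(r"(\D+) ([^-]+) - ([^,]+, \w+) ([\d-]+) ([\d-]+)")

def parse(line: str):
    """Return the five captures, or None if the whole line doesn't fit."""
    m = ADDRESS_RE.fullmatch(line.strip())
    return m.groups() if m else None

print(parse("Courtesy Motors 2520 Cohasset Rd - Chico, CA 95973-1307 530-893-1300  "))
```

One caveat: because group 2 is `[^-]+`, a street containing a hyphen (a made-up `12-14 Main St`, say) does not fit, so such rows come back as `None` and can be reviewed by hand.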
Amber
The OP didn't specify City, just State.
Jason McCreary
Right, Jason; but it should be fairly straightforward to adapt the capture to only grab the state instead. I figured I'd provide a more general regex that could be adapted.
Amber
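Adapting the capture can be as small as moving the ", " out of group 3 so the city and the state land in separate groups. A sketch in Python for illustration; this split pattern is an assumption, not part of the original answer.

```python
import re

# Hypothetical variant: ", " sits between two groups,
# so group 3 is the city and group 4 is the state.
SPLIT_RE = re.compile(r"(\D+) ([^-]+) - ([^,]+), (\w+) ([\d-]+) ([\d-]+)")

line = "Courtesy Motors 2520 Cohasset Rd - Chico, CA 95973-1307 530-893-1300"
name, street, city, state, zip_code, phone = SPLIT_RE.fullmatch(line).groups()
print(city, state)
```

Dropping the parentheses around `[^,]+` would discard the city entirely, which matches the question's field list exactly.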
+1  A: 

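A regular expression is the right tool for this job. I am not a C# developer, so I can't give you the exact code, but the following regex should work. Most IDEs support regex search and replace, or, if you have access to UNIX, sed would work (though sed has no `\d` class or lazy quantifiers like `+?`, so the pattern would need adapting there).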

([^\d]+)\s(.+?)\s-\s[^,]+,\s([A-Z]{2})\s([^\s]+)\s([^\s]+)

Captures:

  1. Name
  2. Address
  3. State
  4. ZIP
  5. Phone
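A minimal sketch of the pattern above, run through Python's `re` module for illustration (in C# the same pattern should work with `Regex.Match`, anchored with `^...$`). Because `(.+?)` is lazy, it stops at the first space-dash-space, so a hyphenated street number such as the made-up `12-14 Main St` still parses. `parse_row` is a hypothetical name.

```python
import re

# The pattern above, unchanged; groups: name, address, state, ZIP, phone.
# The city sits in the uncaptured [^,]+ between " - " and ",".
ROW_RE = re.compile(r"([^\d]+)\s(.+?)\s-\s[^,]+,\s([A-Z]{2})\s([^\s]+)\s([^\s]+)")

def parse_row(line: str):
    """Return the five captures, or None if the line doesn't fit the pattern."""
    m = ROW_RE.fullmatch(line.strip())
    return m.groups() if m else None

print(parse_row("Courtesy Motors 2520 Cohasset Rd - Chico, CA 95973-1307 530-893-1300"))
print(parse_row("ACME 12-14 Main St - Chico, CA 95973-1307 530-893-1300"))
```

The `[A-Z]{2}` group also acts as a light sanity check: a row whose state is not two uppercase letters returns `None` rather than a wrong split.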
Jason McCreary
It's much simpler to write `[^\d]` as just `\D`.
Amber
@Amber, some regex implementations don't support negation groups.
Jason McCreary
@Jason Most do however.
Amber
@Amber, fair. There are 100 ways to write a regex; that's the beauty and the curse.
Jason McCreary
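For what it's worth, `[^\d]` and `\D` behave the same in engines that support both, and `[^0-9]` is the POSIX bracket-expression spelling for tools whose flavor has no `\d` (equivalent for plain ASCII digits like these). A quick check in Python, for illustration only:

```python
import re

line = "Courtesy Motors 2520 Cohasset Rd - Chico, CA 95973-1307 530-893-1300"

# Three spellings of "a run of non-digits" at the start of the line
results = {p: re.match(p, line).group() for p in (r"[^\d]+", r"\D+", r"[^0-9]+")}
for pattern, text in results.items():
    print(pattern, repr(text))
```

All three match the same leading name plus its trailing space, which is why both answers strip or skip that space before the street number.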