views:

172

answers:

8

I'm putting together a database of locations for looking up nearest locations for a given address. As I started laying out the table going about my business I wondered what other people were doing and if there was a 'best practices' for some common datatypes. Such as phone numbers, addresses and even latitude and longitude.

This is what I have so far. Any suggestions or improvements?

These should all be US addresses, FYI.

 Column          DataType
 ------        ------------
 id            int
 city          nvarchar(100) 
 address       nvarchar(100) 
 address2      nvarchar(100)
 state         varchar(2)
 zip           nvarchar(10)
 phone         nvarchar(14)
 fax           nvarchar(14) 
 name          nvarchar(100)
 latitude      float
 longitude     float
 notes         text
+1  A: 

Considering you only want US addresses, I'd go with varchar for fax, phone and zip. You could even probably use varchar for city, address and AddressExtra as well.

If you are going to print address you might need some way to break it up by line, but you only have one column, which could be a little short.

I'd normalize notes into its own table, so you can track them by date or type.

you might want a status, "N"ew, "D"uplicate, "X"deleted, "A"ctive, etc.

KM
My original thought was to use varchar for everything. But knowing how things change around here I thought I should play it safe and use nvarchar just in case Canada was added or some other country. Sort of 'planning for the worst'. The status is a great idea though! I'll add that now.
Mikecancook
Please... this is 21st century. People and geographical locations have all kinds of weird names with letters outside of US-ASCII - yes, even in the US. Save yourself, your users, and the person who will maintain that DB in the future, the trouble, and just use Unicode everywhere for all textual input that comes from the outside world.
Pavel Minaev
I'd also point out that if you are going to be using SSIS anytime, it is far easier if all string data is in nvarchar because it wants to use that and will not do implicit conversions.
HLGEM
Handling notes in a normalized fashion as KM suggests will be much easier long-term and allow you to store meta-data like change time.
Rob Garrison
+3  A: 

I'd use varchar(max), or nvarchar(max) instead of text...I think TEXT is the "old way"

EJB
+2  A: 

Your phone number should support more character. With today VOIP phone network, some business that are in transition from a normal phone line to a VOIP phone line sometime give the new phone extension a large number.

For example, when I want to call my girlfriend at work, I have to dial something like

1-888-555-5555 x667833

That is 17 character if you want to store it stripped of all symbols (18885555555667833). If you want to allow free text (like 1-888-555-5555 ask for Bob), you should consider more characters.

Pierre-Alain Vigeant
Good point! Especially since these locations are coming from several vendors and I have no control over how the information is delivered.
Mikecancook
+2  A: 

I've found it useful to use names borrowed from the hCard microformat. Some of these are for people, and some for addresses, and it doesn't line up 1:1 with the hCard specification, however, this subset has been pretty useful so far:

honorificPrefix    nVarChar(20)
givenName          nVarChar(50)
additionalName     nVarChar(50)
familyName         nVarChar(50)
nickName           nVarChar(50)
honorificSuffix    nVarChar(20)
fn                 nVarChar(50)  -- # Formatted name
email              nVarChar(255)
title              nVarChar(50)
role               nVarChar(50)
organizationName   nVarChar(100)
organizationUnit   nVarChar(100)
streetAddress      nVarChar(100)
extendedAddress1   nVarChar(100) -- # hCard specs support any number of these
extendedAddress2   nVarChar(100) -- # I've not seen a need for more than two.
locality           nVarChar(55)
region             nVarChar(2)
postalCode         nVarChar(20)
postOfficeBox      nVarChar(20)
countryName        nVarChar(50)
phone              nVarChar(25)
fax                nVarChar(25)
latitude           float
longitude          float

Experience has shown that the data types and sizes listed have have worked well for many varied directories of people and organizations across the US. However, I like some of the comments here that suggest a larger data type for phone numbers. Also, I am not convinced that storing latitude and longitude as a float is the way to go, so your milage may vary.

Good luck!

Chris Nielsen
Email as 255? That make me sad. But they aren't the only one to do that. For example Paypal support only 127 character for an email. While this is usually enough, the standard is 320 characters.
Pierre-Alain Vigeant
Negative, see RFC 5321 section 4.5.3.1.3. Maximum length is 256 characters, but that includes angle brackets around the email address. So, the maximum length is really 254. Goes back to RFC821, and it was also in RFC2821.
Chris Nielsen
I'm open to alternatives for data types storing latitude and longitude. I've read the post here: http://stackoverflow.com/questions/385132/proper-best-type-for-storing-latitude-and-longitude, but much of it is over my head. I'll be storing the latitude and longitude results from Google Maps, or another service. If it makes any difference I was planning on using the Haversine formula for calculating distance and finding nearest locations to a given point.
Mikecancook
Sorry yeah I forgot about the forward path restriction.
Pierre-Alain Vigeant
A: 

What industry will this database be used in?

Many industries have standards that can be used as guidance for the design and implementation of objects, tables, and data types.

For example, having worked with health care and insurance information, one standard that affects my design is "ANSI ASC X12N 837 Health Care Claims (837)".

If you ever expect to import or export information from and to external systems, I'd use an industry standard as a guideline.

Darryl Peterson
A: 

I would not be so bold as to call these best practices, but here are my learned it the hard way pieces of advice:

  • General: use nvarchar over varchar everywhere in here. Skipped columns are OK by me.
  • id: I prefer using a uniqueidentifier here. There may be a need to keep it's uniqueness across different tables. Use newsquentialid() and create the clustered index on it.
  • address: Set up three of these.
  • state: Change to something more neutral like territory and change it to a nvarchar(100). Ideally, change this to territoryId (uniqueidentifier) and have another table that holds these.
  • zip: Change to postalCode and increase the size to 20.
  • phone: Increase the size to 50. I find it better not to get involved in trying to break this up of validate unless you really have a good reason and plan to use it.
  • fax: same as phone
  • notes: Change this to nvarchar(max) so you won't run in to as many limitations.
K Richard
+2  A: 

I would not use float for lat long. I've seen some very bad rounding issues when this is done and rounding in a lat long calculation can put you in the wrong location. Use a decimal type instead and define the number of places you need.

HLGEM
There seems to be a lot of disagreement and good examples for each opinion on this. For example: http://earthcode.com/blog/2006/12/latitude_and_longitude_columns.html vs http://code.google.com/apis/maps/articles/phpsqlajax.html. Basically I just want to return five or ten of the nearest locations out of a few thousand which will in turn show the real address. Within a mile or two and I think no one will mind. If we're talking 1,000 miles or more..that would be a problem.
Mikecancook
A: 

I've never figured out how to "comment" instead of "answer"

But I don't follow the warning about rounding issues with lat/long. Degrees rounded to four deciml places is max error under two meters. Are you lost if you are two meters away? If so, then no GPS device will ever help you.

Though if you use whatever Google maps uses to convert address to lat/long, you could be a block away.

Wes Groleau