I'm designing a schema for an event registration system which will involve students from schools across many different regions. My main problem is the method by which I store school names in the database.

Given that students will be registering separately, it's highly likely that spelling variations of the same school name will accumulate over time. I'd like an easy way to purge these, especially as one of the statistics we'd like to gather would be the number of schools and institutions that register for the event.

I'm debating between storing school_name as an extra column in a Participant table, or storing a school_id as a foreign key referencing a School table (can't think of any other way). Which one would prove more efficient when it comes to utilization of storage, ease of purging duplicate data, and other factors?
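To make the duplicate problem concrete, here is a minimal sketch of counting schools from a free-text `school_name` column. It assumes a crude normalization (ignore case, punctuation and extra whitespace), and the school names are made up. Normalizing catches some variants, but a genuine misspelling still counts as a separate school:

```python
import re


def normalize(name):
    """Crude matching key: casefold, drop punctuation, collapse whitespace."""
    key = name.casefold()
    key = re.sub(r"[^\w\s]", "", key)      # drop '.', "'", etc.
    key = re.sub(r"\s+", " ", key).strip()  # collapse runs of spaces
    return key


# Hypothetical free-text entries typed by different students.
raw = [
    "St. Mary's Convent",
    "st marys convent",
    "St Mary's  Convent",
    "Lakeside College",
    "Lakeside Colege",  # misspelt: normalization cannot fix this one
]

naive = len(set(raw))                          # every spelling counts as a school
normalized = len({normalize(n) for n in raw})  # variants merged, misspelling not
print(naive, normalized)  # 5 3
```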

+2  A: 

Storing an ID as a foreign key to a School table is preferable since it leads to less duplication.
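A minimal sketch of that layout, using SQLite from Python's standard library as a stand-in for whatever database you end up with. The table and column names (`school`, `participant`, `school_id`) and the school names are assumptions for illustration. Each name is stored once, and the "how many schools registered?" statistic becomes a distinct count over the foreign key:

```python
import sqlite3

# Hypothetical table/column names, for illustration only.
SCHEMA = """
CREATE TABLE school (
    school_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE participant (
    participant_id INTEGER PRIMARY KEY,
    full_name      TEXT NOT NULL,
    school_id      INTEGER NOT NULL REFERENCES school(school_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Each school name is stored exactly once...
conn.executemany(
    "INSERT INTO school (school_id, name) VALUES (?, ?)",
    [(1, "North Ridge High School"), (2, "Lakeside College")],
)
# ...and every participant row carries only a small integer.
conn.executemany(
    "INSERT INTO participant (full_name, school_id) VALUES (?, ?)",
    [("Ana", 1), ("Ben", 1), ("Chen", 1), ("Dev", 2)],
)
conn.commit()

# "How many schools registered?" is a simple distinct count.
school_count = conn.execute(
    "SELECT COUNT(DISTINCT school_id) FROM participant"
).fetchone()[0]
print(school_count)  # 2
```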

Ben S
+2  A: 

If you want to avoid the possibility of mistakes, you could provide a list (yes, it might be large) of existing school names. If they choose one, you store the ID of that school. Because you might not be able to anticipate ALL schools, you could have a free-form text field for "other" schools and store that as text on the participant record. You may have to do periodic reconciliation to add new schools to the school list, or link a participant's "other" school to an existing school (maybe it was in the list but they just didn't see it or maybe they misspelt it).
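One way to sketch the "list plus other" idea in SQLite (an assumption; any database with CHECK constraints works similarly). A hypothetical `other_school` text column sits beside a nullable `school_id`, a CHECK requires exactly one of them, and two hypothetical helpers perform the periodic reconciliation: linking a misspelt "other" entry to an existing school, or promoting a genuinely new school into the list:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE school (
    school_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE participant (
    participant_id INTEGER PRIMARY KEY,
    full_name      TEXT NOT NULL,
    school_id      INTEGER REFERENCES school(school_id),  -- picked from the list
    other_school   TEXT,                                  -- typed in as 'other'
    -- exactly one of the two must be filled in
    CHECK ((school_id IS NULL) <> (other_school IS NULL))
);
""")
conn.execute("INSERT INTO school (school_id, name) VALUES (1, 'North Ridge High School')")
conn.executemany(
    "INSERT INTO participant (full_name, school_id, other_school) VALUES (?, ?, ?)",
    [
        ("Ana", 1, None),
        ("Ben", None, "North Ridge Highschool"),   # was in the list, but misspelt
        ("Chen", None, "North Ridge Highschool"),
        ("Dev", None, "Lakeside College"),         # genuinely new school
    ],
)
conn.commit()


def link_other_school(conn, other_text, school_id):
    """Point everyone who typed `other_text` at an existing school."""
    with conn:
        cur = conn.execute(
            "UPDATE participant SET school_id = ?, other_school = NULL "
            "WHERE other_school = ?",
            (school_id, other_text),
        )
    return cur.rowcount


def promote_other_school(conn, other_text, canonical_name):
    """Add a new school to the list, then link its 'other' entries to it."""
    with conn:
        cur = conn.execute("INSERT INTO school (name) VALUES (?)", (canonical_name,))
    return link_other_school(conn, other_text, cur.lastrowid)


linked = link_other_school(conn, "North Ridge Highschool", 1)
promoted = promote_other_school(conn, "Lakeside College", "Lakeside College")
remaining = conn.execute(
    "SELECT COUNT(*) FROM participant WHERE other_school IS NOT NULL"
).fetchone()[0]
print(linked, promoted, remaining)  # 2 1 0
```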

FrustratedWithFormsDesigner
But see, the problem is that we don't really have a huge list of schools. And this is something I experienced when applying for colleges: when a list of schools doesn't have your school on it, it's not very encouraging :)
mganjoo
@mganjoo: well, if you want a comprehensive list, is there some government department or public record that would have that list? If you want to avoid discouraging students you might be able to do that through the UI, maybe some AJAX that autocompletes the school name as a text field so they never see the *full* list that their school is not on, but if their school *is* on the list, they get a nice auto-complete.
FrustratedWithFormsDesigner
Government list: extremely unlikely. This is an extra-curricular event (Model UN) held across different countries. AFAIK, public school data is quite heavy on the pocket. Though, now that you mention it, I might be able to start off with the list of schools that registered last year, at the very least.
mganjoo
@mganjoo: Ah, so you have precedent from previous years. That would be a good start. Maybe there's other extra-curricular clubs who would be willing to share lists of school names?
FrustratedWithFormsDesigner
@FrustratedWithFormsDesigner: I checked with my club, and it looks like we'll have enough names to be getting along with. Thanks! :)
mganjoo
+1  A: 

"Which one would prove more efficient when it comes to utilization of storage, ease of purging duplicate data, and other factors?"

Storage is cheap; time is not. So choose the approach that minimizes the time you spend de-duping the data: save your users some typing by providing a pre-prepared list of schools, and enforce the foreign key.
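What "enforce the foreign key" buys you can be sketched in SQLite (chosen here only because it ships with Python). Note that SQLite ignores `REFERENCES` clauses unless `PRAGMA foreign_keys = ON` is set on each connection; other databases enforce them by default. The schema and the `register` helper are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces REFERENCES clauses when this per-connection pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE school (
    school_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE participant (
    participant_id INTEGER PRIMARY KEY,
    full_name      TEXT NOT NULL,
    school_id      INTEGER NOT NULL REFERENCES school(school_id)
);
""")
conn.execute("INSERT INTO school (school_id, name) VALUES (1, 'North Ridge High School')")
conn.commit()


def register(conn, full_name, school_id):
    """Insert a participant; return False if the school id is not in the list."""
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.execute(
                "INSERT INTO participant (full_name, school_id) VALUES (?, ?)",
                (full_name, school_id),
            )
        return True
    except sqlite3.IntegrityError:  # "FOREIGN KEY constraint failed"
        return False


ok = register(conn, "Ana", 1)         # school 1 exists
rejected = register(conn, "Ben", 99)  # no school 99, so the row is refused
print(ok, rejected)  # True False
```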

Of course, this means you have to obtain such a list and provide a mechanism for searching it, which is inevitably more work than giving your users a free-text box. Nothing is ever easy :) But free text is the enemy of standardization, so avoid it whenever possible.
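As one possible search mechanism, here is a minimal sketch using the standard library's `difflib.get_close_matches`, so that a misspelt query still finds the canonical entry. The school names and the 0.6 cutoff (difflib's default) are assumptions; a real list would likely need tuning or a database-side search:

```python
import difflib


def search_schools(query, names, limit=5, cutoff=0.6):
    """Return canonical school names that closely resemble `query`.

    Matching is case-insensitive; results keep the list's own spelling,
    best match first.
    """
    by_key = {name.casefold(): name for name in names}
    keys = difflib.get_close_matches(
        query.casefold(), list(by_key), n=limit, cutoff=cutoff
    )
    return [by_key[k] for k in keys]


# Hypothetical list of known schools.
SCHOOLS = ["North Ridge High School", "Lakeside College", "St. Mary's Convent"]
print(search_schools("north ridge hgh school", SCHOOLS))  # ['North Ridge High School']
print(search_schools("Lakeside Colege", SCHOOLS))         # ['Lakeside College']
```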


According to your profile, you are located in Singapore. If that is where your application is based, you may find this Wikipedia page saves you a lot of typing (the usual caveats about Wikipedia apply). Lists of schools from many other countries are also available.


"Do 'autocomplete' fields serve as a flexible replacement for drop-downs?"

They can do. But they can be completely irritating, especially if they make it hard to be sure we have chosen the value we meant to choose. I find the SO tag feature confusing in this regard, and I'm a technical type. Drop-downs are easier to understand. On the other hand, most young people are completely comfortable with predictive texting, so perhaps auto-completion would suit them better.
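The matching half of an autocomplete field can be sketched in a few lines, independent of any UI framework. This hypothetical `autocomplete` helper suggests names whose full text, or any word, starts with what has been typed, and returns nothing for an empty box, so users never face the full list. The school names are made up:

```python
def autocomplete(prefix, names, limit=10):
    """Suggest school names for a partially typed entry.

    A name matches if it starts with the typed text, or if any of its
    words does (so "col" finds "Lakeside College"). Case is ignored.
    """
    p = " ".join(prefix.split()).casefold()
    if not p:
        return []  # show nothing rather than the full list
    hits = [
        name for name in names
        if name.casefold().startswith(p)
        or any(word.startswith(p) for word in name.casefold().split())
    ]
    return sorted(hits)[:limit]


# Hypothetical list of known schools.
SCHOOLS = ["North Ridge High School", "Northgate Primary", "Lakeside College", "St. Mary's Convent"]
print(autocomplete("nor", SCHOOLS))  # ['North Ridge High School', 'Northgate Primary']
print(autocomplete("col", SCHOOLS))  # ['Lakeside College']
```

If nothing matches, the UI can fall back to an "other" free-text entry rather than blocking the user.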

APC
That really does make sense. As an aside, do "autocomplete" fields serve as a flexible replacement for drop-downs? They may be more prone to errors than a drop-down list set in stone, but they do serve the purpose, right?
mganjoo
A: 

School names do change over time. Having lived through such changes, I think it makes a lot of sense not to use the names as primary keys. Instead, have a table mapping each name to an ID that never changes for that school. The rest of the database can then use the ID, and you can set up proper foreign key relationships; the full kaboodle. You will want to constrain the name column to be unique: even though schools change names, no two will ever have the same name, because names function in a very PK-like way in real life over a multi-month timescale (it's only on a multi-year timescale that you have to deal with changes).
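A minimal SQLite sketch of that design (the table names and school names are assumptions): the surrogate `school_id` never changes, `name` carries a UNIQUE constraint, and a rename touches a single row while every participant still resolves to the school's current name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE school (
    school_id INTEGER PRIMARY KEY,   -- surrogate key: never changes
    name      TEXT NOT NULL UNIQUE   -- current name: may change, must not collide
);
CREATE TABLE participant (
    participant_id INTEGER PRIMARY KEY,
    full_name      TEXT NOT NULL,
    school_id      INTEGER NOT NULL REFERENCES school(school_id)
);
""")
with conn:
    conn.execute("INSERT INTO school (school_id, name) VALUES (1, 'Lakeside College')")
    conn.execute("INSERT INTO participant (full_name, school_id) VALUES ('Ana', 1)")

# The school is renamed: one row changes; no participant rows need touching.
with conn:
    conn.execute("UPDATE school SET name = 'Lakeside University' WHERE school_id = 1")

current = conn.execute(
    "SELECT s.name FROM participant p JOIN school s USING (school_id) "
    "WHERE p.full_name = 'Ana'"
).fetchone()[0]

# The UNIQUE constraint refuses a second school with the same name.
try:
    with conn:
        conn.execute("INSERT INTO school (name) VALUES ('Lakeside University')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(current, duplicate_allowed)  # Lakeside University False
```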

Donal Fellows
School names are not unique. Just take a quick look at how many John F. Kennedy High Schools there are (I know this because many new schools built in the US in 1964 were given this name). School names are usually unique within a school district, but not in any other way, shape, or form.
HLGEM
@HLGEM: I was thinking in terms of university schools; how many schools of computer science does the average university have? (Actually, I worked somewhere which had two for a short while after a merger, but even then they had different names.) Moreover, does the average school district have multiple JFK Highs? It's only at larger scales that you have problems.
Donal Fellows
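If, as HLGEM points out, names are only unique within a school district, the unique constraint can be scoped to that pair instead of the name alone. A minimal SQLite sketch; the `district` column and the `add_school` helper are hypothetical, standing in for whatever scope actually makes names unique in your data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE school (
    school_id INTEGER PRIMARY KEY,
    district  TEXT NOT NULL,   -- hypothetical: whatever scope makes names unique
    name      TEXT NOT NULL,
    UNIQUE (district, name)    -- same name allowed in different districts
)
""")


def add_school(conn, district, name):
    """Insert a school; return False if that name already exists in the district."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO school (district, name) VALUES (?, ?)", (district, name)
            )
        return True
    except sqlite3.IntegrityError:
        return False


a = add_school(conn, "District A", "John F. Kennedy High School")
b = add_school(conn, "District B", "John F. Kennedy High School")  # different district
c = add_school(conn, "District A", "John F. Kennedy High School")  # duplicate in district
print(a, b, c)  # True True False
```

Either way, the participant rows still point at `school_id`, so this choice only affects how duplicates are detected, not how registrations are stored.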