I have just been assigned to renovate an old website, and I get to move it from some old archaic system to drupal. The only problem is that it's a real-estate system and a lot of data is stored. Currently all the information is stored in a single table, an id represents the house and then everything else is key/value pairs. There are a possible 243 keys per estate, there are 23840 estates in the system. As you can imagine the system is slow and difficult to query.
I don't think a table with 243 rows would be a very good idea, and probably worse than the current situation. I've done some investigating and here's what I've found out:
- Missing data does not indicate a 0 value, data is merged from two, unique sources/formats. Some guessing is involved. I have no control over the source of the data.
- There are 4 keys that are common to all estates, all values look like something that is commonly searched for and could be indexed
- There are 10 keys that are in the [90-100)% range
- 8 of these are information like who's selling it, and it's address.
- The other two seem to belong with the below range
- There are 80 keys that are in the [80-90)% range
- This range seems to mostly just list room types and how many the house has (e.g. bedrooms_possible, bathrooms, family_room_3rd, etc)
- This range also includes some minor information like school districts, one or two more pieces of data on the address.
- The 179 keys that are in the [0-80)% range include all sorts of miscellaneous information about the estate
- The keys are not static and some may be added or removed at will by the inputs (although it is rare).
My best idea was a hybrid approach, create a table that stores important, common information and keep a smaller key/value table.
How would you store this information?