Hi there,

I have some data to store, and I must store it for every user.

The data is more or less:

key: 0, value: 107
key: 1, value: 213

There are about 80 key/value pairs per user.

So my question is: do I have one row per user that is basically

user_id, key1, key2, ... key80

or do I have 80 rows per user of

user_id, key, value
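
In SQL terms, something like this (a rough sketch; the table and column names are mine, and I'm assuming integer values as in the sample above):

-- Option 1: one wide row per user, one column per key
CREATE TABLE user_stats_wide (
    user_id INT PRIMARY KEY,
    key0    INT,
    key1    INT,
    -- ... one column for each of the ~80 keys ...
    key79   INT
);

-- Option 2: one narrow row per user/key pair
CREATE TABLE user_stats_kv (
    user_id    INT,
    stat_key   INT,
    stat_value INT,
    PRIMARY KEY (user_id, stat_key)
);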


Data I must store:
You can see the actual data I must store here: http://219.89.41.169/ReachApi/?gt=The%20hailwood

I can't just copy/paste it here, as it is approximately 8 MB of data, so the page may take a while to load.

As you can see, I am given the data in that format, hence why I must store it this way.

+1  A: 

I would stay away from 81 columns in a database table, especially for things like this, where you are likely to add more "keys" in the future (in which case you would have to add more columns to the table).

"key-value" tables can lead to performance problems, and make it clunky to write queries, so it might be advantageous to see if you can somehow group the keys into related parts, which you could stuff into smaller, more concrete tables.

Andy White
Depends on whether the 'keys' will apply to every user. If they will, IMO it'd be better to have 81 columns and avoid the ugliness of potentially having to go through 80 rows of data for each user, if you ever need all the data at once. Adding columns isn't that big a deal, and beats losing the ability to specify what type a field should have and whether it's required.
cHao
Definitely true. I think it's a trade-off. You can use key-values and have to mess with type conversions and clunky queries, or use a huge table, and have to maintain the database and update your code anytime you add a column.
Andy White
+2  A: 

You should be using your second approach - one row per key:

user_id | key | value

This way you're more flexible when you need more or fewer keys per user.
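
For example, with the hypothetical user_stats_kv(user_id, stat_key, stat_value) table sketched in the question, a key that appears later is just another row; no schema change is needed:

-- Sketch: a new key (80) shows up for user 42; no ALTER TABLE required
INSERT INTO user_stats_kv (user_id, stat_key, stat_value)
VALUES (42, 80, 107);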

dhh
*IF* you need more/fewer keys per user. If every user will have the same "keys" (read: attributes, or fields), you've just massively complicated your queries for no gain.
cHao
+1  A: 

Pro 80 columns

  • one insert on user creation, and only update statements for modifications
  • comparing the values of two or more keys can be done with simple SELECT statements, without self joins (see the sketch after this list)
  • O/R mapping just works out of the box
  • the database can handle type checking
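
A sketch of that query difference, reusing the hypothetical tables from the question:

-- 80 columns: comparing two keys is a plain SELECT
SELECT user_id
FROM user_stats_wide
WHERE key3 > key7;

-- 80 rows: the same comparison needs a self join
SELECT a.user_id
FROM user_stats_kv a
JOIN user_stats_kv b
  ON b.user_id = a.user_id
 AND b.stat_key = 7
WHERE a.stat_key = 3
  AND a.stat_value > b.stat_value;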

Pro 80 rows

  • adding keys is just an insert; no ALTER TABLE required (ALTER TABLE implies long downtime during an update if you have a huge number of users)
  • plugins can easily add their own configuration settings (assuming some kind of prefix in the key names to avoid conflicts)

Comments about 80 rows

  • If you use this approach, I suggest using "key_id" instead of "key" and having a second table for keys, so that key names can be validated. That table can have a third column for documentation; see the sketch below.
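
A minimal sketch of that layout (all names are made up):

-- Key lookup table: validates key ids and documents each key
CREATE TABLE stat_keys (
    key_id        INT PRIMARY KEY,
    key_name      VARCHAR(64) NOT NULL UNIQUE,
    documentation VARCHAR(255)
);

-- The value table references the lookup table instead of a free-form key
CREATE TABLE user_stats (
    user_id    INT,
    key_id     INT NOT NULL REFERENCES stat_keys (key_id),
    stat_value INT,
    PRIMARY KEY (user_id, key_id)
);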
nhnb
Additional pros for 80 columns: (1) it's easier to specify constraints on the data, like that `key3` must be unique or `key4` is required. (2) Indexes are more useful -- with 80 rows, either all the 'key3's, or all the '0.5's, or all the keys for a given user will be lumped together, adding to the amount of data that has to be sifted through to find what you want. (3) If the same keys will apply to every user, having the data attached to the user makes more sense relationally, even if it does make people wring their hands over there being so many columns.
cHao
+2  A: 

I'd normally cringe at the thought of 81 columns in a table. However, IF:

  • The number of keys/fields is unlikely to change, AND
  • All of the fields relate directly to the user, and apply to every user, and/or
  • You're likely to need to query for more than one key at a time for a given user,

then it makes more sense to have that many columns than to store each user/key combo as its own row. You get type safety, the ability to constrain and index your data better, and you get to query a user's stats without needing a dozen joins (or getting back a bunch of rows that you have to then assemble into a data structure).
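
For instance, a sketch against the hypothetical wide table from the question; per-field rules like these have no direct equivalent in a generic key/value table:

-- A per-field constraint is easy to state in the wide layout
ALTER TABLE user_stats_wide
    ADD CONSTRAINT key3_nonnegative CHECK (key3 >= 0);

-- A targeted index on a single field
CREATE INDEX idx_user_stats_key3 ON user_stats_wide (key3);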

If the number of fields is constantly in flux, or the fields don't apply to everyone (i.e., some users would have different numbers of fields), or you never want to know more than user 3225's value for field 53, then a user/key/value table makes more sense. But it'll be a pain in the ass to keep everything correct, updates would take forever (because indexes would need to be redone, and you will need indexes), and queries will get hideous if you ever need more than one or maybe two fields at a time.

cHao
+2  A: 

I took a look at the data. Why do you want to store the data in a database at all? I suppose you want to read the data, run some algorithm on it, and compute a result, correct? Or is it actually a requirement to store the data permanently, or is that even all you are trying to do?

Your lists all seem to have the following format:

[KillsByEnemyTypeClass] => Array
                    (
                        [0] => stdClass Object
                            (
                                [Key] => 0
                                [Value] => 0
                            )
                        ...


[MedalCountsByType] => Array
                    (
                        [0] => stdClass Object
                            (
                                [Key] => 0
                                [Value] => 0
                            )

As the data format implies, the lists are consecutive arrays, not actually key-value pairs. Because the keys are all consecutive, you can store the values in one large array in your programming language of choice.

This is your data format:

struct Data {
    std::string reason;
    int status;

    struct AiStatistics {
         int aDeathsByDamageType[82];
         int nHopperId;
         int aKillsByDamageType[82];
         int nMapId;
         double fMedalChestCompletionPercentage;
         int aMedalCountsByType[128];
         int nTotalMedals;
         int nVariantClass;
         ...
    } aaistatistics[9]; 
}

It looks as if the arrays must have a static size, because the 82 different damage types will probably mean something to you. If there are suddenly 83 damage types, the program that generates this data will have changed and you will have to adapt your algorithms, too. That means the data and your program are not independent and the advantages of using a database are questionable.

Update

Thanks for clarifying your requirements. I understand why you have to cache the data for other clients now.

But: Is the data you linked to all the data you have to store? That is, do you only cache the output from the web API and, if the output changes, overwrite the previously stored data? Or is there a temporal dimension, and you want to store the sequence of outputs of the API? In both cases, a thin C++ API around the binary data could be much more efficient than a database.

If it's just for caching the data, I would still propose modeling the database after the object model above. The AiStatistics table has one column per member variable, i.e., one column for the complete DeathsByDamageType array. Store the whole array as one value. Clients cannot look up a single value using this model, but have to receive the complete array. Unless you have a concrete use case for anything else, I'd stick with this. Your database will be much simpler.
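
A sketch of that caching model (the column names are mine, and I'm assuming the arrays are serialized into a single value, e.g. packed binary or JSON):

-- One column per member variable; each whole array is one value
CREATE TABLE ai_statistics (
    id                    INT PRIMARY KEY,
    data_id               INT,
    hopper_id             INT,
    map_id                INT,
    total_medals          INT,
    variant_class         INT,
    deaths_by_damage_type BLOB,  -- the complete 82-element array
    kills_by_damage_type  BLOB,
    medal_counts_by_type  BLOB   -- the complete 128-element array
);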

If this is really, really, really not enough for your purposes, your tables would probably be:

table Data { id, reason, status }
table AiStatistics { id, data_id, ..., all integer members like nTotalMedals etc }
table DeathByDamageType { aistat_type, index, value }
table ... for every other array member of AiStatistics

Storing DeathByDamageType this way is really space-inefficient by default: the table is at least three times larger than the raw array values, because for every value you also have to store the AiStatistics reference id and the array index.

If you do it this way, at least exploit the sparsity in the arrays and don't store values in DeathByDamageType that are 0. You could not do that with the plain array.
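
A sketch of reading one sparse value back, treating missing rows as 0 ("index" is quoted because it is a reserved word in some SQL dialects):

-- Rows with value 0 are not stored, so a lookup must supply the default
SELECT COALESCE(
    (SELECT value
     FROM DeathByDamageType
     WHERE aistat_type = 7 AND "index" = 53),
    0) AS death_count;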

Sebastian
Basically, the bungie.net API returns way too much data at once. The point of storing it is that this acts as an intermediary for all the other applications I create.
Hailwood
We have a 5 requests/second limit on the API, so this caches the user data; it is updated on request if it is older than 15 minutes.
Hailwood
Storing it in the database means that I, or anyone who uses it, can request exactly the data they need. This is useful because some of the apps using this are entirely client-side and request the data through AJAX.
Hailwood
It would simply not be viable for a client-side app (think JavaScript) to retrieve and process 20 MB of data itself (there are about 5 different calls to get all the user data).
Hailwood
Also, although these arrays are consecutive, there are many others from other calls that are not. The key represents a value that is grabbed from a metadata call; the metadata is already stored in the database, and when the data is requested it is merged before being output.
Hailwood
Unfortunately, when the data is retrieved I need to do a join on a metadata table, so storing the whole array will not work :(
Hailwood