ansaurus

Question

How to store data with many categories and many properties efficiently?

Answer 1

+1 A:

You can separate the database into two parts: Definition Tables and Data Tables. Basically, the Definition Tables are used to interpret the Data Tables, where the actual data is stored (some would say that the definition tables are more elegant if represented in XML).

The following is the basic idea.

Definition Tables:

TABLE class  
class_id (int)  
class_name (varchar)

TABLE class_property  
property_id (int)  
class_id (int)  
property_name (varchar)  
property_type (varchar)

Data Tables:

TABLE object  
object_id (int)  
class_id (int)

TABLE object_property  
object_id (int)  
property_id (int)  
property_value (varchar)
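
As a rough sketch only, the tables above could be created and queried with Python's built-in sqlite3 module. The sample class ("book"), its properties and the values are made up for illustration, and object_property is keyed by (object_id, property_id) so that each value can be traced back to its object.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE class (
    class_id   INTEGER PRIMARY KEY,
    class_name TEXT NOT NULL
);
CREATE TABLE class_property (
    property_id   INTEGER PRIMARY KEY,
    class_id      INTEGER NOT NULL REFERENCES class(class_id),
    property_name TEXT NOT NULL,
    property_type TEXT NOT NULL
);
CREATE TABLE object (
    object_id INTEGER PRIMARY KEY,
    class_id  INTEGER NOT NULL REFERENCES class(class_id)
);
CREATE TABLE object_property (
    object_id      INTEGER NOT NULL REFERENCES object(object_id),
    property_id    INTEGER NOT NULL REFERENCES class_property(property_id),
    property_value TEXT,
    PRIMARY KEY (object_id, property_id)
);
""")

# Hypothetical sample data: one class with two properties, one object.
conn.execute("INSERT INTO class VALUES (1, 'book')")
conn.executemany(
    "INSERT INTO class_property VALUES (?, ?, ?, ?)",
    [(1, 1, "title", "string"), (2, 1, "pages", "int")],
)
conn.execute("INSERT INTO object VALUES (1, 1)")
conn.executemany(
    "INSERT INTO object_property VALUES (?, ?, ?)",
    [(1, 1, "Dune"), (1, 2, "412")],
)

# Read one object back by joining its values to the property definitions.
rows = conn.execute(
    """
    SELECT cp.property_name, op.property_value
    FROM object_property AS op
    JOIN class_property AS cp ON cp.property_id = op.property_id
    WHERE op.object_id = ?
    ORDER BY cp.property_id
    """,
    (1,),
).fetchall()
print(rows)
```

Note that every value comes back as text; turning "412" into a number is left to the application.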

It would be best if you could also create an additional layer that interprets the structure, to make it easier for the Data Layer to operate on the data. And you must, of course, take performance, ease of querying, etc. into consideration.
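
One possible reading of such an interpreting layer, as a minimal sketch: a function that takes the raw (name, type, value) rows and returns a typed dictionary. The type names ("int", "float", "string", "bool") and the CASTERS table are assumptions for illustration, not something the schema above prescribes.

```python
# Hypothetical mapping from property_type strings to Python converters.
CASTERS = {
    "int": int,
    "float": float,
    "string": str,
    "bool": lambda v: v == "1",  # assumes booleans are stored as "0" / "1"
}


def interpret(rows):
    """Turn (property_name, property_type, property_value) rows into a dict.

    Values are stored as text, so each one is converted using the
    converter registered for its property_type.
    """
    return {name: CASTERS[ptype](value) for name, ptype, value in rows}


book = interpret([
    ("title", "string", "Dune"),
    ("pages", "int", "412"),
    ("in_print", "bool", "1"),
])
print(book)
```

The Data Layer would then work with ordinary typed values instead of parsing strings at every call site.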

Just my two cents; I hope it helps.

Regards.

Jaya Wijaya 2010-03-08 04:30:32

That's a good idea. I will have a try. Thank you very much.

Mickey Shine 2010-03-08 05:09:14

No worries, let me know how it goes if it's not too much trouble. I'm all for stuff like this :)

Jaya Wijaya 2010-03-08 05:37:47

Answer 2

+1 A:

If your data collection isn't too big, the Entity-Attribute-Value (EAV) model may fit the bill nicely.

In a nutshell, this structure allows the definition of Categories and of the list of [required or optional] Attributes (aka properties) that the entities in each category include, in a set of tables known as the meta-data: the logical schema of the data, if you will. The entity instances are stored in two tables, a header table and a values table, whereby each attribute is stored in a single [SQL] record of the latter table (aka "vertical" storage: what would be a single record in the traditional DBMS model is made of several records in the values table).
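
To make "vertical" storage concrete, here is a small sketch using Python's sqlite3 module. The table names (entity, entity_value) and the camera data are hypothetical, and the attribute name is stored inline rather than in separate meta-data tables to keep the example short. The query pivots the vertical rows back into one conventional record per entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Header table: one row per entity.
CREATE TABLE entity (
    entity_id INTEGER PRIMARY KEY,
    category  TEXT NOT NULL
);
-- Values table: one row per (entity, attribute) pair.
CREATE TABLE entity_value (
    entity_id INTEGER NOT NULL REFERENCES entity(entity_id),
    attribute TEXT NOT NULL,
    value     TEXT,
    PRIMARY KEY (entity_id, attribute)
);
INSERT INTO entity VALUES (1, 'camera'), (2, 'camera');
INSERT INTO entity_value VALUES
    (1, 'brand', 'Acme'),
    (1, 'megapixels', '12'),
    (2, 'brand', 'Zenit');
""")

# Pivot: collapse the vertical rows into one horizontal row per entity.
# An attribute with no stored row simply comes back as NULL (None).
pivot = conn.execute("""
    SELECT e.entity_id,
           MAX(CASE WHEN v.attribute = 'brand' THEN v.value END) AS brand,
           MAX(CASE WHEN v.attribute = 'megapixels' THEN v.value END) AS megapixels
    FROM entity AS e
    LEFT JOIN entity_value AS v ON v.entity_id = e.entity_id
    GROUP BY e.entity_id
    ORDER BY e.entity_id
""").fetchall()
print(pivot)
```

Adding a new attribute needs no schema change, only new rows, but each attribute you want as a column costs one more CASE expression in the pivot.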

This format is very practical, in particular for its flexibility: it allows both late and ongoing changes to the logical schema (addition of new categories, additions/changes to the attributes of a given category, etc.), as well as implicit, data-driven handling of the underlying catalog's logical schema at the application level. The main drawbacks of this format are the [somewhat] more sophisticated and abstract implementation and, mainly, some scaling limitations when the catalog grows, say into the million+ entities range.

See the EAV model described in more detail in this SO answer of mine.

mjv 2010-03-08 04:31:29

EAV is the same thing as what Jaya Wijaya describes, isn't it?

Mickey Shine 2010-03-08 10:18:26

@Mickey Shine: Yes, Jaya Wijaya's answer is an _example_ of an EAV implementation. There can be many "twists" on this general basis, typically for performance purposes: for example, having several columns in the "object_property" table for the various value types, having several object_property tables (one per type), or storing some of the most common attributes in the object table rather than (or in addition to) the object_property table. On the whole, though, these implementations share the same basic principle that the data is "stored vertically".
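
A sketch of the first of those twists, one value column per type, again with Python's sqlite3 module. The column names (value_int, value_real, value_text) and the sample rows are assumptions for illustration. The point is that numeric comparisons can then run on a real integer column instead of on text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each row fills exactly one of the typed value columns.
CREATE TABLE object_property (
    object_id   INTEGER NOT NULL,
    property_id INTEGER NOT NULL,
    value_int   INTEGER,
    value_real  REAL,
    value_text  TEXT,
    PRIMARY KEY (object_id, property_id)
);
""")

# Hypothetical data: property 1 is a title (text), property 2 a page count (int).
conn.executemany(
    "INSERT INTO object_property (object_id, property_id, value_int, value_text) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, 1, None, "Dune"),
        (1, 2, 412, None),
        (2, 1, None, "Pamphlet"),
        (2, 2, 99, None),
    ],
)

# Numeric filter on the integer column: 99 < 100 < 412, as expected.
# With a single text column, '99' would sort after '100'.
long_books = conn.execute(
    "SELECT object_id FROM object_property "
    "WHERE property_id = 2 AND value_int > 100 "
    "ORDER BY object_id"
).fetchall()
print(long_books)
```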

mjv 2010-03-08 13:04:30

Your answer makes EAV much clearer to me. Thank you very much.

Mickey Shine 2010-03-09 00:58:33

Answer 3

A:

Triggered by this question and other similar ones, I wrote a blog post on how to handle such cases using a graph database. In short, graph databases don't have the problem of "how to force a tree/hierarchy into tables", as there's simply no need for it: you store your tree structure as it is. They're not good at everything (creating reports, for example), but this is a case where graph databases shine.
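
To illustrate the "store the tree as it is" idea without tying it to any particular product, here is a toy sketch in which plain Python dictionaries stand in for a graph store. This is not a graph database API; the node names, properties and the parent_of relationship are all made up.

```python
# Nodes carry arbitrary properties; no fixed schema per category.
nodes = {
    "electronics": {"kind": "category"},
    "cameras": {"kind": "category"},
    "cam-1": {"kind": "product", "brand": "Acme", "megapixels": 12},
}

# Edges: each node points directly at its parent in the hierarchy.
parent_of = {
    "cameras": "electronics",
    "cam-1": "cameras",
}


def path_to_root(node_id):
    """Follow parent edges from a node up to the top of the hierarchy."""
    path = [node_id]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


print(path_to_root("cam-1"))
```

Walking the hierarchy is a matter of following edges, with no join or self-referencing table involved.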

nawroth 2010-03-23 19:48:47
