We have a large amount of data in many categories, each with many properties, e.g.

category 1: Book

properties: BookID, BookName, BookType, BookAuthor, BookPrice

category 2: Fruit

properties: FruitID, FruitName, FruitShape, FruitColor, FruitPrice

We have many categories like Book and Fruit. Obviously we could create one table per category (in MySQL, for example), but that means creating too many tables, and we would have to write many "adapters" to manipulate the data in a uniform way.

The difficulties are:

1) Every category has different properties and this results in a different data structure.

2) The properties of any category may have to change at any time.

3) It is hard to manipulate the data if each category has its own table (too many tables).

How do you store this kind of data?

+1  A: 

You can separate the database into two parts: Definition Tables and Data Tables. Basically, the Definition Tables are used to interpret the Data Tables, where the actual data is stored (some would say that the definition tables are more elegant when represented in XML).

The following is the basic idea.

Definition Tables:

TABLE class  
class_id (int)  
class_name (varchar)

TABLE class_property  
property_id (int)  
class_id (int)  
property_name (varchar)  
property_type (varchar)  

Data Tables:

TABLE object  
object_id (int)  
class_id (int)  

TABLE object_property  
object_id (int)  
property_id (int)  
property_value (varchar)
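As a rough illustration, the four tables could be created and filled like this. This is only a sketch using Python's built-in `sqlite3` module rather than MySQL; the column types, the constraints, the sample values, and the `object_id` column in `object_property` (needed to link a value to its object) are assumptions, not part of the outline above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Definition tables: describe which classes and properties exist.
CREATE TABLE class (
    class_id   INTEGER PRIMARY KEY,
    class_name TEXT NOT NULL UNIQUE
);
CREATE TABLE class_property (
    property_id   INTEGER PRIMARY KEY,
    class_id      INTEGER NOT NULL REFERENCES class(class_id),
    property_name TEXT NOT NULL,
    property_type TEXT NOT NULL
);
-- Data tables: one row per object, one row per property value.
CREATE TABLE object (
    object_id INTEGER PRIMARY KEY,
    class_id  INTEGER NOT NULL REFERENCES class(class_id)
);
-- Assumption: object_id ties each value back to the object it belongs to.
CREATE TABLE object_property (
    object_id      INTEGER NOT NULL REFERENCES object(object_id),
    property_id    INTEGER NOT NULL REFERENCES class_property(property_id),
    property_value TEXT,
    PRIMARY KEY (object_id, property_id)
);
""")

# Define the Book class and two of its properties (definition tables).
book_class = conn.execute(
    "INSERT INTO class (class_name) VALUES ('Book')"
).lastrowid
props = {}
for name, ptype in [("BookName", "str"), ("BookPrice", "float")]:
    props[name] = conn.execute(
        "INSERT INTO class_property (class_id, property_name, property_type) "
        "VALUES (?, ?, ?)",
        (book_class, name, ptype),
    ).lastrowid

# Store one book (data tables): one object row plus one row per value.
book = conn.execute(
    "INSERT INTO object (class_id) VALUES (?)", (book_class,)
).lastrowid
conn.executemany(
    "INSERT INTO object_property (object_id, property_id, property_value) "
    "VALUES (?, ?, ?)",
    [(book, props["BookName"], "Example Book"),
     (book, props["BookPrice"], "9.99")],
)

# Read the object back by joining values to their definitions.
rows = conn.execute(
    "SELECT cp.property_name, op.property_value "
    "FROM object_property op "
    "JOIN class_property cp ON cp.property_id = op.property_id "
    "WHERE op.object_id = ? ORDER BY cp.property_id",
    (book,),
).fetchall()
print(rows)
```

Adding a Fruit category would then be a matter of inserting rows into `class` and `class_property`, with no new table.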

It would be best to also create an additional layer that interprets this structure, so that the Data Layer can operate on the data more easily. And of course you must take performance, ease of querying, etc. into consideration.
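Such an interpreting layer might look roughly like the sketch below, again with Python's `sqlite3`. The function names (`define_class`, `save_object`, `load_object`), the `CASTS` mapping from `property_type` strings to Python types, and the `object_id` column in `object_property` are all hypothetical choices made for this example.

```python
import sqlite3

# Assumption: property_type holds one of these strings.
CASTS = {"int": int, "float": float, "str": str}

def create_schema(conn):
    conn.executescript("""
    CREATE TABLE class (class_id INTEGER PRIMARY KEY, class_name TEXT UNIQUE);
    CREATE TABLE class_property (
        property_id INTEGER PRIMARY KEY, class_id INTEGER,
        property_name TEXT, property_type TEXT);
    CREATE TABLE object (object_id INTEGER PRIMARY KEY, class_id INTEGER);
    CREATE TABLE object_property (
        object_id INTEGER, property_id INTEGER, property_value TEXT);
    """)

def define_class(conn, class_name, properties):
    """Register a class and its {property_name: property_type} mapping."""
    class_id = conn.execute(
        "INSERT INTO class (class_name) VALUES (?)", (class_name,)
    ).lastrowid
    conn.executemany(
        "INSERT INTO class_property (class_id, property_name, property_type) "
        "VALUES (?, ?, ?)",
        [(class_id, name, ptype) for name, ptype in properties.items()],
    )
    return class_id

def save_object(conn, class_name, values):
    """Store a plain dict as one object row plus one row per property."""
    row = conn.execute(
        "SELECT class_id FROM class WHERE class_name = ?", (class_name,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown class: {class_name}")
    class_id = row[0]
    known = dict(conn.execute(
        "SELECT property_name, property_id FROM class_property "
        "WHERE class_id = ?", (class_id,)))
    unknown = set(values) - set(known)
    if unknown:
        raise ValueError(f"undefined properties: {sorted(unknown)}")
    object_id = conn.execute(
        "INSERT INTO object (class_id) VALUES (?)", (class_id,)
    ).lastrowid
    conn.executemany(
        "INSERT INTO object_property (object_id, property_id, property_value) "
        "VALUES (?, ?, ?)",
        [(object_id, known[name], str(value)) for name, value in values.items()],
    )
    return object_id

def load_object(conn, object_id):
    """Rebuild the dict, converting each value using its declared type."""
    rows = conn.execute(
        "SELECT cp.property_name, cp.property_type, op.property_value "
        "FROM object_property op "
        "JOIN class_property cp ON cp.property_id = op.property_id "
        "WHERE op.object_id = ?", (object_id,))
    return {name: CASTS[ptype](value) for name, ptype, value in rows}

conn = sqlite3.connect(":memory:")
create_schema(conn)
define_class(conn, "Fruit",
             {"FruitName": "str", "FruitColor": "str", "FruitPrice": "float"})
apple = save_object(conn, "Fruit",
                    {"FruitName": "Apple", "FruitColor": "red", "FruitPrice": 0.5})
fruit = load_object(conn, apple)
print(fruit)
```

The rest of the application then only sees dictionaries and never touches the four tables directly, which is where the "adapters" from the question collapse into one piece of code.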

Just my two cents; I hope it helps.

Regards.

Jaya Wijaya
That's a good idea. I will have a try. Thank you very much.
Mickey Shine
No worries, let me know how it goes if it's not too much trouble. I'm all for stuff like this :)
Jaya Wijaya
+1  A: 

If your data collection isn't too big, the Entity-Attribute-Value (EAV) model may fit the bill nicely.

In a nutshell, this structure defines the Categories and the list of [required or optional] Attributes (aka properties) that the entities in each category include, in a set of tables known as the meta-data: the logical schema of the data, if you will. The entity instances are stored in two tables, a header table and a values table, where each attribute value is stored in a single [SQL] record of the latter table (aka "vertical" storage: what would be one record in the traditional DBMS model is made of several records in the values table).

This format is very practical, in particular for its flexibility: it allows both late and ongoing changes to the logical schema (adding new categories, adding or changing the attributes of a given category, etc.), as well as implicit, data-driven handling of the underlying catalog's logical schema at the application level. The main drawbacks of this format are its [somewhat] more sophisticated, abstract implementation and, mainly, some limitations with regard to scaling when the catalog grows, say into the million+ entities range.
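To make both sides of that trade-off concrete, here is a small sketch in Python with `sqlite3`. The table names (`attribute`, `entity`, `entity_value`) and the helper functions are made up for this example. Adding an attribute later is a plain INSERT into the meta-data, while filtering on attribute values needs one extra pair of joins per condition, which is part of what gets expensive as the catalog grows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Meta-data: which attributes each category has.
CREATE TABLE attribute (
    attribute_id INTEGER PRIMARY KEY,
    category     TEXT NOT NULL,
    name         TEXT NOT NULL,
    UNIQUE (category, name)
);
-- Header table: one row per entity.
CREATE TABLE entity (
    entity_id INTEGER PRIMARY KEY,
    category  TEXT NOT NULL
);
-- Values table: one row per (entity, attribute), i.e. "vertical" storage.
CREATE TABLE entity_value (
    entity_id    INTEGER NOT NULL REFERENCES entity(entity_id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(attribute_id),
    value        TEXT,
    PRIMARY KEY (entity_id, attribute_id)
);
""")

def add_attribute(category, name):
    # A schema change is just a row in the meta-data, not an ALTER TABLE.
    conn.execute("INSERT INTO attribute (category, name) VALUES (?, ?)",
                 (category, name))

def add_entity(category, **values):
    entity_id = conn.execute(
        "INSERT INTO entity (category) VALUES (?)", (category,)
    ).lastrowid
    for name, value in values.items():
        row = conn.execute(
            "SELECT attribute_id FROM attribute WHERE category = ? AND name = ?",
            (category, name),
        ).fetchone()
        if row is None:
            raise ValueError(f"{category} has no attribute {name}")
        conn.execute("INSERT INTO entity_value VALUES (?, ?, ?)",
                     (entity_id, row[0], str(value)))
    return entity_id

def find(category, **filters):
    # Each filter condition adds one join to the values table and one to
    # the attribute table.
    sql = ["SELECT e.entity_id FROM entity e"]
    params = []
    for i, (name, value) in enumerate(filters.items()):
        sql.append(
            f"JOIN entity_value v{i} ON v{i}.entity_id = e.entity_id "
            f"JOIN attribute a{i} ON a{i}.attribute_id = v{i}.attribute_id "
            f"AND a{i}.name = ? AND v{i}.value = ?"
        )
        params += [name, str(value)]
    sql.append("WHERE e.category = ? ORDER BY e.entity_id")
    params.append(category)
    return [row[0] for row in conn.execute(" ".join(sql), params)]

add_attribute("Fruit", "FruitName")
add_attribute("Fruit", "FruitColor")
apple = add_entity("Fruit", FruitName="Apple", FruitColor="red")

# Later on, the category gains an attribute while data already exists.
add_attribute("Fruit", "FruitOrigin")
pear = add_entity("Fruit", FruitName="Pear", FruitColor="green",
                  FruitOrigin="Italy")

red_fruit = find("Fruit", FruitColor="red")
italian_green = find("Fruit", FruitColor="green", FruitOrigin="Italy")
print(red_fruit, italian_green)
```

Existing entities such as the apple simply have no row for the new attribute, so nothing has to be migrated.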

The EAV model is described in more detail in this SO answer of mine.

mjv
EAV is the same thing as what Jaya Wijaya describes, isn't it?
Mickey Shine
@Mickey Shine: Yes, Jaya Wijaya's answer is an _example_ of an EAV implementation. There can be many "twists" on this general basis, typically for performance purposes: for example, having several possible columns in the "object_property" table for the various value types, or having several object_property tables, one per type, or storing some of the most common attributes in the object table rather than (or in addition to) the object_property table. On the whole, though, these implementations share the same basic principle: the data is "stored vertically".
mjv
Your answer made EAV much clearer to me. Thank you very much.
Mickey Shine
A: 

Triggered by this question and other similar ones, I wrote a blog post on how to handle such cases using a graph database. In short, graph databases don't have the problem of "how to force a tree/hierarchy into tables", as there's simply no need for it: you store your tree structure as it is. They're not good at everything (creating reports, for example), but this is a case where graph databases shine.

nawroth