tags:

views:

43

answers:

3

Hi everybody,

I have a requirement to implement contact database. This contact database is special in a way that user should be able to dynamically (on runtime) add properties he/she wants to track about the contact. Some of these properties are of type string, other numbers and dates. Some of the properties have pre-defined values, others are free fields etc.. User wants to be also able to query such structure fast and easily. The database needs to handle easily 500 000 contacts each having around 10 properties.

It leads to dynamic property model having Contact class with dynamic properties.

class Contact{

private Map<DynamicProperty, Collection<DynamicValue> values> propertiesAndValues;

//other userfull methods

}

The question is how can I store such a structure in "some database" - it does not have to be RDBMS so that I can easily express queries such as

Get all contacts whose name starts with Martin, they are from Company of size 5000 or less, order by time when this contact was inserted in a database, only first 100 results (provide pagination), where each of these segments correspond to a dynamic property.

I need:

  • filtering - equal, partial equal, (bigger, smaller for integers, dates) and maybe aggregation - but it is not necessary at this point
  • sorting
  • pagination

I was considering RDBMS, but this leads more less to this structure which is quite hard to query and it tends to be slow for this amount of data

contact(id serial pk,....);

dynamic_property(dp_id serial pk, ...);

--only one of the values is not empty
dynamic_property_value(dpv_id serial pk, dynamic_property_fk int, value_integer int, date_value timestamp, text_value text);

contact_properties(pav_id serial pk, contact_id_fk int, dynamic_propert_fk int);

property_and_its_value(pav_id_fk int, dpv_id int);

I consider following options:

  • store contacts in RDBMS and use Lucene for querying - is there anything that would help with this?
  • Store dynamic properties as XML and store it to rdbms and use xpath support - unfortunatelly it seems to be pretty slow for 500000 contacts
  • use another database - Mango DB or Jackrabbit to store this information

Which way would you go and why?

A: 

Have you considered using Lucene for your querying needs? You could probably get away with just using Lucene and store all your data in the index. Although I wouldn't recommend using Lucene as your only persistence store.

Alternatively, you could use Lucene along with a RDBMS and take advantage of something like Compass.

Randy Simon
+1  A: 

Wikipedia has a great entry on Entity-Attribute-Value modeling which is a data modeling technique for representing entities with arbitrary properties. It's typically used for clinical data, but might apply to your situation as well.

Ken Liu
A: 
  1. You could try other kind of databases like CouchDB which is a document oriented db and is distributed
  2. If you want a dumb solution, for your contacts table you could add some 50 columns like STRING_COLUMN1, STRING_COLUMN2... upto 10, DATE_COLUMN1..DATE_COLUMN10. You have another DESCRIPTION column. So if a row has a name which is a string then STRING_COLUMN1 stores the value of your name and the DESCRIPTION column value would be "STRING_COLUMN1-NAME". In this case querying can be a bit tricky. I know many purists laugh at this, but I have seen a similar requirement solved this way in one of the apps :)
Calm Storm