Let's say I have a set of data where each row is a pair of coordinates: (X, Y). Associated with each point I have arbitrary metadata, such as {color: yellow} or {age: 2 years}.

I'd like to store the data and metadata in such a way that I can query the metadata (e.g., [rows where {age: 2 years, color: yellow}]) and receive all of the matching coordinate rows in return.

There are no predefined metadata columns or values, nor will all coordinate rows necessarily have the same metadata columns. What would be the best way to store this data for the fastest access? Would it be possible using something such as Tokyo Cabinet (without Tokyo Tyrant) or SQLite, or is there a better option?

+2  A: 

Any relational database should be able to handle something like that (you'd basically just be doing a join between two tables, one for the data and one for the metadata). SQLite should work fine.

Your first table would hold the data itself, with a unique ID for each entry. Your second table would have three working columns: the associated entry ID, the metadata key, and the metadata value.

Example data table:

ID  Data
--------
1   (1,1)
2   (7,4)
3   (2,3)

Example metadata table:

ID     Key         Value
--------------------------
1      "color"     "yellow"
1      "age"       3
2      "color"     "blue"
2      "age"       2
3      "color"     "blue"
3      "age"       4
3      "loc"       "usa"

Then if you wanted to search for all data points with an age of at least 3, you'd use a query like this:

SELECT datatable.*
  FROM datatable
  JOIN metadatatable ON datatable.ID = metadatatable.ID
  WHERE metadatatable.Key = 'age' AND metadatatable.Value >= 3
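To make the two-table layout concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table and column names follow the example tables above; the index name and the choice to leave the Value column untyped (so that numbers stay numbers and strings stay strings) are assumptions made for illustration, not something the answer prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datatable (ID INTEGER PRIMARY KEY, Data TEXT);
    -- Value has no declared type, so SQLite stores each value with its own type.
    CREATE TABLE metadatatable (ID INTEGER, Key TEXT, Value);
    -- Hypothetical index to speed up lookups by key and value.
    CREATE INDEX idx_meta_key_value ON metadatatable (Key, Value);
""")
conn.executemany("INSERT INTO datatable VALUES (?, ?)",
                 [(1, "(1,1)"), (2, "(7,4)"), (3, "(2,3)")])
conn.executemany("INSERT INTO metadatatable VALUES (?, ?, ?)", [
    (1, "color", "yellow"), (1, "age", 3),
    (2, "color", "blue"),   (2, "age", 2),
    (3, "color", "blue"),   (3, "age", 4), (3, "loc", "usa"),
])

# All data points with an age of at least 3.
rows = conn.execute("""
    SELECT d.ID, d.Data
      FROM datatable AS d
      JOIN metadatatable AS m ON m.ID = d.ID
     WHERE m.Key = 'age' AND m.Value >= 3
     ORDER BY d.ID
""").fetchall()
print(rows)  # [(1, '(1,1)'), (3, '(2,3)')]
```

Storing ages as integers rather than strings matters here: the >= comparison is only numeric if the stored values are numeric.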
Amber
How would you query for multiple conditions? For example, as mentioned in my original post: [all coordinate rows where age=2 and color=blue]
Ron Eggbertson
You'd use the INTERSECT operator.
Amber
http://www.techonthenet.com/sql/intersect.php for the syntax.
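As a rough sketch of the INTERSECT approach for [all coordinate rows where age=2 and color=blue], assuming the same datatable/metadatatable layout and sample rows as in the answer (the untyped Value column is an assumption for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datatable (ID INTEGER PRIMARY KEY, Data TEXT);
    CREATE TABLE metadatatable (ID INTEGER, Key TEXT, Value);
""")
conn.executemany("INSERT INTO datatable VALUES (?, ?)",
                 [(1, "(1,1)"), (2, "(7,4)"), (3, "(2,3)")])
conn.executemany("INSERT INTO metadatatable VALUES (?, ?, ?)", [
    (1, "color", "yellow"), (1, "age", 3),
    (2, "color", "blue"),   (2, "age", 2),
    (3, "color", "blue"),   (3, "age", 4), (3, "loc", "usa"),
])

# Each inner SELECT yields the IDs matching one condition;
# INTERSECT keeps only the IDs present in both result sets.
matches = conn.execute("""
    SELECT Data FROM datatable
     WHERE ID IN (
        SELECT ID FROM metadatatable WHERE Key = 'age' AND Value = 2
        INTERSECT
        SELECT ID FROM metadatatable WHERE Key = 'color' AND Value = 'blue'
     )
""").fetchall()
print(matches)  # [('(7,4)',)]
```

Each additional condition adds one more SELECT to the INTERSECT chain.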
Amber
A: 

Since the columns are neither predefined nor consistent across all rows, you could go with a Bigtable-style implementation such as Google App Engine (Expando models with ListProperty), Cassandra, or HBase (see http://en.wikipedia.org/wiki/BigTable).

For simple implementations using sqlite you could create a string field formatted as

f1  | f2  | metadata as string
x1  | y1  | cola:val-a1 colb:val-b1 colc:val-c1
x2  | y2  | cola:val-a2 colx:val-x2

and use SELECT * FROM table WHERE metadata LIKE '%cola:val-a2%'
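A small sketch of the string-field approach with Python's sqlite3 module follows. The table name points and the third row are hypothetical, added only to show one pitfall: a plain substring pattern also matches values that merely start with the search term. Padding both the field and the pattern with spaces is one possible workaround, not part of the original suggestion. A pattern with a leading % also generally cannot use an index, so expect a full table scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x INTEGER, y INTEGER, metadata TEXT)")
conn.executemany("INSERT INTO points VALUES (?, ?, ?)", [
    (1, 1, "cola:val-a1 colb:val-b1 colc:val-c1"),
    (7, 4, "cola:val-a2 colx:val-x2"),
    (2, 3, "cola:val-a20"),  # hypothetical row whose value shares a prefix
])

# Plain substring match: also picks up 'cola:val-a20'.
loose = conn.execute(
    "SELECT x, y FROM points WHERE metadata LIKE '%cola:val-a2%' ORDER BY x"
).fetchall()

# Space-delimited match: the pair must be a whole token in the field.
strict = conn.execute(
    "SELECT x, y FROM points "
    "WHERE ' ' || metadata || ' ' LIKE '% cola:val-a2 %' ORDER BY x"
).fetchall()

print(loose)   # [(2, 3), (7, 4)]
print(strict)  # [(7, 4)]
```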
molicule
+1  A: 

Using @Dav's schema, a way to get " [all coordinate rows where age=2 and color=blue] " is (assuming (ID, Key, Value) is Unique in metadatatable, i.e., the latter has no entirely duplicate rows):

SELECT datatable.Data 
  FROM datatable
  JOIN metadatatable AS m USING(ID)
  WHERE (m.Key='age' AND m.Value=2)
     OR (m.Key='color' AND m.Value='blue')
  GROUP BY datatable.ID, datatable.Data
  HAVING COUNT(*)=2
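The same pattern extends to any number of conditions: OR together one (Key, Value) test per condition and require the match count to equal the number of conditions. Below is a hedged sketch in Python with sqlite3; the helper find_points and the UNIQUE constraint are illustrative assumptions (the constraint enforces the "no duplicate rows" precondition stated above).

```python
import sqlite3

def find_points(conn, conditions):
    """Return Data for rows whose metadata matches every key/value pair.

    Hypothetical helper: `conditions` is a non-empty dict of key -> value.
    """
    if not conditions:
        raise ValueError("at least one condition is required")
    clause = " OR ".join("(m.Key = ? AND m.Value = ?)" for _ in conditions)
    params = [item for pair in conditions.items() for item in pair]
    params.append(len(conditions))  # every condition must have matched
    sql = f"""
        SELECT d.Data
          FROM datatable AS d
          JOIN metadatatable AS m USING (ID)
         WHERE {clause}
         GROUP BY d.ID, d.Data
        HAVING COUNT(*) = ?
         ORDER BY d.ID
    """
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datatable (ID INTEGER PRIMARY KEY, Data TEXT);
    CREATE TABLE metadatatable (ID INTEGER, Key TEXT, Value,
                                UNIQUE (ID, Key, Value));
""")
conn.executemany("INSERT INTO datatable VALUES (?, ?)",
                 [(1, "(1,1)"), (2, "(7,4)"), (3, "(2,3)")])
conn.executemany("INSERT INTO metadatatable VALUES (?, ?, ?)", [
    (1, "color", "yellow"), (1, "age", 3),
    (2, "color", "blue"),   (2, "age", 2),
    (3, "color", "blue"),   (3, "age", 4), (3, "loc", "usa"),
])

print(find_points(conn, {"age": 2, "color": "blue"}))  # ['(7,4)']
print(find_points(conn, {"color": "blue"}))            # ['(7,4)', '(2,3)']
```

Only the placeholders are generated dynamically; the keys and values themselves are passed as bound parameters, so arbitrary metadata strings do not end up in the SQL text.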
Alex Martelli