I have a set of statistical data (about 100M in size), organized as key-value pairs. Some of the values are plain numbers (e.g. a person's age or weight) and some are hierarchical (e.g. a person's employments: a set of employment records, each again containing key/value pairs, and so on). The real data is not exactly this, but the structure is similar.

I need to query this data with an arbitrary set of criteria, i.e. I may want to ask something like "where did the 20 oldest persons work 3 years ago", or "what is the sum of all salaries for all people that ever worked at company X for more than a year", or "give me all you know about people that found a new job recently", etc.

I can program each individual query pretty easily, but since there can be many of them and they vary all the time, it becomes tedious to program each one anew. So the question is whether there is an existing tool that would make such queries easier (if it has a nice GUI, that's a bonus :). Something like SQL wouldn't work well because the data fields aren't really fixed, and making the hierarchy work in SQL would be too much trouble IMHO. Is there a tool that I could use with relative ease for this task (i.e. without learning a whole new language for it; otherwise I'd rather stay with hand-coding the queries)?

A: 

You may want to look at MongoDB. It is a document store that holds JSON-like documents, so it essentially works with key/value pairs, and you can nest documents within documents. Queries are themselves written as JSON-style documents, and the interactive shell is JavaScript. Of course, you'd need to convert your data to JSON, but this is not difficult.
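To make the nested key/value shape concrete, here is a minimal sketch in plain Python rather than against a running MongoDB server. The records, field names (`employments`, `company`, `salary`, and so on), the helper `worked_at`, and the fixed `today` date are all hypothetical, and "sum of all salaries" is interpreted here as summing the salaries of the qualifying employment records.

```python
from datetime import date

# Hypothetical records: flat values plus a nested list of employment records.
# Fields are not fixed; the third record has a "weight" the others lack.
people = [
    {"name": "Ann", "age": 61, "employments": [
        {"company": "X", "start": date(2001, 1, 1), "end": date(2004, 6, 30), "salary": 50000},
        {"company": "Y", "start": date(2004, 7, 1), "end": None, "salary": 70000},
    ]},
    {"name": "Bob", "age": 35, "employments": [
        {"company": "X", "start": date(2010, 1, 1), "end": date(2010, 6, 30), "salary": 40000},
    ]},
    {"name": "Cy", "age": 48, "weight": 80, "employments": []},
]


def worked_at(person, company, min_days, today=date(2012, 1, 1)):
    """Return the person's employment records at `company` lasting more than `min_days`.

    An open-ended record (end is None) is measured up to `today`.
    """
    return [
        e for e in person.get("employments", [])
        if e["company"] == company
        and ((e["end"] or today) - e["start"]).days > min_days
    ]


# Sum of salaries over employment records at company X lasting more than a year.
total = sum(e["salary"] for p in people for e in worked_at(p, "X", 365))

# The two oldest persons (the question asks for 20; 2 keeps the sample small).
oldest = sorted(people, key=lambda p: p["age"], reverse=True)[:2]

print(total)
print([p["name"] for p in oldest])
```

Once the data is loaded into MongoDB, the same nesting can be addressed with dot notation in a filter document, for example `{"employments.company": "X"}`, so each new question becomes a small query document instead of a new hand-written loop.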

Another option may be a graph database like Neo4j. Each record is a node and you can define relationships between nodes (visualized as edges).
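As a rough illustration of the node-and-relationship model, the sketch below uses plain Python structures, not Neo4j's actual API. The node ids, the `WORKED_AT` relationship type, the `years` property, and the helper `sources_of` are all made up for the example.

```python
# Hypothetical graph: nodes keyed by id, and relationships stored as
# (source id, relationship type, target id, properties) tuples.
nodes = {
    "ann": {"kind": "person", "age": 61},
    "bob": {"kind": "person", "age": 35},
    "x": {"kind": "company", "name": "X"},
    "y": {"kind": "company", "name": "Y"},
}
edges = [
    ("ann", "WORKED_AT", "x", {"years": 3.5}),
    ("ann", "WORKED_AT", "y", {"years": 7.0}),
    ("bob", "WORKED_AT", "x", {"years": 0.5}),
]


def sources_of(target, rel_type, predicate=lambda props: True):
    """Sorted ids of nodes that have a `rel_type` edge into `target`
    whose edge properties satisfy `predicate`."""
    return sorted(
        src for src, rel, dst, props in edges
        if rel == rel_type and dst == target and predicate(props)
    )


# Everyone who ever worked at company X.
everyone = sources_of("x", "WORKED_AT")

# Only those who worked there for more than a year.
long_timers = sources_of("x", "WORKED_AT", lambda props: props["years"] > 1)

print(everyone, long_timers)
```

The point of the model is that an employment becomes a relationship carrying its own properties, so a question like "who worked at X for more than a year" is a filter over edges rather than a walk through nested records.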

I do not think either of these has any type of GUI, but they are pretty easy to query. MongoDB's shell uses JavaScript, and there are drivers you can use to call the DB from other languages. Neo4j is written in Java, but there are bindings for other languages.

SQL would be challenging, but it would work. I will also throw in PostgreSQL as an option since it is object-relational, but I am more familiar with the others.
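For comparison, here is a small sketch of the relational route using SQLite from Python's standard library (not PostgreSQL). The table and column names are hypothetical, and the hierarchy is flattened by hand into a second table, which is the part that gets tedious when the fields are not fixed.

```python
import sqlite3

# In-memory database with one table per level of the hierarchy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE employment (
        person_id INTEGER REFERENCES person(id),
        company TEXT,
        years REAL,
        salary INTEGER
    );
    INSERT INTO person VALUES (1, 'Ann', 61), (2, 'Bob', 35);
    INSERT INTO employment VALUES
        (1, 'X', 3.5, 50000), (1, 'Y', 7.0, 70000), (2, 'X', 0.5, 40000);
""")

# Sum of salaries over employment records at company X lasting more than a year.
total = conn.execute(
    "SELECT SUM(salary) FROM employment WHERE company = ? AND years > ?",
    ("X", 1),
).fetchone()[0]

# Names of everyone who ever worked at company X, via a join back to person.
names = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT p.name FROM person p "
        "JOIN employment e ON e.person_id = p.id "
        "WHERE e.company = ? ORDER BY p.name",
        ("X",),
    )
]

print(total, names)
conn.close()
```

Each extra level of nesting needs another table and another join, and every new kind of field needs a schema change, which is why the document and graph stores above are usually less work for loosely structured data.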

Ryan Rosario