views:

50

answers:

4

Hi, I have few Gigabytes text file in format: {"user_ip":"x.x.x.x", "action_type":"xxx", "action_data":{"some_key":"some_value"...},...}

each entry is one line.

First I would like to easily find entries for given ip. This part is easy because I can use grep for example. However even for this I would like to find better solution because I would like to get response as fast as possible.

Next part is more complicated because I would like to find entries from selected ip and of selected type and with particular value of some_key in action_data.

Probably I would have to convert this file to SQL db (probably SQLite, because it will be desktop APP), but I would ask if there are exists better solutions?

+1  A: 

Yes, put it into a database, any database. Then querying it will be straightforward.

Skilldrick
A: 

If you need the relational guarantees of an SQL-based DB, definitely go ahead with SQLite. It will allow for fast queries, joins, aggregations, sorts, and overall any sort of search you could possibly dream up. It sounds like this is just a big list of Actions performed by users at some IP, so you'll probably want to use some sort of sequence as your primary key since none of the other attributes look like good candidates.

On the other hand, if you just need to do very simple queries, e.g. look up entries by IP, look up entries by action type, etc., you might want to look into Oracle Berkeley DB. As long as you don't need any searches that are too fancy, Berkeley DB will let you store Terabytes of data and access them at record speed.

So look over both and see what's best for your use case. They're good for different things, which might be why both are available as storage systems on Android, for instance. I think SQLite will probably win out, but when thinking about embedded local DB systems you should always at least consider both of these technologies.

jasonmp85
+1  A: 

You could take a look at MongoDB, a document based database. With it you essentially store JSON objects that you can then index and easily query in an efficient way. You can find about how to query in the docs: Querying.

reko_t
Thank. I tried it (iin demo shell on website), looks cool. However probably I won't be able to use it in my project because of AGPL license :(.//edit:I see drivers are on Apache license, so I think from legal point of view I can probably use it, unfortunately I'm not sure if I would be able to convince my client.
Maciek Sawicki
+1  A: 

Hello.

Just wanted to mention that Oracle Berkeley DB 11gR2 (released on April 1st, 2010) introduces support for a SQL API. In fact, the SQL API is the sqlite3() API. So, as Jason mentioned, if you'd like the ease-of-use of SQLite, combined with the scalability and concurrency of Berkeley DB, you can now get both things in a single library.

Regards,

Dave

dsegleau