What I'm planning to do is: a) parse a file for lines matching a regular expression, b) store the matches in some sort of database or file so I don't have to do the parsing again and again, and c) call another program, passing the matches as arguments.

While I can imagine how to do a) and c), I'm a little bit unsure about b). The matches are of the form

key:attribute1:attribute2:attribute3

where attribute 2 may be optional. I'm thinking of storing the results in a simple database, but the problem is that the database needs to be available on a number of Unix platforms for the program to work. Are there any (simple) databases which can be found on any Unix platform? Or should I use some sort of index-sequential file?

+1  A: 

I recommend SQLite. It is very portable and available on a wide variety of operating systems. It's also lightweight and has a clean C API, with bindings for many programming languages.
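As a rough illustration of step b) with SQLite, here is a minimal sketch using Python's standard `sqlite3` module. The regular expression, the table name `matches`, the column names, and the helper functions are all made up for the example, and it assumes an absent attribute 2 shows up as an empty field (`key:a1::a3`); adjust to your real input.

```python
import re
import sqlite3

# Hypothetical pattern for "key:attribute1:attribute2:attribute3".
# Assumption: a missing attribute2 appears as an empty field (key:a1::a3).
LINE_RE = re.compile(r"^(\w+):([^:]*):([^:]*):([^:]*)$")


def store_matches(conn, lines):
    """Parse the lines and cache every match in a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS matches ("
        " key TEXT PRIMARY KEY, attr1 TEXT, attr2 TEXT, attr3 TEXT)"
    )
    rows = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            key, a1, a2, a3 = m.groups()
            rows.append((key, a1, a2 or None, a3))  # empty attr2 -> NULL
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO matches VALUES (?, ?, ?, ?)", rows
        )
    return len(rows)


def lookup(conn, key):
    """Return (attr1, attr2, attr3) for a key, or None if it is not cached."""
    cur = conn.execute(
        "SELECT attr1, attr2, attr3 FROM matches WHERE key = ?", (key,)
    )
    return cur.fetchone()


# In-memory database for the demo; pass a file path to persist the cache.
conn = sqlite3.connect(":memory:")
sample = ["alpha:1:x:9", "beta:2::8", "not a match"]
stored = store_matches(conn, sample)
```

With a file path instead of `":memory:"`, later runs can call `lookup()` directly and skip the parsing step.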

A different approach can be a key-value store ("NoSQL DB") like Redis, which is also portable. You can use it on a local machine as well and it will be quite fast.

Eli Bendersky
+3  A: 

If you don't need SQL, look at the *DBM family of tools. Sleepycat (now part of Oracle) has BerkeleyDB; GNU has GDBM. You may also find NDBM. Be cautious about using plain DBM; rumour (at least) has it that it is rather buggy.

These are all systems that provide a hashed lookup based on a key and arbitrary associated data, which is exactly what you seem to need.
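To give a feel for the key-to-data model, here is a minimal sketch using Python's standard `dbm` module, which picks whichever DBM-style backend is available on the system. The helper names, the file name `matches`, and the choice to store the attributes as one colon-joined string are assumptions for illustration only.

```python
import dbm
import os
import tempfile


def save_matches(path, matches):
    """Store each key with its attributes joined into one value."""
    with dbm.open(path, "c") as db:  # "c": create the file if needed
        for key, attrs in matches.items():
            # DBM files hold bytes, so encode explicitly.
            db[key.encode("utf-8")] = ":".join(attrs).encode("utf-8")


def load_match(path, key):
    """Exact-key lookup; returns the attribute list or None."""
    with dbm.open(path, "r") as db:
        raw = db.get(key.encode("utf-8"))
    return None if raw is None else raw.decode("utf-8").split(":")


# Demo in a temporary directory; an empty string stands for a missing attribute 2.
db_path = os.path.join(tempfile.mkdtemp(), "matches")
save_matches(db_path, {"alpha": ["1", "x", "9"], "beta": ["2", "", "8"]})
```

The file format depends on the backend the module selects, so a file written on one platform is not guaranteed to be readable on another.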

Note that hashed lookups are great for seeking 'exactly this key', but atrocious for 'all keys between this value and that value'. If you need to do the latter queries, look carefully at the indexing schemes offered: there may be what you need in these packages, or you may be better off looking at something else - an indexed sequential access mechanism (ISAM) with B-Tree or similar support. The *DBM packages are fairly ubiquitous, though (because a fair number of systems do only need exact value lookups). For example, they are used for things like mail alias files (where you do indeed do exact lookups for the expansion of a particular alias).
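For comparison, here is a minimal sketch of a 'between this value and that value' query against a B-tree-backed store. It uses SQLite through Python's `sqlite3` module purely as one readily available example of such a store (SQLite keeps its tables and indexes in B-trees); the table layout, sample rows, and `keys_between` helper are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The PRIMARY KEY gives SQLite an ordered index on key.
conn.execute("CREATE TABLE matches (key TEXT PRIMARY KEY, attrs TEXT)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?)",
    [
        ("apple", "1:x:9"),
        ("banana", "2::8"),
        ("cherry", "3:y:7"),
        ("damson", "4:z:6"),
    ],
)


def keys_between(conn, low, high):
    """Return all keys k with low <= k <= high, in sorted order."""
    rows = conn.execute(
        "SELECT key FROM matches WHERE key BETWEEN ? AND ? ORDER BY key",
        (low, high),
    )
    return [row[0] for row in rows]


in_range = keys_between(conn, "b", "d")
```

With a purely hashed store, the same query would generally mean walking every key and filtering, since the hash order says nothing about key order.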

Jonathan Leffler
Second this - libdb (BerkeleyDB) is now owned by Oracle but it's still BSD-licensed open source and is found on most UNIX systems out there.
caf