There is a multitude of key-value stores available. Currently you need to choose one and stick with it. I believe an independent open API, not made by a key-value store vendor, would make switching between stores much easier.

Therefore I'm building a datastore abstraction layer (like ODBC, but focused on simpler key-value stores) so that someone can build an app once and change key-value stores if necessary. Is this API too simple?

get(Key)
set(Key, Value)
exists(Key)
delete(Key)

All the APIs I have seen so far seem to add so much, so I was wondering how many additional methods are really necessary.
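
As a rough illustration only, the four operations might be sketched in Python as an abstract interface plus a toy in-memory backend. The names `KeyValueStore` and `InMemoryStore` are hypothetical, and both the `KeyError` on a missing key and the silent no-op on deleting a missing key are assumptions that the list above does not specify.

```python
from abc import ABC, abstractmethod


class KeyValueStore(ABC):
    """Hypothetical four-method interface; names are illustrative only."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def set(self, key, value): ...

    @abstractmethod
    def exists(self, key): ...

    @abstractmethod
    def delete(self, key): ...


class InMemoryStore(KeyValueStore):
    """Toy dict-backed backend standing in for a real key-value store."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Assumption: a missing key raises KeyError instead of returning a status value.
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value

    def exists(self, key):
        return key in self._data

    def delete(self, key):
        # Assumption: deleting a missing key is a silent no-op.
        self._data.pop(key, None)
```

An application written against `KeyValueStore` would only need a different subclass to switch backends, which is the point of the abstraction layer.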

I have received some replies saying that set(Key, null) could be used to delete an item, and that if get returns null then this means that the item doesn't exist. This is bad for two reasons. Firstly, it is not good to mix return values and statuses, and secondly, not all languages have the concept of null. See:

http://stackoverflow.com/questions/2324337/do-all-programming-languages-have-a-clear-concept-of-nil-null-or-undefined

I do want to be able to perform many types of operation on the data, but as I understand it everything can be built on top of a key-value store. Is this correct? And should I provide these value-added functions too, e.g. mapreduce or indexes?

Internally we already have a basic version of this in Erlang and Ruby. It has saved us a lot of time, and has also enabled us to test the performance of different key-value stores for specific use cases.

+4  A: 

If all you are doing is getting, setting, and deleting keys, this is fine.

David Pfeffer
Well, I guess everything else could be built as a layer on top of this, right?
Zubair
hmmm... depends on the system requirements: if you are talking about a Big KV store, you might not want to have to iterate through a large set just to `reset` each key...
jldupont
@jldupont. When you say "reset", what do you mean?
Zubair
+6  A: 

Do only what is absolutely necessary. Instead of asking if it is too simple, ask if it is too much, even if it only has one method.

F.Aquino
+3  A: 

There is no such thing as "too simple" for an API. The simpler the better! If it solves the need the way it is, then leave it.

Chris Conway
+3  A: 

The delete method is unnecessary. You can just pass null to set.

Edited to add:

I'm only kidding! I would keep delete, and probably add Count, Contains, and maybe an enumerator (or two).

Jeffrey L Whitledge
+1, without has_key() delete makes no sense
wuub
True, but I don't think it is a good idea. It makes it less obvious to the reader what the code does. A person reading delete(a) will probably understand what's going on. A person reading set(a, null), however, will probably have to look in the documentation to figure it out. There may also be situations where storing null is exactly what the user wants to do.
Laserallan
I added "exists" as I don't want to start mixing return values with statuses. Thanks
Zubair
But depending on the system, this might mean an additional `round trip` to get the data...
jldupont
@Laserallan - Agreed. I have to confess that my answer was really a joke. I hate magic functions that do lots of different things depending on the parameters!
Jeffrey L Whitledge
I don't think that this answer qualifies as a joke :) Without exists()/has_key(), the semantics of delete are really hard to define. Consider the following sequences: 1) set(a, 42); set(a, null); b = get(a); 2) set(a, 42); delete(a); b = get(a). After the first sequence b equals null (obviously). After the second sequence the value of b is not well defined: either it's null, and then you are unable to differentiate between a null value and a missing key, or some kind of condition was raised, and that needs to be included in the API as well.
wuub
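
The two sequences above can be played out in a small Python sketch. The `Store` class is a hypothetical dict-backed stand-in, and it assumes the "store null explicitly, raise on a missing key" choice rather than any behaviour the API has settled on.

```python
class Store:
    """Toy dict-backed store (hypothetical), used only to contrast the two sequences."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value  # None is stored like any other value

    def get(self, key):
        return self._data[key]  # assumption: a missing key raises KeyError

    def exists(self, key):
        return key in self._data

    def delete(self, key):
        self._data.pop(key, None)


# Sequence 1: the key still exists and holds an explicit null.
s1 = Store()
s1.set("a", 42)
s1.set("a", None)
b1 = s1.get("a")

# Sequence 2: the key is gone, so get signals that with an exception.
s2 = Store()
s2.set("a", 42)
s2.delete("a")
try:
    s2.get("a")
    missing = False
except KeyError:
    missing = True
```

Under these assumptions the first sequence leaves a key whose value is `None`, while the second leaves no key at all, so the two cases stay distinguishable without overloading the return value.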
@wuub - Yes, that's true. If given the option, though, I would prefer to be able to store a null explicitly, and throw an exception on a get of an unset key. And that seems to be the direction that the OP is taking it now.
Jeffrey L Whitledge
@wuub. That is exactly the reason we don't mix statuses and values internally. Thanks for explaining it better than I did! :)
Zubair
+2  A: 

I am all for simplifying an interface to its bare minimum but without having more details about the requirements of the system, it is tough to tell if this interface is sufficient. Sure looks concise enough though.

Don't forget to document the semantics for "key non-existent", as it isn't clear from reading your API definition above. Update: I see you have added the exists method. Is this necessary? You could use the get method and define a NIL of some sort, no?

Maybe worth thinking about: how about considering "freshness" of a value? i.e. an associated "last-modified" timestamp? Of course, it depends on your system requirements.

What about access control? Is it within scope of the API definition?

What about iterating through the keys? If there is a possibility of a large set, you might want to include some pagination semantics.

jldupont
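
One way the pagination idea could look, sketched in Python: a cursor-based key listing over a plain dict. The function name `list_keys`, the `cursor`/`limit` parameters, and the sorted-key ordering are all assumptions made for illustration; a real backend may not be able to return keys in sorted order cheaply.

```python
import bisect


def list_keys(data, cursor=None, limit=100):
    """Return (page, next_cursor) for the keys of `data`.

    `cursor` is the last key of the previous page, or None for the first page.
    `next_cursor` is None when there are no more keys.
    """
    keys = sorted(data)
    # Resume just after the last key the caller has already seen.
    start = 0 if cursor is None else bisect.bisect_right(keys, cursor)
    page = keys[start:start + limit]
    next_cursor = page[-1] if start + limit < len(keys) else None
    return page, next_cursor


data = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
page1, cur1 = list_keys(data, limit=2)
page2, cur2 = list_keys(data, cursor=cur1, limit=2)
page3, cur3 = list_keys(data, cursor=cur2, limit=2)
```

Passing the last key back as the cursor keeps the call stateless, so the store does not have to hold an open iterator for each client.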
I have added "exists". Also, interesting point about freshness. This will be used on top of key-value stores which are eventually consistent, but they offer different types of versioning. Some use version numbers (Riak) and some use timestamps (Cassandra).
Zubair
In regard to your question about NIL, this is a strange one, as I am not sure whether all languages have a concept of a NIL value.
Zubair
Access control is something that will go in a "connection" string. In Erlang, for example, it will be the first parameter of every function, and will include any special privileges needed.
Zubair
How I will do iteration is still open. I have many ideas, but I need to make sure they are usable for end users.
Zubair
+4  A: 

Your API lacks some useful functions like "hasKey" and "clear". You might want to look at, say, Python's hack at it, http://docs.python.org/tutorial/datastructures.html#dictionaries, and pick and choose additional functions.

Everyone is saying, "simple is good" and that's true until "simple is too simple."

Scott Stafford
I added "hasKey" as "exists". I'm not sure about "clear" though
Zubair
Clear is less mandatory if there is good control over construction and destruction.
Scott Stafford
Building an API for internal use is a good example of YAGNI: don't build it until you need it. Doing so will keep the design as simple as it has to be and not "too simple".
Chris Conway
Yes, we need the API internally as we have several projects that need it. But that is still too few projects and languages to make sure we get it right, which is why we are asking here.
Zubair
HasKey is of limited value, as, unless you have some kind of locking mechanism (which I'd argue against), the result of the call would be obsolete as soon as you got it. SetIfNonExisting or something similar may make more sense, similar to exclusive create on a file.
kyoryu
SetIfNonExisting is an interesting command I do need to think about. Can atomic operations like incr and decr be implemented on top of SetIfNonExisting?
Zubair
@Zubair: No, incr and decr can't, you'd need a function that could take a function, like "DoIfKeyExistsElseSetTo(r => r + 1, 1)" -- that would be atomic, but impossible across languages and not very simple. ;)
Scott Stafford
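
For a single process, both ideas can be sketched in Python with one lock. `AtomicStore`, `set_if_absent`, and `update` are hypothetical names, and this in-process lock only illustrates the semantics; a remote or distributed store would need the backend itself to support the operation.

```python
import threading


class AtomicStore:
    """Toy in-process store; a single lock makes each operation atomic."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set_if_absent(self, key, value):
        """Store value only if key is unset; return True if it was stored."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def update(self, key, fn, default):
        """Apply fn to the current value, or store default if key is unset."""
        with self._lock:
            if key in self._data:
                self._data[key] = fn(self._data[key])
            else:
                self._data[key] = default
            return self._data[key]

    def get(self, key):
        with self._lock:
            return self._data[key]


# 50 threads each bump the same counter; no increments are lost.
store = AtomicStore()
threads = [
    threading.Thread(target=store.update, args=("hits", lambda r: r + 1, 1))
    for _ in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Here `update` plays the role of the "DoIfKeyExistsElseSetTo" call: incr is `update(key, lambda r: r + 1, 1)`, which `set_if_absent` alone cannot express because it never reads the existing value.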
hmm, interesting @scott, I'm going to think about that
Zubair
+1  A: 

As mentioned, the simpler the better, but a simple iterator or key-listing method could be of use. I always end up needing to iterate through the set. A "size()" method too, if not taken care of by the iterator. It obviously depends on your usage, though.

T.Kliether
Size is a difficult one, as it is often a hugely expensive operation. I am considering an iterator, but I only want to add it if I can make it so simple to use as to be self-describing.
Zubair
@Zubair - If a size method is deemed necessary, then, in an API like this, it can always be made an O(1) operation. Just keep a count variable, and increment or decrement as necessary.
Jeffrey L Whitledge
Yes, keeping a running count should work fine. For the iterator (if you need it), just a simple "getKeys()" would handle it, if a bit more memory intensive.
T.Kliether
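
A minimal Python sketch of the running-count idea, with a simple key listing alongside it. `CountingStore`, `size`, and `get_keys` are hypothetical names, and the sketch assumes every write goes through this wrapper; the count is only as accurate as that assumption.

```python
class CountingStore:
    """Toy dict-backed store that maintains its own key count."""

    def __init__(self):
        self._data = {}
        self._count = 0

    def set(self, key, value):
        # Only a brand-new key changes the count; overwrites do not.
        if key not in self._data:
            self._count += 1
        self._data[key] = value

    def delete(self, key):
        # Only decrement when something was actually removed.
        if key in self._data:
            del self._data[key]
            self._count -= 1

    def size(self):
        return self._count  # O(1): no scan of the underlying data

    def get_keys(self):
        # Materialises every key at once, so memory grows with the key count.
        return list(self._data)
```

Note that against a remote backend the "is this key new?" check in `set` is itself a lookup, so the O(1) size is paid for with extra work on every write.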
Size can get very tricky though. Example: one customer uses a database as their backend key-value store. When the data is updated by clients external to our API, we lose count of those inserts/deletes.
Zubair
+2  A: 

When creating an API, you need to ask yourself: what does my API provide the user? If your API is so simplistic that it is faster and easier for your client to write their own app, then your API has failed. Ask yourself: does my functionality give them specific benefits? If the answer is no, it is too simplistic and generic.

Jeremy B.
Sorry, I didn't understand what you mean. Could you rephrase your answer, please? Thanks
Zubair
comes back to listing the "system requirements" based on the "use cases" IMO.
jldupont
We already have a basic version of this in Erlang and Ruby. It has saved us a lot of time, and has also enabled us to test the performance of different key-value stores for specific use cases.
Zubair
A: 

It's not too simple, it's beautiful. If "exists(key)" is just a convenient shorthand for "get(Key) != null", you should consider removing it. I guess that depends on how large or complex the value you get() is.

eirikma
Does null exist in all languages though?
Zubair