views:

4118

answers:

6

Say you have a very simple data structure:

(personId, name)

...and you want to store a number of these in a javascript variable. As I see it you have three options:

// a single object
var people = {
    1 : 'Joe',
    3 : 'Sam',
    8 : 'Eve'
};

// or, an array of objects
var people = [
    { id: 1, name: 'Joe'},
    { id: 3, name: 'Sam'},
    { id: 8, name: 'Eve'}
];

// or, a combination of the two
var people = {
    1 : { id: 1, name: 'Joe'},
    3 : { id: 3, name: 'Sam'},
    8 : { id: 8, name: 'Eve'}
};

The second or third option is obviously the way to go if you have (or expect that you might have) more than one "value" part to store (eg, adding in their age or something), so, for the sake of argument, let's assume that there's never ever going to be any more data values needed in this structure. Which one do you choose and why?


Edit: The example now shows the most common situation: non-sequential ids.

A: 

If you're never going to add properties other than name, then the first structure is the way to go -- simple to read and write (by ID), and simple to loop through.

If you do plan to add properties later, the optimal structure would depend on what you plan to use it for. The second structure is simplest if you plan to loop through it frequently, and want to keep your objects in a certain order. The third structure is best if you want simple reads/writes by ID, and don't care about order.

I would personally use a structure like this:

var people = {
  1: { name: 'Joe' },
  2: { name: 'Sam' },
  3: { name: 'Eve' }
};

Not only is this extensible and simple to read/write by ID, but it doesn't store the ID redundantly as in your third structure.

Ron DeVera
Sometimes storing the redundant ID is useful — it lets you go the other was, quickly. Otherwise, if you've got the Eve object, you have no quick way to find her id.
derobert
exactly right, derobert!
nickf
The redundant ID may be useful, but your data ends up repeating itself. If your object's ID suddenly differs from the hash key, then all kinds of mayhem can ensue. To protect against that, you'd then have to introduce some constraint logic.
Ron DeVera
@Ron DeVera: To protect against the divergence, test the code. I suspect the amount of code that touches this data is not huge, so quite testable. Similar to a double-linked list.And most likely, IDs don't change ever.
derobert
+1  A: 

Actually, there is a fourth option:

var people = ['Joe', 'Sam', 'Eve'];

since your values happen to be consecutive. (Of course, you'll have to add/subtract one --- or just put undefined as the first element).

Personally, I'd go with your (1) or (3), because those will be the quickest to look up someone by ID (O logn at worst). If you have to find id 3 in (2), you either can look it up by index (in which case my (4) is ok) or you have to search — O(n).

Clarification: I say O(logn) is the worst it could be because, AFAIK, and implementation could decide to use a balanced tree instead of a hash table. A hash table would be O(1), assuming minimal collisions.

Edit from nickf: I've since changed the example in the OP, so this answer may not make as much sense any more. Apologies.

Post-edit

Ok, post-edit, I'd pick option (3). It is extensible (easy to add new attributes), features fast lookups, and can be iterated as well. It also allows you to go from entry back to ID, should you need to.

Option (1) would be useful if (a) you need to save memory; (b) you never need to go from object back to id; (c) you will never extend the data stored (e.g., you can't add the person's last name)

Option (2) is good if you (a) need to preserve ordering; (b) need to iterate all elements; (c) do not need to look up elements by id, unless it is sorted by id (you can do a binary search in O(logn). Note, of course, if you need to keep it sorted then you'll pay a cost on insert.

derobert
I understand why (1) and (3) are faster then (2), but I'm wondering how you got O(log(n)) for the time? Shouldn't it be O(1) as the ids should be stored in a hash table?
Nathaniel Reinhart
I gave log n as worst, because I could see an implementation deciding to use a balanced tree instead of hash table. A hash table would be O(1) [assuming collisions are minimal], of course.
derobert
the IDs aren't always as simple as that and they'd rarely be sequential IRL, which is why you need to specify them. I do admit it was a poor/ambiguous example though.
nickf
+3  A: 

Each solution has its use cases.

I think the first solution is good if you're trying to define a one-to-one relationship (such as a simple mapping), especially if you need to use the key as a lookup key.

The second solution feels the most robust to me in general, and I'd probably use it if I didn't need a fast lookup key:

  • It's self-describing, so you don't have to depend on anyone using people to know that the key is the id of the user.
  • Each object comes self-contained, which is better for passing the data elsewhere - instead of two parameters (id and name) you just pass around people.
  • This is a rare problem, but sometimes the key values may not be valid to use as keys. For example, I once wanted to map string conversions (e.g., ":" to ">"), but since ":" isn't a valid variable name I had to use the second method.
  • It's easily extensible, in case somewhere along the line you need to add more data to some (or all) users. (Sorry, I know about your "for argument's sake" but this is an important aspect.)

The third would be good if you need fast lookup time + some of the advantages listed above (passing the data around, self-describing). However, if you don't need the fast lookup time, it's a lot more cumbersome. Also, either way, you run the risk of error if the id in the object somehow varies from the id in people.

Daniel Lew
on your third point there. Objects properties can be accessed in bracket notation to avoid that problem: var x = { "a.b>c++" : "foo"}; alert(x['a.b>c++']);
nickf
You do have the problem of a few reserved words with properties, I believe. But no problems when you're using numbers.
derobert
@derobert: I don't think so. x = { "delete" : 1, "for" : 2, "function" : 3}; -- this is valid and can be accessed in the same way.
nickf
+1  A: 

Assuming the data will never change, the first (single object) option is the best.

The simplicity of the structure means it's the quickest to parse, and in the case of small, seldom (or never) changing data sets such as this one, I can only imagine that it will be frequently executed - in which case minimal overhead is the way to go.

karim79
A: 

Given your constraint that you will only ever have name as the value, I would pick the first option. It's the cleanest, has the least overhead and the fastest look up.

Nathaniel Reinhart
A: 

The third option is the best for any forward-looking application. You will probably wish to add more fields to your person record, so the first option is unsuitable. Also, it is very likely that you will have a large number of persons to store, and will want to look up records quickly - thus dumping them into a simple array (as is done in option #2) is not a good idea either.

The third pattern gives you the option to use any string as an ID, have complex Person structures and get and set person records in a constant time. It's definitely the way to go.

One thing that option #3 lacks is a stable deterministic ordering (which is the upside of option #2). If you need this, I would recommend keeping an ordered array of person IDs as a separate structure for when you need to list persons in order. The advantage would be that you can keep multiple such arrays, for different orderings of the same data set.

levik