I would like to create a data structure or collection that has O(1) complexity for adding, removing, and counting its elements. How should I start?

I have thought of a solution: I will use a Hashtable, and for each key/value pair inserted there will be only one hash code. That is, my hash code algorithm will generate a unique hash value every time, so the index at which the value is stored will be unique (i.e. no collisions).

Will that give me O(1) complexity?

+3  A: 

Yes, that will work, but as you mentioned, your hashing function needs to be 100% unique. Any duplicates will force you to use some sort of collision resolution; I would recommend separate chaining.

edit: `HashMap.size()` allows O(1) access to the number of elements.

edit 2: Response to the confusion Larry has caused =P

Yes, hashing is O(k) where k is the key length. Everyone can agree on that. However, if you do not have a perfect hash, you simply cannot get O(1) time. Your claim was that you do not need uniqueness to achieve O(1) deletion of a specific element. I guarantee you that is wrong.

Consider the worst-case scenario: every element hashes to the same thing. You end up with a single linked list, which, as everyone knows, does not have O(1) deletion. I would hope, as you mentioned, that nobody is dumb enough to write a hash like this.

Point is, uniqueness of the hash is a prerequisite for O(1) runtime.

Even then, though, it is technically not O(1) in the Big-O sense. Only with amortized analysis do you achieve constant-time efficiency in the worst case. As noted in Wikipedia's article on amortized analysis:

The basic idea is that a worst case operation can alter the state in such a way that the worst case cannot occur again for a long time, thus "amortizing" its cost.

That refers to the idea that resizing your hashtable (altering the state of your data structure) at certain load factors keeps the chance of collisions low.

I hope this clears everything up.
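
A minimal sketch of that worst case, assuming separate chaining where every key has collided into a single bucket; the class is illustrative, not real HashMap code:

```java
import java.util.Iterator;
import java.util.LinkedList;

// Sketch only: a "hash table" in which every key has collided into one bucket.
class SingleBucketTable<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    // The one chain that every key ends up in.
    private final LinkedList<Entry<K, V>> chain = new LinkedList<>();

    void put(K key, V value) {
        chain.addFirst(new Entry<>(key, value)); // O(1): prepend to the chain
    }

    boolean remove(K key) {
        // Must scan the chain: O(n) when all n keys collide.
        for (Iterator<Entry<K, V>> it = chain.iterator(); it.hasNext(); ) {
            if (it.next().key.equals(key)) {
                it.remove();
                return true;
            }
        }
        return false;
    }

    int size() {
        return chain.size(); // LinkedList tracks its size, so this stays O(1)
    }
}
```

With a well-distributed hash the chains stay short, which is why the expected, amortized cost is treated as constant.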

JGord
Good thing there isn't such a thing as a `size()` method, no sirree! And it would definitely not have been in there since the creation of the `HashMap` type back in Java 1.2. After all, that would prove you wrong…
Donal Fellows
http://java.sun.com/javase/6/docs/api/java/util/HashMap.html#size() -- I knew that =D
JGord
Why not edit your answer then, so everyone can see that directly... ;-)
dertoni
hashing doesn't need to be 100% unique -- I think the questioner doesn't understand O(1) or hashing because it is O(1) even without uniqueness
Larry Watanabe
@Larry: you've posted this nonsense in a few different places, but I don't think *YOU* understand how a hash table will function as you approach 100% probability of collisions.
Mark Peters
@Mark: The statement "[operations on a hash table take constant time on average](http://en.wikipedia.org/wiki/Hash_table#Drawbacks)" sounds like O(1) to me. Using `return 42` of course degenerates into O(n), but that's not a typical case. So in what way is Larry's statement "nonsense"? Clearly I must be missing something.
Stephen P
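
For concreteness, a hedged sketch of the `return 42` case mentioned above; the key class is hypothetical. A constant hashCode is legal (it is consistent with equals), but it sends every key to the same bucket, so lookups and removals degrade toward O(n):

```java
// Hypothetical key whose hashCode is the constant 42: legal but degenerate.
final class ConstantHashKey {
    private final String name;

    ConstantHashKey(String name) { this.name = name; }

    @Override
    public int hashCode() {
        return 42; // every instance collides into the same bucket
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConstantHashKey
                && ((ConstantHashKey) o).name.equals(name);
    }
}
```

A HashMap keyed by this class still behaves correctly; it just piles everything into one chain, which is the degeneration being discussed here.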
@Stephen P: a well-implemented hash table will be O(1) for most inputs. But the worst-case scenario is that the keys to be hashed all just happen to generate exactly the same hash code, which causes it to degenerate to O(N). Hence the worst case for a general-purpose hashtable is O(N), even though you are extremely unlikely to see anything other than O(1).
GaryF
@Stephen P: Larry was trying to make the argument that there isn't a correlation between the runtime complexity of a hashtable and the quality of a hash. That is nonsense. The complexity gradually reaches O(N) as your hashing gets weaker. It's not like it's perfectly O(1) until you have an absolutely non-unique hash. It gets there gradually, meaning it's never O(1) unless you have perfect hashing. But you're usually willing to call it O(1) provided a reasonably good hash.
Mark Peters
@Stephen: by the way, in the same article: *Perfect hashing allows for constant time lookups in the worst case. This is in contrast to most chaining and open addressing methods, where the time for lookup is low on average, but may be very large (proportional to the number of entries) for some sets of keys.*
Mark Peters
@Mark and @GaryF, thanks for clarifying - maybe I skimmed too quickly, I didn't get that argument reading it.
Stephen P
+1  A: 

Writing a totally collision-free hash function is quite tricky even when you know exactly the space of things being hashed, and it's impossible in general. It also depends deeply on the size of the array that you're hashing into. That is, you need to know exactly what you're doing to make it work.

But if you instead relax that a bit so that identical hash codes don't imply equality¹, then you can use the existing Java HashMap framework for all the other parts. All you need to do is to plug in your own hashCode() implementation in your key class, which is something that Java has always supported. And make sure that you've got equality defined right too. At that point, you've got the various operations being not much more expensive than O(1), especially if you've got a good initial estimate for the capacity and load factor.

¹ Equality must imply equal hash codes, of course.
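
A minimal sketch of such a key class; `AccountKey`, its fields, and the capacity figures are illustrative assumptions, not part of the answer:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class: hashCode() and equals() are overridden consistently.
final class AccountKey {
    private final String region;
    private final int id;

    AccountKey(String region, int id) {
        this.region = region;
        this.id = id;
    }

    @Override
    public int hashCode() {
        // Equal keys must produce equal hash codes; unequal keys should
        // spread their hash codes as widely as practical.
        return Objects.hash(region, id);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AccountKey)) return false;
        AccountKey other = (AccountKey) o;
        return id == other.id && Objects.equals(region, other.region);
    }
}

class CustomKeyDemo {
    public static void main(String[] args) {
        // Sizing the map up front (capacity and load factor) reduces rehashing.
        Map<AccountKey, String> owners = new HashMap<>(1024, 0.75f);
        owners.put(new AccountKey("eu", 42), "alice");
        System.out.println(owners.get(new AccountKey("eu", 42))); // alice
        System.out.println(owners.size());                        // 1
    }
}
```

The only contract to respect is the one in the footnote: equal keys must produce equal hash codes.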

Donal Fellows
it isn't "not much more expensive than O(1)", it is exactly O(1). But +1 because everything else is correct.
Larry Watanabe
@Larry: That depends on your hashCode implementation, really. You will not get O(1) efficiency out of a hashCode that is defined as `return 42;`. But in the OP's case apparently it's well distributed so not an issue here.
Mark Peters
@Larry: It's not exactly O(1) but the difference is really complicated and the net effect is pretty similar. With a good hash function and a not too heavily loaded hash, it's good enough.
Donal Fellows
@Donal: I think hashing is exactly O(1)
Larry Watanabe
@Larry: It's actually O(k) where k is the key length (for short keys, no big deal, for long keys, important) plus a complicated term that depends on the exact nature of the hash table and which takes into account collisions. Basically, rebuilding the hash table to be bigger is a win once collisions become frequent enough, but it is quite a costly operation.
Donal Fellows
Average case for hashing is O(1), worst case is O(k), both amortized for an online hash table. You don't have to rebuild your array very often, but when you do it's a doozy.
Joel
A: 

It's simple: just use a hash map. You don't need to do anything special. A HashMap itself is O(1) for insertion, deletion, and calculating the number of elements.

Even if the keys are not unique, the algorithm will still be O(1), as long as the HashMap is automatically expanded when the collection gets too large (most implementations do this for you automatically).

So, just use the HashMap according to the given documentation, and all will be well. Don't think up anything more complicated; it will just be a waste of time.

Avoiding collisions is really impossible with a hash... if it were possible, then it would basically just be an array or a mapping to an array, not a hash. But it isn't necessary to avoid collisions; it will still be O(1) with collisions.
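
A minimal sketch of the plain-HashMap approach this answer recommends; the keys and values are just placeholders:

```java
import java.util.HashMap;
import java.util.Map;

class SimpleUsage {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("apples", 3);        // expected O(1) insertion
        counts.put("oranges", 5);
        counts.remove("apples");        // expected O(1) removal
        // size() just reads an internal counter, so it is O(1) as well.
        System.out.println(counts.size()); // prints 1
    }
}
```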

Larry Watanabe
It isn't necessary to avoid collisions? Everything collides and you degenerate to `O(N)` searches. Making hash tables collision-resistant is definitely important.
Justin Ardini
With respect to automatically increasing the size, I could be wrong, but isn't that only O(1) using amortized analysis? Also, if you are using separate chaining as your collision system and you do not have a unique hash function, how can you claim it's still O(1) to remove a specific element? Your buckets would essentially devolve into linked lists. Unless I'm missing something about the Java HashMap implementation, removing would not be O(1) without a 100% unique hashing function.
JGord
-1. It has everything to do with the quality of your hash.
Mark Peters
Unless you do something pretty dumb, like use a mod function, it isn't hard to come up with a good hash. The defaults used in the libraries should be good enough. @JGord - try looking up Aho's algorithms book - actually, any algorithms book - they will clearly tell you that hashing is O(1).
Larry Watanabe
@Larry: It's not difficult to make a good hash, but it's also really easy to make a really bad one. I once saw an implementation of a point class (int x, int y) that internally stored a long using the high bits for x and the low bits for y. They thought, "oh, why not use Long's implementation of hashCode?" Trouble is, when there is correspondence between the high bits and the low bits, it's a terrible hash. Hashing every point between 0 and 1000 for x and y produces 1024 unique hashes out of 1,000,000 combinations... a collision rate of 99.9%.
Mark Peters
So in short you sometimes need to have significant insight into your domain to make an effective hash.
Mark Peters
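
A hedged reconstruction of that terrible point hash; `PackedPoint` is a hypothetical class written to match the description above (x in the high 32 bits, y in the low 32 bits, hashed with Long's formula):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical reconstruction of the point class described above.
final class PackedPoint {
    private final long packed;

    PackedPoint(int x, int y) {
        // x in the high 32 bits, y in the low 32 bits.
        this.packed = ((long) x << 32) | (y & 0xFFFFFFFFL);
    }

    @Override
    public int hashCode() {
        // Same formula as Long.hashCode(): XOR of the two halves,
        // which collapses to roughly x ^ y for small coordinates.
        return (int) (packed ^ (packed >>> 32));
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof PackedPoint && ((PackedPoint) o).packed == packed;
    }
}

class BadPointHashDemo {
    public static void main(String[] args) {
        Set<Integer> distinct = new HashSet<>();
        for (int x = 0; x < 1000; x++) {
            for (int y = 0; y < 1000; y++) {
                distinct.add(new PackedPoint(x, y).hashCode());
            }
        }
        // Roughly 1024 distinct hash codes for 1,000,000 keys.
        System.out.println(distinct.size() + " distinct hash codes");
    }
}
```

Because the two halves are XORed together, coordinates below 1000 can only produce about 1024 distinct values, which is the collision rate quoted above.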
A: 

What functionality do you need that a linked list won't give you?

cskilbeck
Removing in O(1) time? -1.
Mark Peters
A linked list which stores a reference to the head can act as a stack with O(1) for push/pop/size. Add a reference to the tail and it can be a queue with O(1) for enqueue/dequeue/size. OP did not specify random access, just O(1) for add/remove/size.
Brian S
Alright, I see where you're coming from though I don't think that remove operation is what the OP was going for. I would have specifically said queue or stack in that case though. Unfortunately can't remove the d/v.
Mark Peters
+2  A: 

Adding, removing, and size (provided the size is tracked separately, using a simple counter) can be provided by a linked list, unless you need to remove a specific item. You should be more specific about your requirements.
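
A minimal sketch of that idea, written as a stack backed by a singly linked list with its own counter; the names are illustrative:

```java
import java.util.NoSuchElementException;

// Sketch: a stack backed by a singly linked list, with a separate size counter.
final class CountedStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private Node<T> head;
    private int size;

    public void push(T value) {   // O(1): link a new node in front of the head
        head = new Node<>(value, head);
        size++;
    }

    public T pop() {              // O(1): unlink the head
        if (head == null) {
            throw new NoSuchElementException("stack is empty");
        }
        T value = head.value;
        head = head.next;
        size--;
        return value;
    }

    public int size() {           // O(1): just read the counter
        return size;
    }
}
```

push, pop, and size are all O(1); what you give up, as noted, is O(1) removal of an arbitrary element.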

bowenl2
+1  A: 

Even if your hash codes are unique, this doesn't guarantee a collision-free collection. This is because your hash map is not of unlimited size. The hash code has to be reduced to the number of buckets in your hash map, and after this reduction you can still get collisions.

e.g. Say I have three objects, A (hash: 2), B (hash: 18) and C (hash: 66), all unique. Say you put them in a HashMap with a capacity of 16 (the default). If they were mapped to a bucket with % 16 (it's actually more complex than this), then after reducing the hash codes we have A (hash: 2 % 16 = 2), B (hash: 18 % 16 = 2), C (hash: 66 % 16 = 2).

HashMap is likely to be faster than Hashtable, unless you need thread safety (in which case I suggest you use ConcurrentHashMap). IMHO, Hashtable has been a legacy collection for 12 years, and I would suggest you only use it if you have to.
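
To make the reduction concrete, a small sketch of the bucket-index calculation; this uses the simple `hash & (capacity - 1)` reduction (equivalent to % 16 for a power-of-two capacity) rather than HashMap's exact bit-spreading step:

```java
class BucketDemo {
    // Simplified reduction of a hash code to a bucket index for a
    // power-of-two capacity; the real HashMap also spreads the bits first.
    static int bucketIndex(int hash, int capacity) {
        return hash & (capacity - 1); // same as hash % capacity for powers of two
    }

    public static void main(String[] args) {
        int capacity = 16; // HashMap's default
        for (int hash : new int[] {2, 18, 66}) {
            System.out.println("hash " + hash + " -> bucket " + bucketIndex(hash, capacity));
        }
        // All three distinct hash codes land in bucket 2.
    }
}
```

All three distinct hash codes land in bucket 2, which is exactly the collision described above.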

Peter Lawrey