views:

194

answers:

5

I have a hash table where the keys are rather complex lists, with sublists of symbols and integers, and the value should be modified depending on the already existing value. The table is created with :test #'equal.

I do something similar to this a lot:

(defun try-add (i)
  (let ((old-i (gethash complex-list table nil)))
    (if (may-add old-i)
      (push i (gethash complex-list table)))))

Profiling shows that equal tests take a long of time. I have an optimization idea, that the amount of gethash lookups could be reduced from two to one. It can be done in C++ by reusing the iterator, but not sure how this would be done in Lisp. Any ideas?

A: 

Some workarounds might be:

If the common pattern is lookup -> find-it -> overwrite-it, then you could replace the value type to a list that contains the value type. Then after finding the value object for the key, just destructively replace its first element, e.g.

(defun try-add (i)
  (let ((old-i-list (gethash complex-list table nil)))
    (if (may-add (first old-i-list))
      (setf (first old-i-list) i)                     ; overwrite without searching again
      (setf (gethash complex-list table) (list i))))) ; not there? too bad, we have to gethash again

Alternatively, if the common pattern is more like lookup -> it's-not-there -> add-it, you might want to consider hashing the keys on your own, and then have the hash table use your hashed value as the key. This might be more complicated, depending on the depth and semantics of these complex lists. In the simple case, you might get away with a hash function that (recursively) xor's the hash value of the elements of its list argument.


EDITED: answering the question in the comments: the idea is that instead of the hash table mapping keys to values, the hash table will now map keys to single element lists, where the element is the value. Then you can change the contents of these lists without touching the hash table itself. The following is from SBCL:

* (defparameter *my-hash* (make-hash-table))
*MY-HASH*

* (setf (gethash :my-key *my-hash*) (list "old-value"))
("old-value")

* (gethash :my-key *my-hash*)
("old-value")
T

* (defparameter old-value-container (gethash :my-key *my-hash*))
OLD-VALUE-CONTAINER

* (setf (first old-value-container) "new value")
"new value"

* (gethash :my-key *my-hash*)
("new value")
T
Oren Trutner
I tried something similar to the source code you posted, but when doing the (setf (first old-i-list)...), it only alters old-i-list and the change is not reflected in the hash table value. Am I misunderstanding something fundamental?
kotlinski
khedron
"Common lisp's built in data structures are notoriously opaque." -- what do you mean?
skypher
The discussion at http://www.reddit.com/r/programming/comments/7sab5/problems_with_lisp_hash_tables/ demonstrates some of the criticisms. This is not an attack on the language, but rather an observation on how some of its built in construct are architected. I'm removing the notoriety note though -- it's not adding value to this answer.
Oren Trutner
A: 

One thing you could do is use defstruct to create a value which every entry in your hash table points to. Your list of values (that you're pushing onto in your current example) could be stored inside there. The struct creation could either be done in that initial gethash call (as the default value), or could be done manually if you observe there's no value there. Then, the object can be side-effected in the way that you're doing.

(This ignores the question of whether or not you really want to be using such complex values as your hashtable keys, or if there's a way to work around that. For example, you could be using structures/CLOS objects instead of complex lists as your keys, and then you could use an EQ hashtable instead. But that depends a lot on what you're doing.)

khedron
A: 

"Profiling shows that equal tests take a long of time."

Yes, but have you verified that #'EQUAL hash table lookups do also take a lot of time?

Have you compiled this for speed on an optimizing compiler like SBCL and looked at the compiler notes?

After having resolved these two questions you could also try a nested hash table for each level of your list keys. It should not be hard to write a macro for arbitrarily nested hash tables.

skypher
+3  A: 

Don't do anything special, because the implementation does it for you.

Of course, this approach is implementation-specific, and hash table performance varies between implementations. (But then optimization questions are always implementation-specific.)

The following answer is for SBCL. I recommend checking whether your Lisp's hash tables perform the same optimization. Complain to your vendor if they don't!

What happens in SBCL is that the hash table caches the last table index accessed by GETHASH.

When PUTHASH (or equivalently, (SETF GETHASH)) is called, it first checks whether the key at that cached index is EQ to the key that you are passing in.

If so, the entire hash table lookup routine is by-passed, and PUTHASH stores directly at the cached index.

Note that EQ is just a pointer comparison and hence extremely fast -- it does not have to traverse the list at all.

So in your code example, the is no overhead at all.

David Lichteblau
Cool - thanks :)
kotlinski
A: 

Maybe I'm missing something obvious, but:

(defun try-add (i)
  (let ((old-i (gethash complex-list table)))
    (when (may-add old-i)
      (push i old-i))))

since:

  • nil is already the default for GETHASH
  • GETHASH pulls out the whole object so you can just modify it in-place rather than telling PUSH how to look it up again
  • (style point: use WHEN instead of IF when there's no else-clause)

Edit: oops, I was: I missed the case where old-i is nil. But if that's not the common case, then it still might be a win, since you only need to do the lookup in that case:

(defun try-add (i)
  (let ((old-i (gethash complex-list table)))
    (when (may-add old-i)
      (if old-i
         (push i old-i)
        (push i (gethash complex-list table))))))

Hmm, does that work?

Ken