I have written some Ruby code to import the Google n-gram data into a hash table, mapping word unigrams to their respective counts. I'm using symbols rather than strings for the keys. I've been running this code on a Linux box for a while now with no problems. Running it on my Mac this morning yielded a symbol table overflow runtime error after loading about 2 million key-value pairs. I don't understand what is causing this error. Does anyone have suggestions on what might be the cause? I'm running Ruby 1.9.1 under OS X 10.5.8.
+1
A:
Is the difference 64-bit vs. 32-bit Ruby? I suspect this because of your observation:
yielded a symbol table overflow runtime error after loading about 2 million key-value pairs
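One rough way to check which kind of build you are running is to look at the pointer width of the interpreter itself. This is only a sketch: the helper name `pointer_bits` is made up for illustration, and the host CPU string is printed as-is because its exact value depends on how Ruby was built.

```ruby
require "rbconfig"

# Hypothetical helper: pointer width of the running interpreter, in bits.
# Packing a string with "p" yields a raw pointer, so its byte size is sizeof(char*).
def pointer_bits
  [""].pack("p").size * 8
end

puts "pointer width: #{pointer_bits} bits"
puts "host CPU:      #{RbConfig::CONFIG["host_cpu"]}"
```

Running this on both the Linux box and the Mac should show whether the two interpreters differ.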
If this is the case, there is nothing you can do about it except use a native 64-bit build of Ruby, assuming strings are not an option due to application design. Otherwise you'll have to go with strings. Conversion is easy:
:symbol.to_s == "symbol"
"symbol".to_sym == :symbol
hurikhan77
2010-02-10 19:23:03
or use strings!
Peter
2010-02-10 19:23:47
Think you hit the problem dead on! Thanks!
Chris
2010-02-10 20:10:11
+2
A:
While using Symbol for keys instead of String is generally more efficient, the amount of efficiency gained is proportionate to the level of duplication involved. Since your keys are by definition unique, you should probably just use String keys to avoid jamming the Symbol table full of entries.
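A minimal sketch of what the String-keyed version could look like. The method name `load_unigram_counts` and the `word<TAB>count` line layout are assumptions made for illustration; the real n-gram files may carry more columns, so adjust the split accordingly.

```ruby
require "stringio"

# Hypothetical helper: builds a word => count Hash with String keys.
# Assumes each input line looks like "word<TAB>count".
def load_unigram_counts(io)
  counts = Hash.new(0)
  io.each_line do |line|
    word, count = line.chomp.split("\t")
    next if word.nil? || count.nil?   # skip blank or malformed lines
    counts[word] += count.to_i        # String key, so nothing is added to the symbol table
  end
  counts
end

# Tiny in-memory sample standing in for a real data file.
sample = StringIO.new("the\t100\nof\t50\nthe\t25\n")
counts = load_unigram_counts(sample)
puts counts.inspect
```

With a real file you would pass `File.open(path)` instead of the `StringIO` sample.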
tadman
2010-02-10 19:23:18
I'm assuming there is some savings on the lookup but as far as how much, it's not clear. So strings may very well be sufficient.
Chris
2010-02-10 20:11:54
On lookup there would only be a savings if the key you're trying to resolve has been encoded previously, and even then it's easy to argue that hashing the string into a symbol and then hashing the symbol itself is less efficient than simply hashing the string. The entire symbol space functions as a hash, after all.
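A small sketch of that behaviour, using `Symbol.all_symbols` to peek at the symbol table. The helper `symbol_table_growth` and the key names are made up for illustration, and on Ruby 2.2 and later dynamically created symbols can be garbage-collected, so treat the printed numbers as indicative only.

```ruby
# Hypothetical helper: counts how many entries the symbol table gains while the block runs.
def symbol_table_growth
  before = Symbol.all_symbols.size
  kept = yield                                  # keep the result referenced until we have counted
  growth = Symbol.all_symbols.size - before
  [growth, kept]
end

# Interning the same text twice resolves to one and the same Symbol object.
same = "unigram_demo_key".to_sym.equal?("unigram_demo_key".to_sym)

# Every distinct key converted with to_sym adds an entry to the symbol table...
sym_growth, _ = symbol_table_growth { (0...1000).map { |i| "unigram_demo_#{i}".to_sym } }
# ...whereas building the same kind of keys as Strings should not.
str_growth, _ = symbol_table_growth { (0...1000).map { |i| "unigram_demo_str_#{i}" } }

puts "same object: #{same}"
puts "symbols added by Symbol keys: #{sym_growth}"
puts "symbols added by String keys: #{str_growth}"
```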
tadman
2010-02-10 23:00:03