views:

636

answers:

3

I'm looking for a data structure to store strings in. I require a function in the interface that takes a string as its only parameter and returns a reference/iterator/pointer/handle that can be used to retrieve the string for the rest of the lifetime of the data structure. Set membership, entry deletion etc. is not required.

I'm more concerned with memory usage than speed.

A: 

I think the best bet here would be an ArrayList. The common implementations have some overhead from allocating extra space in the array for new elements, but if memory is such a requirement, you can allocate manually for each new element. It will be slower, but will only use the necessary memory for the string.

Gonzalo Quero
+1  A: 

I think the keyword here is string interning, where you store only one copy of each distinct string. In Java, this is accomplished by String.intern():

String ref1 = "hello world".intern();
String ref2 = "HELLO WORLD".toLowerCase().intern();
assert ref1 == ref2;
Zach Scrivena
+3  A: 
Avi
Depending on the language used and the distribution of the input, this may be inefficient: there will be a lot of pointers for single characters. Suffix tries are O(n) size but BIG. In java, each node will have a mutex and cond: REALLY BIG. Make sure to measure actual size before deciding.
Jonas Kölker
@Jonas: Absolutely good advice. Note that we weren't discussing Suffix tries in particular, rather the trie as a general string storage/lookup data structure. A variant of the trie which deals somewhat with memory usage is the Patricia Trie - which avoids single-character nodes as much as possible.
Avi