Hi. Say I have a Hashtable<String, Object> with such keys and values:

apple => 1
orange => 2
mossberg => 3

I can use the standard get method to get 1 by "apple", but what I want is to get the same value (or a list of values) by part of the key, for example "ppl". Of course this may yield several results, in which case I want to be able to process each key-value pair. So basically it is similar to the SQL condition LIKE '%ppl%', but I don't want to use an (in-memory) database, because I don't want to add unnecessary complexity. What would you recommend?

Update: Storing the data in a Hashtable isn't a requirement. I'm looking for a general approach to solve this.

+3  A: 

Not without iterating through explicitly. Hashtable is designed to go (exact) key->value in O(1), nothing more, nothing less. If you will be doing query operations with large amounts of data, I recommend you do consider a database. You can use an embedded system like SQLite (see SQLiteJDBC) so no separate process or installation is required. You then have the option of database indexes.

I know of no standard Java collection that can do this type of operation efficiently.

Matthew Flaschen
A: 

It can't be done in a single operation.

You may want to iterate over the keys and use the ones that contain your desired string.
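A minimal sketch of that iteration, assuming plain substring matching with String.contains. The class name SubstringLookup and the method name findByKeyPart are made up for illustration; they are not from any library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SubstringLookup {
    // Hypothetical helper: returns every entry whose key contains the given
    // fragment, in the iteration order of the source map. Cost is one pass
    // over all keys per lookup.
    static <V> Map<String, V> findByKeyPart(Map<String, V> source, String part) {
        Map<String, V> matches = new LinkedHashMap<>();
        for (Map.Entry<String, V> entry : source.entrySet()) {
            if (entry.getKey().contains(part)) {
                matches.put(entry.getKey(), entry.getValue());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, Object> fruits = new LinkedHashMap<>();
        fruits.put("apple", 1);
        fruits.put("orange", 2);
        fruits.put("mossberg", 3);
        System.out.println(findByKeyPart(fruits, "ppl")); // prints {apple=1}
    }
}
```

Returning a map rather than a list of values keeps each matching key next to its value, so every key-value pair can be processed afterwards.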

arclight
+5  A: 

The obvious brute-force approach would be to iterate through the keys in the map and match them against the char sequence. That could be fine for a small map, but of course it does not scale.

This could be improved by using a second map to cache search results. Whenever you collect a list of keys matching a given char sequence, you can store these in the second map so that next time the lookup is fast. Of course, if the original map is changed often, it may get complicated to update the cache. As always with caches, it works best if the map is read much more often than changed.

Alternatively, if you know the possible char sequences in advance, you could pre-generate the lists of matching strings and pre-fill your cache map.
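A rough sketch of the caching idea, under the simplest possible invalidation policy: every write clears the whole cache. The class CachedKeySearch and its method names are hypothetical, and a real implementation might prefer finer-grained invalidation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CachedKeySearch {
    private final Map<String, Object> data = new HashMap<>();
    // Cache: search fragment -> keys that contained it at the last scan.
    private final Map<String, List<String>> matchingKeys = new HashMap<>();

    void put(String key, Object value) {
        data.put(key, value);
        matchingKeys.clear(); // assumption: any write simply drops the whole cache
    }

    Object get(String key) {
        return data.get(key);
    }

    // The first lookup for a fragment scans all keys; repeated lookups for the
    // same fragment are answered from the cache until the next write.
    List<String> keysContaining(String part) {
        List<String> cached = matchingKeys.get(part);
        if (cached == null) {
            List<String> found = new ArrayList<>();
            for (String key : data.keySet()) {
                if (key.contains(part)) {
                    found.add(key);
                }
            }
            cached = Collections.unmodifiableList(found);
            matchingKeys.put(part, cached);
        }
        return cached;
    }
}
```

This only pays off when the same fragments are searched repeatedly between writes; with mostly unique fragments every lookup is still a full scan.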

Update: Hashtable is not recommended anyway: every operation on it is synchronized, so it is slower than necessary when no thread safety is needed. You are better off using HashMap if no concurrency is involved, or ConcurrentHashMap otherwise. The latter generally outperforms Hashtable under concurrent access.

Apart from that, off the top of my head I can't think of a better collection for this task than maps. Of course, you may experiment with different map implementations to find the one that best suits your specific circumstances and usage patterns. In general, it would thus be:

Map<String, Object> fruits;               // the original data
Map<String, List<String>> matchingKeys;   // cache: search string -> matching keys
Péter Török
Please check the question update
htf
@htf, checked and answered :-)
Péter Török
Would the downvoter care to explain why?
Péter Török
Sure - I took issue with the design approach of using caching to speed up an implementation based on an inappropriate data structure, when even the speedup is not guaranteed if the data is accessed randomly. It's always better to use a data structure designed to solve the problem than to hammer a square peg into a round hole and then paint over it. I canceled my downvote because I don't have a better solution to offer other than some research directions (I'd look at tries and suffix trees, and I proposed a solution using a trie, but without pointers to a fully applicable implementation).
Ori Pessach
Uhm, what's a trie?
seanizer
It's a data structure: http://en.wikipedia.org/wiki/Trie
Ori Pessach
@Ori, although I don't fully agree with your reason to downvote, you played a fair game :-) Thanks for the explanation, it's always interesting to learn something new.
Péter Török
A: 

The only solution I can see (I'm not a Java expert) is to iterate over the keys and check each one against a regular expression. If a key matches, you put the matched key-value pair in the hashtable that will be returned.
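A small sketch of that regex variant, using java.util.regex. The names RegexKeyFilter and filterByKeyPattern are invented for illustration, and a HashMap is used for the result instead of a Hashtable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class RegexKeyFilter {
    // Hypothetical helper: copies every entry whose key matches the regex
    // somewhere in the key into a new map.
    static <V> Map<String, V> filterByKeyPattern(Map<String, V> source, String regex) {
        Pattern pattern = Pattern.compile(regex); // compile once, reuse for every key
        Map<String, V> result = new HashMap<>();
        for (Map.Entry<String, V> entry : source.entrySet()) {
            // find() succeeds on a match anywhere in the key, like LIKE '%...%';
            // matches() would require the whole key to match.
            if (pattern.matcher(entry.getKey()).find()) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```

For a literal fragment such as "ppl", Pattern.quote(fragment) can be used to build the regex so that special characters are not interpreted.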

ShinTakezou
+1  A: 

Sounds like you need a trie with references to your data. A trie stores strings and lets you search for strings by prefix. I don't know the Java standard library too well and I have no idea whether it provides an implementation, but one is available here:

http://www.cs.duke.edu/~ola/courses/cps108/fall96/joggle/trie/Trie.java

Unfortunately, a trie only lets you search by prefixes. You can work around this by storing every possible suffix of each of your keys:

For 'apple', you'd store the strings

'apple' 'pple' 'ple' 'le' 'e'

Which would allow you to search for every prefix of every suffix of your keys.

Admittedly, this is the kind of "solution" that would prompt me to continue looking for other options.
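To make the suffix workaround concrete, here is a minimal sketch of a trie that stores every suffix of every key. It is not the Duke Trie.java linked above; the class SuffixTrieIndex and its methods are hypothetical, and memory use grows roughly with the square of the key length.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class SuffixTrieIndex {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        // Keys that have some suffix whose path runs through this node.
        final Set<String> keys = new LinkedHashSet<>();
    }

    private final Node root = new Node();

    // Inserts every suffix of the key ('apple', 'pple', 'ple', 'le', 'e') and
    // records the key on each node along each suffix path.
    void add(String key) {
        root.keys.add(key); // the empty fragment is contained in every key
        for (int start = 0; start < key.length(); start++) {
            Node node = root;
            for (int i = start; i < key.length(); i++) {
                node = node.children.computeIfAbsent(key.charAt(i), c -> new Node());
                node.keys.add(key);
            }
        }
    }

    // Walks the fragment from the root; the node reached holds exactly the
    // keys that contain the fragment as a substring.
    Set<String> keysContaining(String part) {
        Node node = root;
        for (int i = 0; i < part.length(); i++) {
            node = node.children.get(part.charAt(i));
            if (node == null) {
                return Collections.emptySet();
            }
        }
        return Collections.unmodifiableSet(node.keys);
    }
}
```

The returned keys can then be looked up in an ordinary map to get the values. A lookup costs time proportional to the fragment length, independent of the number of keys.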

Ori Pessach
+1  A: 

First of all, use HashMap, not Hashtable.

Then you can filter the map with a predicate, using the utilities in Google Guava:

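// Needs Guava: com.google.common.collect.Maps and com.google.common.base.Predicate.
// `map` is the Map<String, Object> field holding the data; `part` is an assumed parameter.
public Collection<Object> getValues(final String part) {
    Map<String, Object> filtered = Maps.filterKeys(map, new Predicate<String>() {
        @Override
        public boolean apply(String key) {
            return key.contains(part); // keep only keys containing the fragment
        }
    });
    return filtered.values();
}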
seanizer
A: 

If you can somehow reduce the problem to searching by prefix, you might find a NavigableMap helpful.
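A short sketch of a prefix search on a NavigableMap such as TreeMap. It assumes the map uses natural String ordering and that no key contains the character U+FFFF; the class and method names are made up for illustration.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

final class PrefixLookup {
    // Returns a live view of all entries whose key starts with the prefix.
    // Assumption: natural String ordering and no key containing '\uffff',
    // so prefix + '\uffff' works as an upper bound for the range.
    static <V> NavigableMap<String, V> withPrefix(NavigableMap<String, V> map, String prefix) {
        if (prefix.isEmpty()) {
            return map; // every key starts with the empty prefix
        }
        return map.subMap(prefix, true, prefix + Character.MAX_VALUE, true);
    }

    public static void main(String[] args) {
        NavigableMap<String, Object> fruits = new TreeMap<>();
        fruits.put("apple", 1);
        fruits.put("apricot", 4);
        fruits.put("orange", 2);
        System.out.println(withPrefix(fruits, "ap")); // prints {apple=1, apricot=4}
    }
}
```

Because subMap returns a view, locating the range takes logarithmic time and no entries are copied.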

Justin K
I guess with any of the possible solutions, searching by prefix would drastically reduce the number of operations that need to be performed.
seanizer
A: 

You may find it interesting to look through this question: http://stackoverflow.com/questions/327513/fuzzy-string-search-in-java

Also take a look at Lucene (answer number two).

dart