ansaurus

Question

Search for strings matching the pattern "abc:*:xyz" in less than O(n)

Answer 1

+2 A:

Wouldn't an intersection of the results from the other two queries give you exactly that? And since each of the results is O(log N) and an intersect over that result set is O(N) in the result-set size, wouldn't the total also be an O(log N) over the original problem?

jerryjvl 2009-06-05 08:07:30

Intersect being O(N) is based on the assumption that the two result sets you retrieve from the original trees are also sorted ;)

jerryjvl 2009-06-05 08:08:42

This is still O(n) in the worst case.

laalto 2009-06-05 08:10:13

No, that's still O(N) since the result set size is still bounded bei N.

balpha 2009-06-05 08:10:46

It may be more correct to say that the combined operation is of the same order as the 'prefix' and 'suffix' operations are. I assumed from the statement of O(log N) in his question that the typical result-set only has on the order of log N elements in it, otherwise he cannot do the prefix or suffix operations in that amount of time either.

jerryjvl 2009-06-05 08:42:34

Answer 2

A:

If you take the possibility of storing the strings in a search tree into account, why not also store the properties "starts with abc" and "ends with xyz", using these properties as keys?

Edit: You also might want to leave Big-O-Notation behind and rather concentrate on the actual expected search duration in your particular use case. That's probably a more realistic approach; O(f(n)) style grades for your algorithm / implementation probably don't give you much usable information when it comes to your real-life search efficiency.

balpha 2009-06-05 08:16:17

Yes you're right. I shouldn't have mentioned big-O. I was concerned with only the average-case efficiency not the worst-case.

Harish 2009-06-05 09:28:28

Answer 3

A:

If "abc" and "xyz" are fixed values, you can maintain three counters with your collection indicating the number of strings:

starting with "abc" but not ending with "xyz".
not starting with "abc" but ending with "xyz".
starting with "abc" and ending with "xyz".

That gives you an O(1) time complexity for searching at the cost of extra calculation when inserting into, or deleting from, the collection.

If the "abc" and "xyz" are arbitrary strings, it's O(n) for all operations, including the "abc..." one. You only have to consider what happens when your collections consists of items that all start with "abc" to see this. That's not bounded by O(logN) at all since you have to process all items in the tree (both branches of every non-leaf node).

I think your ideal solution is to maintain the two ordered trees, one for the normal strings and one for the reversed strings. But don't worry about trying to do an intersection between the two. All you need to do is minimize the search space as much as practicable.

To find "abc...", use the normal tree to find the strings starting with that value.
To find "...xyz", use the reverse tree to find the strings ending with the reverse of that that value (zyx...).
To find "abc...xyz", use the normal tree to find the strings starting with that value and then filter out those that don't end in "xyz".

That way you don't have to worry about intersecting values between the two trees and you still get a performance improvement over the simplistic linear search.

paxdiablo 2009-06-05 08:31:55

"abc" and "xyz" are arbitrary of course. Ok you're right about it not being O(log N). I should have stayed away from mentioning anything in Big-O. But in most cases you can definitely prune away certain branches of the tree.

Harish 2009-06-05 08:49:38

Answer 4

+1 A:

Generate rotations of each word and put each rotation in a suffix tree with indication of "rotation index'.

For example, to put the string "hello", put

hello, 0
elloh, 1
llohe, 2
lohel, 3
ohell, 4

Also, you put "hero" as

hero, 0
eroh, 1
rohe, 2
oher, 3

Also, you put "ohe" (don't ask me what's ohe)

ohe, 0
heo, 1
eoh, 2

Then, if you need to search for pattern "he*o", you need to rotate it until you get a prefixed string: "ohe*"

In the suffix tree you find the candidates: (ohell, 4), (oher, 3), (ohe, 0). Then you restore their original versions (by un-rotating them) and pick the right ones - "hello" and "hero".

Igor Krivokon 2009-06-05 09:07:23

Harish 2009-06-05 09:26:22

ansaurus

tags:

views:

answers:

Search for strings matching the pattern "abc:*:xyz" in less than O(n)

related questions