views:

490

answers:

2

I have a dictionary of 'n' words given and there are 'm' Queries to respond to. I want to output the number of words in dictionary which are edit distance 1 or 2. I want to optimize the result set given that n and m are roughly 3000.

Edit added from answer below:

I will try to word it differently.

Initially there are 'n' words given as a set of Dictionary words. Next 'm' words are given which are query words and for each query word, I need to find if the word already exists in Dictionary (Edit Distance '0') or the total count of words in dictionary which are at edit distance 1 or 2 from the dictionary words.

I hope the Question is now Clear.

Well, it times out if the Time Complexity is (m*n)n.The naive use of DP Edit Distance Algorithm times out. Even Calculating the Diagonal Elements of 2k+1 times out where k is the threshold here k=3 in above case.

+3  A: 
Simon Svensson
Another option would be to precalculate the set of words within edit distance 2 or each word in the dictionary, and store those.
Nick Johnson
@Nick Johnson, precalculating data presumes that you have a fixed search space and a fixed input. Any change and you wont be able to use precalculated values.
Simon Svensson
@Simon Svensson: I would say a fixed search space, what does input have to do with Nick's remark ?
Matthieu M.
A: 

I will try to word it differently.

Initially there are 'n' words given as a set of Dictionary words. Next 'm' words are given which are query words and for each query word, I need to find if the word already exists in Dictionary (Edit Distance '0') or the total count of words in dictionary which are at edit distance 1 or 2 from the dictionary words.

I hope the Question is now Clear.

Well, it times out if the Time Complexity is (m*n)n.The naive use of DP Edit Distance Algorithm times out. Even Calculating the Diagonal Elements of 2k+1 times out where k is the threshold here k=3 in above case.

PS: BK Tree should suffice the purpose.Any Links about Implementation in C++.

chini
I moved your clarification into your original question.
Simon Svensson