From the set of input strings, extract the N-grams (every substring of length N), then create a multimap where the key is an N-gram and the values are the strings containing that N-gram.
E.g. suppose you have the string "Hello World", and you use the value 3 for N, then your N-grams are:
- "Hel"
- "ell"
- "llo"
- "lo " (with a trailing space)
- "o W"
- " Wo" (with a leading space)
- "Wor"
- "orl"
- "rld"
Depending on the size and characteristics of your input strings, determine a suitable value for N.
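As a rough illustration of the index described above, here is a minimal Python sketch. The function names `ngrams` and `build_index` and the second sample string are made up for this example, and a dictionary of sets stands in for the multimap; treat it as one possible layout rather than the only way to do it.

```python
from collections import defaultdict

def ngrams(text, n):
    """Return every substring of length n, in order (spaces count as characters)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def build_index(strings, n):
    """Map each N-gram to the set of input strings that contain it."""
    index = defaultdict(set)  # plays the role of the multimap
    for s in strings:
        for gram in ngrams(s, n):
            index[gram].add(s)
    return index

# "Help wanted" is a hypothetical second input string.
index = build_index(["Hello World", "Help wanted"], 3)
print(ngrams("Hello World", 3))
print(sorted(index["Hel"]))
```

Using a set per key means a string that contains the same N-gram twice is still stored only once under that key.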
Now if you want to look for strings containing Hello, split your search string into N-grams as well, and look up each N-gram in the multimap described earlier.
Depending on what you find in the multimap, you could:
- take the intersection of all the results (mainly if all N-grams return a rather large result set)
- abort the search if an N-gram is not found in the multimap (string is not found)
- stop the search if there is only one result
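The three options above can be combined in a single lookup loop. The sketch below assumes the index is a plain mapping from N-gram to a set of strings; the name `candidates` and the tiny hand-written index are hypothetical and only serve the example.

```python
def ngrams(text, n):
    """Return every substring of length n, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def candidates(index, query, n):
    """Return strings containing every N-gram of query (may include false positives)."""
    result = None
    for gram in ngrams(query, n):
        hits = index.get(gram)
        if not hits:
            return set()  # N-gram not in the multimap: abort, nothing can match
        # Intersect with what we have so far.
        result = set(hits) if result is None else result & hits
        if len(result) <= 1:
            break  # at most one candidate left: stop looking up further N-grams
    # A query shorter than n has no N-grams; this sketch simply returns nothing for it.
    return result if result is not None else set()

# Hand-written index for illustration only.
index = {
    "Hel": {"Hello World", "Help wanted"},
    "ell": {"Hello World"},
    "llo": {"Hello World"},
}
print(candidates(index, "Hello", 3))
print(candidates(index, "Helium", 3))
```

Whatever this returns is only a candidate set; each candidate still has to be checked against the actual search string.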
Finally, check each string that was found to see whether it really contains the search string. After all, the N-grams are just a trick to narrow down the candidates; there is no guarantee that a candidate contains the string you are looking for.
E.g. the string "Helllllo" also contains all the 3-grams of Hello, but does not contain Hello itself.
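The final check is an ordinary substring test over the candidates. A minimal Python sketch, with the hypothetical helper name `verify`:

```python
def verify(candidates, query):
    """Keep only the candidates that really contain query as a substring."""
    return [s for s in candidates if query in s]

# "Helllllo" contains every 3-gram of "Hello" ...
print(all(gram in "Helllllo" for gram in ("Hel", "ell", "llo")))  # True
# ... but the substring test rejects it.
print(verify(["Hello World", "Helllllo"], "Hello"))  # ['Hello World']
```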