string-search

How to find best fuzzy match for a string in a large string database

I have a database of strings (arbitrary length) which holds more than one million items (potentially more). I need to compare a user-provided string against the whole database and retrieve an identical string if it exists or otherwise return the closest fuzzy match(es) (60% similarity or better). The search time should ideally be under ...
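A minimal sketch of the exact-then-fuzzy lookup described above. It uses Python's standard-library `difflib` as a stand-in similarity measure, and that is an assumption: `SequenceMatcher.ratio()` is only one possible definition of "60% similarity". The exact hit comes from a hash set. The fuzzy fallback uses `difflib.get_close_matches` with `cutoff=0.6`, which still compares the query against every entry. For a million-plus strings it is therefore unlikely to meet a tight time budget without some index in front of it, for example an n-gram index. The class name `FuzzyLookup` is hypothetical.

```python
import difflib


class FuzzyLookup:
    """Hypothetical helper: exact match via a set, fuzzy fallback via difflib."""

    def __init__(self, strings):
        self._items = list(strings)
        # Built once; membership tests are O(1) on average.
        self._exact = set(self._items)

    def lookup(self, query, n=3, cutoff=0.6):
        # Exact hit: return it alone.
        if query in self._exact:
            return [query]
        # Linear scan: scores each item with SequenceMatcher.ratio() and
        # returns up to n items scoring >= cutoff, best first.
        return difflib.get_close_matches(query, self._items, n=n, cutoff=cutoff)
```

For example, `FuzzyLookup(["apple", "apply", "banana"]).lookup("appel")` returns the two close spellings and leaves out `"banana"`.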

Search for selection in vim

I use vim and vim plugins for Visual Studio when writing C++. Often, I find myself wanting to search for a string within a function, for example every call to object->public_member.memberfunc(). I know vim offers a convenient way to search for a single word by pressing * and #, and it can also search for typed strings using the ubiquito...

Computing the second (mis-match) table in the Boyer-Moore String Search Algorithm

For the Boyer-Moore algorithm to be worst-case linear, the computation of the mis-match table must be O(m). However, a naive implementation would loop through all suffixes (O(m)) and all positions where that suffix could occur, checking each for equality... which is O(m^3)! Below is the naive implementation of the table-building algorithm. So this qu...
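A standard way to build the strong good-suffix (mis-match) table in O(m) works from borders rather than comparing every suffix against every position. The sketch below is a Python transcription of that border-based technique, not the question author's code. `bpos[i]` holds the start of the widest border of the suffix `pat[i:]`. `shift[j]` is how far to slide the pattern after a mismatch at `pat[j-1]`, and `shift[0]` is used after a full match. The cost is amortized linear: `j` drops by one per outer iteration, and every inner step moves it strictly to the right. The bad-character rule is left out to keep the focus on the table. The names `good_suffix_table` and `bm_search` are hypothetical.

```python
def good_suffix_table(pat):
    """Strong good-suffix shifts for Boyer-Moore, built in O(m)."""
    m = len(pat)
    shift = [0] * (m + 1)
    bpos = [0] * (m + 1)  # bpos[i]: start of widest border of pat[i:]

    # Case 1: the matched suffix re-occurs earlier in the pattern,
    # preceded by a different character (the "strong" rule).
    i, j = m, m + 1
    bpos[i] = j
    while i > 0:
        while j <= m and pat[i - 1] != pat[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = bpos[j]
        i -= 1
        j -= 1
        bpos[i] = j

    # Case 2: only a prefix of the pattern matches a suffix of the
    # matched part, so fall back to the widest such border.
    j = bpos[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = bpos[j]
    return shift


def bm_search(text, pat):
    """Return all match positions using only the good-suffix rule."""
    if not pat:
        raise ValueError("pattern must be non-empty")
    m, n = len(pat), len(text)
    shift = good_suffix_table(pat)
    hits, s = [], 0
    while s <= n - m:
        j = m - 1
        # Compare right to left.
        while j >= 0 and pat[j] == text[s + j]:
            j -= 1
        if j < 0:
            hits.append(s)
            s += shift[0]
        else:
            s += shift[j + 1]
    return hits
```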

How to match the word exactly with regex?

Hi, I might be asking this question incorrectly, but what I would like to do is the following: given a large String, which could be many hundreds of lines long, match and replace a word exactly, and make sure it does not match or replace any part of any other String. For example: Strings to Find = Mac Apple Microsoft Matt Damon I.B.M. Hur...
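A minimal sketch in Python. The question does not name a language, and Java's `java.util.regex` accepts the same lookaround syntax, with `Pattern.quote` in the role of `re.escape`. Plain `\b` is avoided here because it fails next to entries that start or end with punctuation, such as "I.B.M.". The lookarounds `(?<!\w)` and `(?!\w)` instead require that no word character sits immediately before or after the match. Treating Python's `\w` (letters, digits, underscore) as the definition of "part of another String" is an assumption. The helper name `replace_exact` is hypothetical.

```python
import re


def replace_exact(text, words, replacement):
    """Replace whole occurrences of any string in `words`."""
    # Longest alternatives first, so "Matt Damon" wins over "Matt"
    # (regex alternation tries alternatives left to right).
    alternatives = sorted(words, key=len, reverse=True)
    # No word character may touch either end of the match; this also
    # works for entries such as "I.B.M." where a word boundary would not.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(map(re.escape, alternatives)) + r")(?!\w)"
    )
    # A function replacement stops backslashes or group references in
    # `replacement` from being interpreted.
    return pattern.sub(lambda _m: replacement, text)
```

For example, replacing `["Mac", "I.B.M."]` in `"Mac MacBook I.B.M. IBM"` leaves `MacBook` and `IBM` untouched.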

String searching algorithm for Chinese characters.

There is Python code available for existing algorithms for normal string searching, e.g. the Boyer-Moore algorithm. I am looking to use it on Chinese characters, and it doesn't seem like the same implementation would work. What would I need to do to make the algorithm work on Chinese characters? I am referring to this: http://...
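The linked code is elided, so the cause is a guess. If the implementation builds its bad-character table as a fixed 256-entry array indexed by byte or by `ord()`, a common fix is to key the table by character in a dictionary instead. The sketch below uses the Horspool simplification of Boyer-Moore (bad-character rule only). It assumes Python 3 `str`, which is indexed by Unicode code point, so each Chinese character is one element rather than several UTF-8 bytes. The name `horspool_search` is hypothetical.

```python
def horspool_search(text, pat):
    """Boyer-Moore-Horspool over Python 3 str (indexed by code point)."""
    if not pat:
        raise ValueError("pattern must be non-empty")
    m, n = len(pat), len(text)
    # Bad-character table as a dict, so any alphabet works, including CJK.
    # Later occurrences overwrite earlier ones, keeping the smallest shift.
    skip = {ch: m - 1 - i for i, ch in enumerate(pat[:-1])}
    hits, s = [], 0
    while s <= n - m:
        if text[s:s + m] == pat:
            hits.append(s)
        # Shift by the table entry for the text character under the
        # pattern's last position; unseen characters allow a full jump.
        s += skip.get(text[s + m - 1], m)
    return hits
```

For example, searching `"我爱北京天安门，北京欢迎你"` for `"北京"` finds it at code-point positions 2 and 8.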

"tailing" a binary file based on string location using bash?

I've got a bunch of binary files, each containing an embedded string near the end of the file but at a different position in each one (it occurs only once per file). I need to extract the part of the file from the location of that string until the end of the file and dump it into a new file. E.g., if the file's contents is "AWREDEDEDEXXXERESSDSDS...
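A minimal sketch of the same task in Python rather than bash, to match the other examples here. The helper names and the marker `b"XXX"` used below are hypothetical, since the actual embedded string is not given. Everything stays as `bytes`, so no decoding or newline translation touches the binary data. The sketch also reads each whole file into memory, which assumes the files are of moderate size.

```python
from pathlib import Path


def tail_from_marker(data, marker):
    """Return the bytes of `data` from the first occurrence of `marker` on."""
    pos = data.find(marker)
    if pos == -1:
        raise ValueError("marker not found")
    return data[pos:]


def extract_file(src, dst, marker):
    # Read the source in binary mode and write the tail to a new file.
    Path(dst).write_bytes(tail_from_marker(Path(src).read_bytes(), marker))
```

Usage would look like `extract_file("input.bin", "output.bin", b"XXX")`, with the real marker substituted.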

String searching algorithms

For the two string searching algorithms, KMP and suffix tree, which one is preferred in which cases? Give some practical examples. ...
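The main difference is what each one preprocesses:

- **KMP** preprocesses the pattern in O(m) and then scans the text once in O(n). That suits one-off searches and streaming input.
- **A suffix tree** preprocesses the text instead. It can be built in linear time with algorithms such as Ukkonen's, but it uses a lot of memory. After that, each query costs O(m) plus the number of matches, which pays off when many different patterns are searched in the same fixed text, such as a genome.

A minimal KMP sketch for reference (function names are hypothetical):

```python
def prefix_function(p):
    """pi[i] = length of the longest proper border of p[:i + 1]."""
    pi = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = pi[k - 1]
        if p[i] == p[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(text, p):
    """Return all start positions of p in text, in O(n + m)."""
    if not p:
        raise ValueError("pattern must be non-empty")
    pi = prefix_function(p)
    hits, k = [], 0
    for i, c in enumerate(text):
        # On mismatch, fall back along borders instead of rescanning text.
        while k > 0 and c != p[k]:
            k = pi[k - 1]
        if c == p[k]:
            k += 1
        if k == len(p):
            hits.append(i - len(p) + 1)
            k = pi[k - 1]  # keep going to find overlapping matches
    return hits
```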

String Occurrence Counting Algorithm

Hello, I am curious what the most efficient (or most commonly used) algorithm is for counting the number of occurrences of a string in a chunk of text. From what I read, the Boyer–Moore string search algorithm is the standard for string search, but I am not sure whether counting occurrences efficiently is the same as searching for a string. In Python...
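Counting is essentially repeated searching: each match restarts the search just past it (non-overlapping) or one position after its start (overlapping). In Python, `str.count` already counts non-overlapping occurrences with CPython's built-in C search routine, which is usually much faster than a hand-written Boyer-Moore in pure Python. The main decision left is whether overlapping matches should count. The sketch below shows both behaviours; `count_occurrences` is a hypothetical helper name.

```python
def count_occurrences(text, sub, overlapping=False):
    """Count occurrences of `sub` in `text`."""
    if not sub:
        raise ValueError("substring must be non-empty")
    if not overlapping:
        # Built-in, non-overlapping: "aaaa".count("aa") counts 2.
        return text.count(sub)
    # Overlapping: restart the search one position after each match.
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        start = text.find(sub, start + 1)
    return count
```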

Optimizing a lot of Scanner.findWithinHorizon(pattern, 0) calls

I'm building a process which extracts data from 6 csv-style files and two poorly laid out .txt reports and builds output CSVs, and I'm fully aware that there's going to be some overhead searching through all that whitespace thousands of times, but I never anticipated converting about 50,000 records would take 12 hours. Excerpt of my ma...
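The excerpt is cut off before the code, so the cause here is only a guess. With a horizon of 0, `findWithinHorizon` searches the remaining input without any bound. If the process re-scans a whole file for each record, the total work grows roughly as records × file size. One alternative is to parse each input once into a dictionary keyed on the join field and then do average constant-time lookups per record. It is sketched in Python for consistency with the other examples. The column names and the helper `build_index` are hypothetical.

```python
import csv
import io


def build_index(csv_text, key_column):
    """Parse CSV text once and map each key value to its rows."""
    index = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        index.setdefault(row[key_column], []).append(row)
    return index
```

With a real file, passing `open(path, newline="")` to `csv.DictReader` in place of the `StringIO` wrapper avoids holding the raw text in memory twice.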

Partial string search in boost::multi_index_container

I have a struct to store info about persons and a multi_index_container to store such objects. The multi-index is used to search by different criteria. I've added several persons to the container and want to find a person by last name. It works great if I use the whole last name, but it returns nothing if I try to find a person by part of a last name (firs...
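One approach to prefix search assumes the last-name index is ordered rather than hashed. It seeks to the first key that is not less than the prefix, then walks forward while keys still start with it. With Boost.MultiIndex that corresponds to calling `lower_bound` on the ordered index. The same idea is sketched here in Python with `bisect` on a sorted list. `find_by_prefix` is a hypothetical helper. This only covers prefixes; matching an arbitrary substring of a last name would need a full scan or a different kind of index.

```python
import bisect


def find_by_prefix(sorted_names, prefix):
    """Return all names starting with `prefix` from an ascending sorted list."""
    # Jump to the first name >= prefix in O(log n)...
    i = bisect.bisect_left(sorted_names, prefix)
    matches = []
    # ...then walk forward while names still share the prefix.
    while i < len(sorted_names) and sorted_names[i].startswith(prefix):
        matches.append(sorted_names[i])
        i += 1
    return matches
```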

Way to implement search functionality on a Window

I am working on a (WPF + C#) application. I have to implement search functionality that finds all occurrences of a particular string in a specific part of the Window. What would be the best way to do this? ...

String searching algorithms in Java

Hi all! I am doing string matching with a large amount of data. EDIT: I am matching words contained in a big list against some ontology text files. I take each file from the ontology and search for a match between the third String of each file line and any word from the list. I made a mistake in overlooking the fact that what I need to do is no...
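For the exact-match case as first described (the excerpt is cut off before the corrected requirement), loading the word list into a hash set helps. Each per-line check becomes a single average O(1) lookup instead of a loop over the whole list. The sketch assumes whitespace-separated lines, with the third field at 0-based index 2. Both assumptions, and the helper name `matching_lines`, are hypothetical.

```python
def matching_lines(lines, words, field_index=2, sep=None):
    """Yield each line whose field at `field_index` is one of `words`."""
    vocabulary = set(words)  # built once; O(1) average membership
    for line in lines:
        fields = line.split(sep)  # sep=None splits on runs of whitespace
        if len(fields) > field_index and fields[field_index] in vocabulary:
            yield line
```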

php - Is strpos the fastest way to search for a string in a large body of text?

if (strpos(htmlentities($storage->getMessage($i)),'chocolate')) Hi, I'm using Gmail OAuth access to find specific text strings in email addresses. Is there a way to find text instances more quickly and efficiently than using strpos in the above code? Should I be using a hash technique? ...
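Separately from speed, the condition as written has a classic pitfall. `strpos` returns `0` when the match is at the very start, and `0` is falsy, so those matches are skipped. The PHP manual advises testing its return value with `===`/`!==` against `false`. On speed: for a single fixed word, a plain substring search is already one linear pass, so hashing is unlikely to help. A combined matcher, such as one compiled regex or an Aho-Corasick automaton, tends to pay off only when many keywords are checked per message. Python's `str.find` has the same shape (0 for a match at the start, -1 on failure), so the sketch below illustrates both points in Python. The helper names are hypothetical.

```python
import re


def contains(text, word):
    # `in` yields a real bool. Writing `if text.find(word):` would be wrong:
    # find() returns 0 (falsy) for a match at index 0 and -1 (truthy)
    # when there is no match at all.
    return word in text


def make_keyword_matcher(keywords):
    """Compile several keywords once; each check is a single pass."""
    if not keywords:
        raise ValueError("need at least one keyword")
    pattern = re.compile("|".join(map(re.escape, keywords)))
    return lambda text: pattern.search(text) is not None
```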