string-matching

C#: How to Delete the matching substring between 2 strings?

If I have two strings, say string1="Hello Dear c'Lint" and string2="Dear", I want to compare the strings first and delete the matching substring. The result for the above pair is "Hello  c'Lint" (i.e., two spaces between "Hello" and "c'Lint"). For simplicity, we'll assume that string2 will be the sub-set o...
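
The question is about C#, but the idea can be sketched in a few lines of Python. This is only an illustration, assuming that just the first occurrence should be removed; the function name `remove_first_match` is hypothetical.

```python
def remove_first_match(text: str, part: str) -> str:
    """Return text with the first occurrence of part removed."""
    # str.replace with a count of 1 removes only the first occurrence;
    # if part does not occur, text is returned unchanged.
    return text.replace(part, "", 1)


print(remove_first_match("Hello Dear c'Lint", "Dear"))  # Hello  c'Lint
```

The surrounding spaces are left alone, which is why two spaces remain between the words.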

Splitting a String into Tokens and Storing the Delimiters in Perl

I have a string like this: a b c d I process my string like this: chomp $line; my @tokens = split /\s+/, $line; my @new_tokens; foreach my $token (@tokens) { push @new_tokens, some_complex_function( $token ); } my $new_str = join ' ', @tokens; I'd like to re-join the string with the origi...
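
The question is about Perl, but the underlying trick (splitting on a capturing group so the separators are kept) can be sketched in Python. This is a rough illustration, assuming the goal is to transform each token while preserving the original whitespace; `transform_tokens` and its `func` argument are hypothetical stand-ins for the asker's processing step.

```python
import re


def transform_tokens(line: str, func) -> str:
    """Apply func to every non-whitespace token, keeping the original separators."""
    # A capturing group in the pattern makes re.split return the separators too,
    # so tokens and whitespace runs alternate in the result.
    parts = re.split(r"(\s+)", line)
    # Whitespace runs and empty edge strings pass through untouched.
    return "".join(p if p == "" or p.isspace() else func(p) for p in parts)


print(transform_tokens("a  b c   d", str.upper))
```

Because the separators are carried along, no separate bookkeeping of delimiters is needed when re-joining.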

How to search an array key by matching a string in its value

Hi, I'm trying to find the key number in an array matching a string. I tried array_search in this way: $key = array_search("foo", $array); echo $array[$key]; but that prints $array[0]. Is there another way to do this? Thanks :) ...

Performance wise String Matching

I've a generic DB query function that runs the following checks every time an SQL query is issued: if (preg_match('~^(?:UPDATE|DELETE)~i', $query) === 1) if (preg_match('~^(?:UPDATE|DELETE)~iS', $query) === 1) if ((stripos($query, 'UPDATE') === 0) || (stripos($query, 'DELETE') === 0)) I know that a simple strpos() call is way faster ...

String matching in Python

Does anyone know which string matching algorithm is implemented in Python? ...

Javascript equivalent to C strncmp (compare string for length)

Is there an equivalent in JavaScript to the C function strncmp? strncmp takes two string arguments and an integer length argument. It compares the two strings up to that length and determines whether they are equal that far. Does JavaScript have an equivalent built-in function? ...

More string matching features

Is it possible to create a regex that matches all strings with five a's and five b's? Like aaaaabbbbb or ababababab or aabbaabbab. I imagine it would require polynomial time for a deterministic engine. Are there other matching languages which would enable such matching? Update: I wanted to use the kind of expression for searching, s...
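
For comparison, the condition itself is easy to state outside a regex. The following Python sketch assumes the whole string must consist of exactly five a's and five b's and nothing else; the function name is hypothetical, and this does not address the search use case mentioned in the update.

```python
def has_five_a_and_five_b(s: str) -> bool:
    """True if s is exactly ten characters long with five 'a' and five 'b'."""
    # The length check rules out any characters other than 'a' and 'b'.
    return len(s) == 10 and s.count("a") == 5 and s.count("b") == 5


for candidate in ("aaaaabbbbb", "ababababab", "aabbaabbab", "aaaabbbbbb"):
    print(candidate, has_five_a_and_five_b(candidate))
```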

Help constructing regex

Hi, I need to know if a string matches a number of different criteria. I'm trying to solve this by using a regular expression and then seeing if it matches (in Java: str.matches(myRegex);), but I can't get it right. The criteria are as follows: The string to match is constructed of 4 letters, [A-Z] It may be preceded (but not necessa...

string matching algorithms used by lucene

I want to know the string matching algorithms used by Apache Lucene. I have been going through the index file format used by Lucene given here. It seems that Lucene stores all words occurring in the text as is, with their frequency of occurrence in each document. But as far as I know, for efficient string matching it would need to pre...

How to hierarchically (levelize) arrange list of file names with matching pre-fixes (LCS) defining the hierarchy - preferably using shell tools

Source code dirs have meaningful file names, for example AAAbbbCCddEE.h/.cxx, where AAA, bb CC could refer to abbreviations of sub-systems or just a functionality description like "...Print..." or "...Check...". As the code base grows we end up with more than a handful of files per dir. It becomes daunting just to know what is doing what especial...

Return positions of a regex match() in Javascript?

Is there a way to retrieve the (starting) character positions inside a string of the results of a regex match() in Javascript? ...

First-Occurrence Parallel String Matching Algorithm

To be up front, this is homework. That being said, it's extremely open ended and we've had almost zero guidance as to how to even begin thinking about this problem (or parallel algorithms in general). I'd like pointers in the right direction and not a full solution. Any reading that could help would be excellent as well. I'm working on ...

Text similarity function for strict document similarity

I'm writing a piece of Java software that has to make the final judgement on the similarity of two documents encoded in UTF-8. The two documents are very likely to be the same, or slightly different from each other, because they have many features in common like date, location, creator, etc., but their text is what decides if they reall...

Search for string allowing for one mismatch in any location of the string, Python

I am working with DNA sequences of length 25 (see examples below). I have a list of 230,000 and need to look for each sequence in the entire genome (Toxoplasma gondii parasite). I am not sure how large the genome is, but it is much more than 230,000 sequences. I need to look for each of my sequences of 25 characters example(AGCCTCCCATGATTGAACAG...
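
As a baseline for the kind of search described, here is a naive Python sketch that slides the pattern over the text and counts substitutions. It assumes mismatches are substitutions only (no insertions or deletions), the function name and the short test strings are made up for illustration, and its cost grows with text length times pattern length, so it is unlikely to be fast enough for 230,000 sequences against a whole genome.

```python
def find_with_mismatches(pattern: str, text: str, max_mismatches: int = 1) -> list:
    """Return start indices where pattern matches text with at most max_mismatches substitutions."""
    hits = []
    m = len(pattern)
    for start in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(pattern, text[start:start + m]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences, give up on this window
        else:
            # The inner loop finished without break: this window is a hit.
            hits.append(start)
    return hits


print(find_with_mismatches("ACGT", "TTACGTTTAGGTAA"))  # [2, 8]
```

Index 2 is an exact match and index 8 ("AGGT") differs in one position.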

READING stderr from within Awk

I want to keep SSH debug info separate (and logged) from other input. However, if I simply redirect stderr to a log file, I risk combining output from SSH and output from the remote process on the host machine (that might send something to stderr): $ ssh -v somemachine 2> file.log So, I want to filter out only those lines that match "...

Which substring of string1 matches string2?

There are two strings. String str1="Order Number Order Time Trade Number"; String str2="Order Tm"; Then I want to know which substring in str1 matches str2. string regex = Regex.Escape(str2).Replace(@"\ ", @"\s*"); bool isColumnNameMatched = Regex.IsMatch(str1, regex, RegexOptions.IgnoreCase); I am using regex because...

php selecting hash using wildcards

Say I have a hashmap, $hash = array('fox' => 'some value', 'fort' => 'some value 2', 'fork' => 'some value again'); I am trying to accomplish an autocomplete feature. When the user types 'fo', I would like to retrieve, via ajax, the 3 keys from $hash. When the user types 'for', I would like to only retrieve...
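
The question is about PHP, but the prefix filter at the heart of it can be sketched in Python. This assumes that 'for' is meant to return 'fort' and 'fork'; the function name `keys_with_prefix` is hypothetical, and a linear scan like this is only reasonable for small tables.

```python
def keys_with_prefix(table: dict, prefix: str) -> list:
    """Return the keys of table that start with prefix, in sorted order."""
    return sorted(key for key in table if key.startswith(prefix))


table = {"fox": "some value", "fort": "some value 2", "fork": "some value again"}
print(keys_with_prefix(table, "fo"))   # ['fork', 'fort', 'fox']
print(keys_with_prefix(table, "for"))  # ['fork', 'fort']
```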

Determining whether values can potentially match a regular expression, given more input

I am currently writing an application in JavaScript where I'm matching input to regular expressions, but I also need to find a way how to match strings to parts of the regular expressions. For example: var invalid = "x", potentially = "g", valid = "ggg", gReg = /^ggg$/; gReg.test(invalid); //returns false (correct) gReg.t...

Approximate string matching with a letter confusion matrix?

I'm trying to model a phonetic recognizer that has to isolate instances of words (strings of phones) out of a long stream of phones that doesn't have gaps between each word. The stream of phones may have been poorly recognized, with letter substitutions/insertions/deletions, so I will have to do approximate string matching. However, I ...

Iterating through String word at a time in Python

I have a string buffer of a huge text file. I have to search for given words/phrases in the string buffer. What's the efficient way to do it? I tried using re module matches, but as I have a huge text corpus to search through, this is taking a large amount of time. Given a dictionary of words and phrases, I iterate through th...