string-matching

Alternatives to Lucene Default Fuzzy Matching Implementation

Lucene fuzzy matching uses a basic edit-distance algorithm. Are there other fuzzy matching implementations for Lucene that use other similarity metrics? They should also identify homophones. Please also compare the various fuzzy matching approaches for Lucene. ...

Optimizing near-duplicate value search

I'm trying to find near-duplicate values in a set of fields in order to allow an administrator to clean them up. There are two criteria that I am matching on: (1) one string is wholly contained within the other, and is at least 1/4 of its length; (2) the strings have an edit distance less than 5% of the total length of the two strings. The Pse...
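
The two criteria in this excerpt can be sketched in Python as follows. This is only an illustration under stated assumptions: "edit distance" is taken to mean Levenshtein distance, "at least 1/4 of its length" is read as the shorter string being at least a quarter as long as the longer one, and either criterion alone is treated as a match. The function names are hypothetical and not from the question.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic programming table."""
    if len(a) < len(b):
        a, b = b, a  # keep the rows as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def is_near_duplicate(a: str, b: str) -> bool:
    """Assumed reading: either criterion is enough to flag a pair."""
    shorter, longer = sorted((a, b), key=len)
    # Criterion 1: containment, with the shorter string at least 1/4 of the longer.
    if shorter and shorter in longer and 4 * len(shorter) >= len(longer):
        return True
    # Criterion 2: edit distance below 5% of the combined length.
    return edit_distance(a, b) < 0.05 * (len(a) + len(b))
```

Checking containment first is cheap, so the quadratic edit-distance computation only runs for pairs that fail the substring test.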

Is there a faster method to match an arbitrary String to month name in Java

Hello, I want to determine if a string is the name of a month and I want to do it relatively quickly. The function that is currently stuck in my brain is something like: boolean isaMonth( String str ) { String[] months = DateFormatSymbols.getInstance().getMonths(); String[] shortMonths = DateFormatSymbols.getInstance().getShortM...

Java: JPQL search -similar- strings

What methods are there to get JPQL to match similar strings? By similar I mean: contains (the search string is found within the string of the matched entity); case-insensitive; small misspellings, e.g. "arow" matches "arrow". I suspect the first two will be easy; however, I would appreciate help with the last one. Thank you ...

Aligning music notes using String matching algorithms or Dynamic Programming

Hi, I need to compare two sets of musical pieces (i.e. a performance, taken in MIDI format with note details extracted and saved in a database table, against sheet music, taken in XML format). When evaluating the performance against the sheet music (i.e. note details: pitch, duration, rhythm), note alignment needs to be done to identify missed/extra/incorrect/...

Sample Java code for approximate string matching or Boyer-Moore extended for approximate string matching

Hi, I need to find (1) mismatches (incorrectly played notes), (2) insertions (additionally played notes), and (3) deletions (missed notes) in a music piece (e.g. note pitches [string values] stored in a table) against a reference music piece. This is possible through either exact string matching algorithms or dynamic programming/approximate string matching...

Would there be any advantage in comparing pattern and text characters right-to-left instead of left-to-right?

Hi all, This is an exercise from "Introduction to The Design and Analysis of Algorithms". It's a string matching problem. Say I have the string ABCD and the pattern XY, and want to see if the string contains the pattern. We just assume brute force here, so the left-to-right comparison compares A with X, next compares B with...
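
For reference, the left-to-right brute-force scan described in this excerpt might look like the sketch below. It only illustrates the baseline and counts character comparisons; it does not settle the right-to-left question. The function name and the comparison counter are assumptions added for illustration.

```python
def brute_force_find(text: str, pattern: str):
    """Return (index of first match or -1, number of character comparisons)."""
    comparisons = 0
    n, m = len(text), len(pattern)
    for shift in range(n - m + 1):
        j = 0
        # Compare pattern characters left to right at this alignment.
        while j < m:
            comparisons += 1
            if text[shift + j] != pattern[j]:
                break
            j += 1
        if j == m:
            return shift, comparisons
    return -1, comparisons
```

With the excerpt's example, `brute_force_find("ABCD", "XY")` tries three alignments and stops each one after a single mismatch.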

What is a regular expression that finds a line like this: <rect **** />

Hi, I want a regular expression that could be used to find the following lines: <rect width='10px' height ='20px'/> <rect width='20px' height ='22px'/> <circle radius='20px' height ='22px'/> and replace them with these lines: <rect width='10px' height ='20px'></rect> <rect width='20px' height ='22px'></rect> <circle radius='20px' h...

Fuzzy data matching for personal demographic information

Let's say I have a database filled with people with the following data elements: PersonID (meaningless surrogate autonumber), FirstName, MiddleInitial, LastName, NameSuffix, DateOfBirth, AlternateID (like an SSN, Military ID, etc.). I get lots of data feeds in all kinds of formats with every reasonable variation on these pieces of infor...

String searching algorithms in Java

Hi all! I am doing string matching with a large amount of data. EDIT: I am matching words contained in a big list with some ontology text files. I take each file from the ontology, and search for a match between the third String of each file line and any word from the list. I made a mistake in overlooking the fact that what I need to do is no...

Difference between performance of the two SQL queries?

I have a field in my table with the text data type. Is there a difference in performance between the following two SQL queries: select * from tablename where fieldname="xyz%"; select * from tablename where fieldname="%zyx"; If we were to implement the execution of these queries, this is what I think we would need to do: We have to match...

efficient algorithm for searching one of several strings in a text?

I need to search incoming not-very-long pieces of text for occurrences of given strings. The strings are constant for the whole session and are not many (~10). Additional simplification is that none of the strings is contained in any other. I am currently using boost regex matching with str1 | str2 | .... The performance of this task is...

Why is my code stopping?

Hey, I've encountered an issue where my program stops iterating through the file at record 57802 for some reason I cannot figure out. I put a heartbeat section in so I would be able to see which line it is on, and it helped, but now I am stuck as to why it stops here. I thought it was a memory issue but I just ran it on my 6GB memory ...

Building a suffix tree for a string matching algorithm in large database.

Hi friends, I had an internship interview last week and I was given a question regarding searching for a particular string in a large database. I was totally clueless about it during the interview, though I gave "multi-level hashing" as a reply, as that was the only thing I knew which had the best time efficiency. After a bit of googling I...

Accurate algorithm for normalizing taxonomy terms?

Hello, I'm developing a shopping comparison website, and the project is in a very advanced stage. We index 50 million products daily using merchant feeds from various affiliate networks. Most of the problems I had are already solved, including the majority of the performance bottlenecks. What is my problem: Please, first of all, we are ...

F# Matching mutable object (string)

Here is the full code so far: module clean #light open System open System.IO let pause() = Console.ReadLine() let drive = System.IO.Directory.GetDirectoryRoot(System.IO.Directory.GetCurrentDirectory()) printfn "You're using the %s drive.\n\n" drive let game1 = "Assassin's Creed" let game2 = "Crysis" let game3 = "Mass Effect" let local...

I need to implement an Auto complete Utility using Struts2-JQuery plugin.

There is an inbuilt tag for this purpose. User enters a character in the textbox, Strings which start with the character entered should be displayed in the form of a list. The item selected from the list should be populated in the textbox. P.S: The examples and demo available display Strings that contain the character entered. But I w...

Using Aho-Corasick on a DAWG rather than a Trie

Hi fellas, does anybody know if it's possible to modify the Aho-Corasick string matching algorithm to be used on a DAWG (Directed Acyclic Word Graph) rather than a Trie? ...

Regex to match 'lol' to 'lolllll' and 'omg' to 'omggg', etc.

Hey there, I love regular expressions, but I'm just not good at them at all. I have a list of some 400 shortened words such as lol, omg, lmao...etc. Whenever someone types one of these shortened words, it is replaced with its English counterpart ([laughter], or something to that effect). Anyway, people are annoying and type these shor...
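
One way to handle the elongated forms in the title ('lolllll', 'omggg') is to let the final letter of each shortened word repeat. The Python sketch below assumes only the last letter is stretched, which is what the title's examples show; the excerpt is truncated, so the asker's full requirements are unknown. The `REPLACEMENTS` table and function names are hypothetical, and "[surprise]" is an invented placeholder ("[laughter]" comes from the excerpt).

```python
import re

# Hypothetical replacement table; the question mentions about 400 such words.
REPLACEMENTS = {"lol": "[laughter]", "omg": "[surprise]"}


def elongated_pattern(word: str) -> re.Pattern:
    """Match the word as a whole token, allowing its last letter to repeat."""
    stem, last = word[:-1], word[-1]
    return re.compile(
        r"\b" + re.escape(stem) + re.escape(last) + r"+\b",
        re.IGNORECASE,
    )


def expand(text: str) -> str:
    """Replace each (possibly elongated) shortened word with its expansion."""
    for word, replacement in REPLACEMENTS.items():
        text = elongated_pattern(word).sub(replacement, text)
    return text
```

The `\b` word boundaries keep words such as "lollipop" from being rewritten.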

Is there a way to find a substring's index through a regular expression in Python?

I want to find a substring's index position, but the substring is long and hard to express (it is multiline, and it even needs escaping), so I want to use a regex to match it and return the substring's index, like str.find or str.rfind. Is there some package that helps with this? ...
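
Python's standard `re` module already exposes match positions, so no extra package is needed for the basic case. The minimal sketch below uses `re.escape` to handle the escaping and `Match.start()` to get the index; the helper names are hypothetical, and `find_all_indices` reports non-overlapping matches only.

```python
import re


def find_index(text: str, literal: str) -> int:
    """Index of the first occurrence of `literal`, or -1 (like str.find)."""
    match = re.search(re.escape(literal), text)
    return match.start() if match else -1


def find_all_indices(text: str, literal: str) -> list:
    """Start indices of all non-overlapping occurrences of `literal`."""
    return [m.start() for m in re.finditer(re.escape(literal), text)]
```

For a real pattern rather than an escaped literal, the same `match.start()` and `match.end()` calls apply; `re.DOTALL` lets `.` span newlines in multiline text.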