fuzzy

Are there any fuzzy search or string similarity function libraries written for C#?

There are similar questions, but not about C# libraries I can use in my source code. Thank you all for your help. I've already seen Lucene, but I need something simpler for searching similar strings, without the overhead of the indexing part. The answer I marked has two very simple algorithms, and one uses LINQ too, so it's p...
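Whatever library ends up being used, most simple string-similarity code is built on an edit distance. As a minimal sketch (in Python rather than C#, purely to illustrate the algorithm), here is the classic two-row Levenshtein distance; the `similarity` helper that scales it to 0..1 is a hypothetical convenience, not part of any library:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1.0 for identical strings, 0.0 for totally different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
```

Ranking candidates by `similarity` and keeping those above a chosen threshold gives a basic fuzzy search without any index; the threshold itself has to be tuned to the data.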

Identifying whether two HTML pages are similar

I'm trying to identify differences between a base case and a supplied case. I'm looking for a library that tells me the similarity as a percentage or something like that. For example, I have 10 different HTML pages. All of them are 404 responses that differ only in one or two lines of random code (such as the time or a quote of the day). Now when I supply a new 404 pag...
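One way to get a percentage like this, sketched in Python with the standard-library `difflib` module: compare the pages line by line rather than character by character, so that a page differing only in one or two lines still scores high. The sample pages and the line-based granularity are assumptions for illustration:

```python
from difflib import SequenceMatcher


def page_similarity(base_html: str, new_html: str) -> float:
    """0..1 similarity computed over lines, so one or two changed lines in an
    otherwise identical page keep the score high."""
    base_lines = base_html.splitlines()
    new_lines = new_html.splitlines()
    # autojunk=False: don't let difflib's heuristic discard frequent lines
    return SequenceMatcher(None, base_lines, new_lines, autojunk=False).ratio()


base = "<html>\n<h1>Not Found</h1>\n<p>12:00</p>\n<p>Quote A</p>\n</html>"
other = "<html>\n<h1>Not Found</h1>\n<p>12:05</p>\n<p>Quote B</p>\n</html>"
print(f"{page_similarity(base, other):.0%}")  # 60%
```

Which score counts as "the same page" is a threshold you would calibrate against the known 404 pages; on real pages with many more unchanged lines, the score for one or two changed lines is much closer to 100%.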

T-SQL fuzzy lookup without SSIS?

SSIS 2005/2008 does fuzzy lookups and groupings. Is there a feature that does the same in T-SQL? ...

fuzzy string search in Java

I'm looking for a high-performance Java library for fuzzy string search. There are numerous algorithms for finding similar strings: Levenshtein distance, Daitch-Mokotoff Soundex, n-grams, etc. What Java implementations exist? What are their pros and cons? I'm aware of Lucene; is there any other solution, or is Lucene the best? I found these; does anyone have experien...
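Phonetic encodings are simpler than they sound. The sketch below implements the classic American Soundex (not the Daitch-Mokotoff variant the question names, which produces six-digit codes and can yield several codes per name), in Python for illustration; names that encode to the same four characters, such as "Robert" and "Rupert", are treated as candidate matches:

```python
# American Soundex digit groups; vowels and h, w, y carry no digit
SOUNDEX_DIGITS = {letter: digit
                  for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                         ("l", "4"), ("mn", "5"), ("r", "6"))
                  for letter in letters}


def soundex(name: str) -> str:
    """Four-character American Soundex code, e.g. "Robert" -> "R163"."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_DIGITS.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_DIGITS.get(c, "")
        if digit and digit != previous:
            code += digit
            if len(code) == 4:
                break
        # vowels reset the "previous digit"; h and w do not
        if c not in "hw":
            previous = digit
    return code.ljust(4, "0")


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```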

How do I do a fuzzy match of company names in MySQL with PHP for auto-complete?

My users will import, by cut and paste, a large string containing company names. I have an existing and growing MySQL database of company names, each with a unique company_id. I want to parse the string and assign a fuzzy match to each of the user-inputted company names. Right now, just doing a straight-...
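For the matching step itself, a minimal sketch using Python's standard-library `difflib.get_close_matches` (the question uses PHP; this only illustrates the idea). The `known_companies` list and the 0.6 cutoff are assumptions; in practice the candidates would come from the company table, ideally pre-filtered so that not every row is compared:

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical sample; in practice these would be loaded from the companies table.
known_companies = ["Microsoft Corporation", "Apple Inc.", "Alphabet Inc.", "Oracle Corporation"]


def best_company_match(user_input: str, candidates: list[str],
                       cutoff: float = 0.6) -> Optional[str]:
    """Closest candidate by difflib's ratio, or None if none reaches `cutoff`."""
    matches = get_close_matches(user_input.strip(), candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(best_company_match("Microsft Corp", known_companies))  # Microsoft Corporation
```

Note that `get_close_matches` is case-sensitive, so lower-casing both sides first (and mapping back to the original row via `company_id`) is usually worthwhile.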

Fuzzy matching of product names

I need to automatically match product names (cameras, laptops, TVs, etc.) that come from different sources to a canonical name in the database. For example, "Canon PowerShot a20IS", "NEW powershot A20 IS from Canon" and "Digital Camera Canon PS A20IS" should all match "Canon PowerShot A20 IS". I've worked with Levenshtein distance with s...

A better similarity ranking algorithm for variable length strings

I'm looking for a string similarity algorithm that yields better results on variable-length strings than the ones that are usually suggested (Levenshtein distance, Soundex, etc.). For example, given string A: "Robert", string B: "Amy Robertson" would be a better match than string C: "Richard". Also, preferably, this algorithm sh...
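One algorithm often suggested for exactly this case is the Dice coefficient over adjacent letter pairs (bigrams), taken within each word so that word order matters less. A minimal Python sketch, assuming case-insensitive comparison and whitespace-separated words:

```python
from collections import Counter


def word_bigrams(text: str) -> Counter:
    """Count adjacent letter pairs within each word, ignoring case."""
    pairs = Counter()
    for word in text.upper().split():
        pairs.update(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def dice_similarity(a: str, b: str) -> float:
    """2 * |shared bigrams| / (|bigrams(a)| + |bigrams(b)|), in 0..1."""
    pairs_a, pairs_b = word_bigrams(a), word_bigrams(b)
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        return 0.0
    shared = sum((pairs_a & pairs_b).values())  # multiset intersection
    return 2 * shared / total


for candidate in ("Amy Robertson", "Richard"):
    print(candidate, round(dice_similarity("Robert", candidate), 3))
```

For the example strings, "Robert" shares all five of its bigrams with "Amy Robertson" (score 2*5/(5+10), about 0.67) and none with "Richard" (score 0), which is the ordering the question asks for.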

Determining if two or more summaries are similar

The problem is as follows: I have one summary, usually between 20 and 50 words, that I'd like to compare to other relatively similar summaries. The general category and the geographical location to which the summary refers are already known. For instance, if people from the same area are writing about building a house, I'd like to be...

Overcoming the Bitap algorithm's search pattern length

I am new to the field of approximate string matching. I am exploring uses for the Bitap algorithm, but so far its limited pattern length has me troubled. I am working with Flash, and I have at my disposal 32-bit unsigned integers and an IEEE-754 double-precision floating-point Number type, which can devote up to 53 bits to integers. Still, I wo...
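For reference, a sketch of the Bitap idea in Python, using the shift-and formulation extended to allow up to k substituted characters (Hamming distance; insertions and deletions need additional terms and are left out here). Python integers are arbitrary-precision, so the pattern-length limit disappears at a cost in speed; with fixed-width words, the same state would have to be split across several 32-bit integers:

```python
def bitap_hamming(text: str, pattern: str, k: int) -> list[int]:
    """Start indices where `pattern` occurs in `text` with at most k
    substituted characters (shift-and Bitap with a mismatch budget)."""
    m = len(pattern)
    if m == 0:
        return []
    masks: dict[str, int] = {}  # bit i of masks[c] is set when pattern[i] == c
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    full = (1 << m) - 1      # keep every state to m bits
    accept = 1 << (m - 1)    # this bit set => the whole pattern has been matched
    state = [0] * (k + 1)    # state[d]: bit j set if pattern[:j+1] ends here with <= d mismatches
    hits = []
    for pos, ch in enumerate(text):
        char_mask = masks.get(ch, 0)
        prev_old = state[0]  # state[d - 1] from before this character
        state[0] = ((state[0] << 1) | 1) & char_mask
        for d in range(1, k + 1):
            old = state[d]
            exact = ((old << 1) | 1) & char_mask   # extend with a matching character
            substituted = (prev_old << 1) | 1      # spend one mismatch on this character
            state[d] = (exact | substituted) & full
            prev_old = old
        if state[k] & accept:
            hits.append(pos - m + 1)
    return hits


print(bitap_hamming("say hallo there", "hello", 1))  # [4]
```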

Fuzzy date algorithm

I'm looking for a fuzzy date algorithm. I just started writing one and realised what a tedious task it is. It quickly degenerated into a lot of horrid code to cope with special cases like the difference between "yesterday", "last week" and "late last month", all of which can (in some cases) refer to the same day but are individually corre...
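One way to keep the special cases from turning into a tangle is a table of phrases plus a small number of regular-expression rules, each resolved against an explicit reference time. A minimal Python sketch; the phrase list, the reading of "last week" as exactly seven days earlier, and the function name are assumptions:

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical phrase table: fixed phrases and how far back each one points.
FIXED_PHRASES = {
    "now": timedelta(0),
    "yesterday": timedelta(days=1),
    "last week": timedelta(weeks=1),  # assumption: exactly seven days back
}
# "<n> <unit>(s) ago", e.g. "2 hours ago"
AGO_PATTERN = re.compile(r"^(\d+) (minute|hour|day|week)s? ago$")


def parse_fuzzy_date(text: str, now: datetime) -> Optional[datetime]:
    """Resolve a relative phrase against `now`; return None if unrecognised."""
    phrase = " ".join(text.lower().split())  # normalise case and spacing
    if phrase in FIXED_PHRASES:
        return now - FIXED_PHRASES[phrase]
    match = AGO_PATTERN.match(phrase)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        # timedelta takes plural keyword names: minutes=, hours=, days=, weeks=
        return now - timedelta(**{unit + "s": amount})
    return None


print(parse_fuzzy_date("2 hours ago", datetime(2024, 3, 15, 12, 0)))
```

Month-based phrases such as "late last month" or "2 months ago" are deliberately left out: `timedelta` has no month unit, so they would need calendar arithmetic and a decision about how vague the answer should be.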

Fuzzy Date Time Picker Control in C# .NET?

I am implementing a Fuzzy Date control in C# for a WinForms application. The Fuzzy Date should be able to take fuzzy values like "Last June", "2 Hours ago", "2 Months ago", "Last week", "Yesterday", "Last year" and the like. Are there any sample implementations of "Fuzzy" Date Time Pickers? Any ideas on implementing such a control would be appreciat...

Fuzzy matching using T-SQL

I have a table Persons with personal data and so on. There are lots of columns, but the ones of interest here are addressindex, lastname and firstname, where addressindex is a unique address drilled down to the door of the apartment. So if I have (like below) two persons whose lastname and one of the firstnames are the same, they are most l...

Fuzzy date parsing with Java

Are there any libraries for Java that allow you to interpret dates like "Yesterday", "Next Monday", ... ...

Fuzzy Scheduling

I'm writing a Windows service that needs to execute a task (one that connects to a central server) every 30 days ± 5 days (the timing needs to be random). The service will be running on 2000+ client machines, so the randomness is meant to spread them out so the server does not get overloaded. What would be the best way to do this? Currently, I pick...
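A minimal Python sketch of the scheduling rule as stated (30 days ± 5 days, chosen at random). The constants and function name are assumptions; drawing the offset in seconds rather than whole days spreads the clients more evenly than picking one of eleven possible days:

```python
import random
from datetime import datetime, timedelta

BASE_INTERVAL = timedelta(days=30)  # nominal period between runs
JITTER = timedelta(days=5)          # maximum deviation either way


def next_run(last_run: datetime, rng: random.Random) -> datetime:
    """Pick the next run time uniformly in [last_run + 25 days, last_run + 35 days]."""
    limit = JITTER.total_seconds()
    offset = timedelta(seconds=rng.uniform(-limit, limit))
    return last_run + BASE_INTERVAL + offset


rng = random.Random()  # seeded from OS entropy (or the clock) on each machine
print(next_run(datetime.now(), rng))
```

Each client would persist its last run time and use its own `random.Random()`, so machines installed at the same moment still drift apart over successive runs.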

Django fuzzy string translation not showing up

1) Why do I sometimes get a 'fuzzy' item in my django.po language file? I have checked, and in my project the 'fuzzy' string item is totally unique:

    #: .\users\views.py:81 .\users\views.py:101
    #, fuzzy
    msgid "username or email"
    msgstr "9988"

2) It is OK for it to be fuzzy, but my translation of the fuzzy item is not showing up on the page, only English...

How to spot and analyse similar patterns like Excel does?

You know the functionality in Excel where you type three rows with a certain pattern, drag the column all the way down, and Excel tries to continue the pattern for you. For example, type test-1, test-2, test-3 and Excel will continue it with test-4, test-5, ... test-n. The same works for some other patterns, such as dates. I'm trying...
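A minimal Python sketch of the simplest case Excel handles: a shared text prefix followed by a number that changes by a constant step. The function name and error handling are assumptions, and zero-padded numbers (padding is not preserved here) or date series would need their own rules:

```python
import re

TRAILING_NUMBER = re.compile(r"^(.*?)(\d+)$")  # shortest prefix, then the final digits


def continue_series(samples: list[str], count: int) -> list[str]:
    """Extend e.g. ["test-1", "test-2", "test-3"] by `count` items, provided all
    samples share one prefix and their numbers change by a constant step."""
    parts = [TRAILING_NUMBER.match(s) for s in samples]
    if len(samples) < 2 or not all(parts):
        raise ValueError("need at least two samples that end in a number")
    prefixes = {p.group(1) for p in parts}
    numbers = [int(p.group(2)) for p in parts]
    steps = {b - a for a, b in zip(numbers, numbers[1:])}
    if len(prefixes) != 1 or len(steps) != 1:
        raise ValueError("no constant 'prefix + number' pattern found")
    prefix, step = prefixes.pop(), steps.pop()
    return [f"{prefix}{numbers[-1] + step * i}" for i in range(1, count + 1)]


print(continue_series(["test-1", "test-2", "test-3"], 2))  # ['test-4', 'test-5']
```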

Performance of Python worth the cost?

I'm looking at implementing a fuzzy logic controller based on either the PyFuzzy (Python) or the FFLL (C++) library. I'd prefer to work with Python but am unsure whether the performance will be acceptable in the embedded environment it will run in (either an ARM or an embedded x86 processor, both with ~64 MB of RAM). The main concern is that response times are ...

A simple/practical example of the fuzzy c-means algorithm

I am writing my master's thesis on dynamic keystroke authentication. To support ongoing research, I am writing code to test out different methods of feature extraction and feature matching. My current simple approach just checks if the reference password keycodes match the currently typed-in keycodes and also checks if the...
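A minimal fuzzy c-means sketch in plain Python, on one-dimensional data so each step is easy to follow. The sample points, starting centres, and fuzzifier m = 2 are illustrative assumptions; keystroke features (for example dwell and flight times) would be vectors, with `abs(x - c)` replaced by a vector distance:

```python
def fuzzy_c_means(points, centers, m=2.0, iterations=100, tol=1e-6):
    """Fuzzy c-means on 1-D data.

    points: list of floats; centers: initial centre per cluster;
    m > 1: fuzzifier (larger means softer memberships).
    Returns (centers, memberships), where memberships[i][j] is the degree
    (0..1, each row summing to 1) to which points[i] belongs to cluster j.
    """
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership step: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        memberships = []
        for x in points:
            dists = [abs(x - c) for c in centers]
            zeros = dists.count(0.0)
            if zeros:  # point sits exactly on a centre: share membership among those
                row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
            else:
                row = [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
            memberships.append(row)
        # Centre step: mean of the points weighted by u_ij ** m
        new_centers = []
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            new_centers.append(sum(w * x for w, x in zip(weights, points)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


points = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
centers, memberships = fuzzy_c_means(points, centers=[0.0, 5.0])
print([round(c, 2) for c in centers])
```

Unlike k-means, every point keeps a graded membership in every cluster, which is what makes the method attractive when an exact keycode comparison is too strict.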

Fuzzy grouping in Postgres

I have a table with contents that look similar to this:

    id | title
    ---+--------
     1 | 5. foo
     2 | 5.foo
     3 | 5. foo*
     4 | bar
     5 | bar*
     6 | baz
     6 | BAZ

…and so on. I would like to group by the titles and ignore the extra bits. I know Postgres can do this:

    SELECT * FROM ( SELECT regexp_replace(title, '[*.]+$', '') AS title FROM t...
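The `regexp_replace` in the question only strips trailing `*` and `.`, so "5. foo" and "5.foo" (or "baz" and "BAZ") would still land in different groups. A Python sketch of a broader grouping key, assuming that case and whitespace should also be ignored; in Postgres the same key could presumably be built by wrapping `lower()` and a second `regexp_replace(..., '\s+', '', 'g')` around the existing expression:

```python
import re
from collections import defaultdict

# Rows copied from the example table above
rows = [(1, "5. foo"), (2, "5.foo"), (3, "5. foo*"), (4, "bar"),
        (5, "bar*"), (6, "baz"), (6, "BAZ")]


def normalise(title: str) -> str:
    """Grouping key: lower-case, strip trailing '*'/'.', drop all whitespace."""
    key = re.sub(r"[*.]+$", "", title.lower())
    return re.sub(r"\s+", "", key)


groups = defaultdict(list)
for row_id, title in rows:
    groups[normalise(title)].append(row_id)

print(dict(groups))  # {'5.foo': [1, 2, 3], 'bar': [4, 5], 'baz': [6, 6]}
```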

Adding fuzziness to a lucene query

Is there a simple way to add a fuzziness level to a user-entered search query in Lucene? I'd like to avoid having to parse their entered text if possible. At present, if they enter green boxes, I use a multifield query parser with boosts, which easily generates, for example, the following: +(title:green^10 title:boxes^10) +(category:green...
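Lucene's classic query syntax already has a fuzzy operator: a trailing `~` on a term, optionally followed by the maximum edit distance (0 to 2 in Lucene 4 and later; older versions took a similarity value instead). So a light pre-processing step may be enough, without a real parse. A Python sketch, assuming the input is plain whitespace-separated words and that tokens containing punctuation or operators should be passed through unchanged:

```python
LUCENE_OPERATORS = {"AND", "OR", "NOT"}


def add_fuzziness(user_query: str, max_edits: int = 1) -> str:
    """Append Lucene's fuzzy operator ("term~N") to every plain word.
    Tokens that are not purely alphanumeric (quotes, +, -, field:value, ...)
    and the boolean operators are passed through unchanged."""
    tokens = user_query.split()
    return " ".join(
        token if token in LUCENE_OPERATORS or not token.isalnum()
        else f"{token}~{max_edits}"
        for token in tokens
    )


print(add_fuzziness("green boxes"))  # green~1 boxes~1
```

The rewritten string can then go to the same multifield parser as before.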