Many search engines have a 'did you mean' feature.

Is there a simple way to use (N)Hibernate (e.g. ICriteria) to find an entity (e.g. a keyword) based on similarity? Please note that I do not mean Expression.Like or anything like it.

I hope this question makes sense.

Thanks.

Christian

PS:

Similarity means, in my case, (let us say) 70% of characters in common.

I envisage implementing an extension method called bla that I can use in my criteria queries:

ICriteria Criteria = Session.CreateCriteria(typeof(xxx));
Criteria.Add(Expression.bla("name", name));
return Criteria.List<xxx>() as List<xxx>;

+2  A: 

It's out of scope for NHibernate. NHibernate is a data access layer; it can only do what the database does. You would have to determine similarity yourself, perhaps by maintaining a table of common mistypings. That's what search engines do anyway; they don't just magically determine what's a typo.

HeavyWave
> That's what search engines do anyway, they don't just magically determine what's a typo.
Not sure whether they store this information in a DB. I rather think they use an algorithm, like characters in common or something.
csetzkorn
Both. They have to know for certain that the typo is not a name of some sort. But you might have different criteria for "similarity".
HeavyWave
Maintaining a table of common mistypes is a really BAD idea! Don't do this!
Pavel Nikolov
That's what I thought too Pavel.
csetzkorn
+1  A: 

Hibernate won't make your database any smarter than it already is. "Did you mean" is a very tricky business; it is generally implemented by doing statistical analysis of words and n-grams (multi-word sequences) against the metadata of the search engine's inverted-file index structures and query logs.

As an example, if I type "exmaple code", the engine might scan the most common known words in the corpus, computing each word's edit distance from the term "exmaple". It will probably find "example" and thus suggest, "Did you mean example code?"
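The edit-distance scan described above can be sketched roughly as follows. This is only an illustration in Python, not what any particular search engine does: the function names `levenshtein` and `suggest`, the tiny vocabulary, and the `max_distance` cutoff are all assumptions made for the example. A real engine would also weight candidates by how frequent they are in its index and query logs.

```python
def levenshtein(a, b):
    """Count the single-character insertions, deletions and substitutions
    needed to turn string a into string b (classic dynamic programming)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def suggest(term, vocabulary, max_distance=2):
    """Return the known word closest to term, or None if nothing is close
    enough. vocabulary and max_distance are hypothetical inputs."""
    best = min(vocabulary, key=lambda word: levenshtein(term, word))
    distance = levenshtein(term, best)
    if 0 < distance <= max_distance:
        return best
    return None


# Hypothetical vocabulary, standing in for the most common words of a corpus.
known_words = ["sample", "example", "code", "temple"]
print(suggest("exmaple", known_words))
```

Note that plain Levenshtein distance counts a swapped pair of letters as two edits, which is why the cutoff in this sketch is 2 rather than 1.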

Marcelo Cantos
A: 

Similarity is hard to define and IMHO is defined differently in many use cases. Similarity can be phonetic (there are different algorithms, such as the Kölner Verfahren for German). In the case of phonetic similarity, a function computes a string representation (a phonetic code) for each word. One could then use the Levenshtein distance to compare these codes. I don't know much about (N)Hibernate, but an extension method could be used to do the comparison on the loaded objects.

-sa

Sascha
Thanks, an extension method is what I was looking for. What I envisaged was to implement an extension method called bla that I can use in my criteria queries: ICriteria Criteria = Session.CreateCriteria(typeof(xxx)); Criteria.Add(Expression.bla("name", name)); return Criteria.List<xxx>() as List<xxx>;
csetzkorn
A: 

I don't think NHibernate has any functionality that inherently gives you similar words.

You have to create a distance function that calculates the distance between words (how similar they are). Then, based on a threshold, you accept all words whose distance from your original word is below that threshold.

This distance function is the key, and there are many criteria you could use to calculate the distance between words.
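As one possible criterion, here is a minimal Python sketch of the "characters in common" idea from the question, using the 70% figure mentioned there as the threshold. It computes a similarity score rather than a distance, so words are kept when the score is at or above the threshold instead of below it. The names `char_overlap` and `similar_words` are made up for this illustration, and the filtering runs in memory on already loaded words, not inside the database.

```python
from collections import Counter


def char_overlap(a, b):
    """Share of characters two words have in common, from 0.0 to 1.0.
    Characters are compared as multisets, so letter order is ignored."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    common = sum((Counter(a) & Counter(b)).values())
    return 2.0 * common / (len(a) + len(b))


def similar_words(term, candidates, threshold=0.7):
    """Keep the candidates whose overlap with term reaches the threshold."""
    return [word for word in candidates if char_overlap(term, word) >= threshold]


# Hypothetical keyword list, standing in for entities loaded elsewhere.
print(similar_words("beer", ["bear", "wine", "beers"]))
```

Because order is ignored, this measure treats anagrams as identical, so it is usually combined with, or replaced by, an order-aware measure such as edit distance.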

Mahesh Velaga
Hi, can I somehow implement this using ICriteria? I am sure I could implement an interface or something to count the number of characters in common, which could then be used somehow. C# (3.5) has a special name for the methods that can then appear if you use Expression.Bla. Hope this makes sense. I understand that NHibernate is a data access technology, but I also saw somewhere that it integrates with Lucene. There is actually a book on search with Hibernate, I think.
csetzkorn
+1  A: 

You can use the SOUNDEX function in SQL:

SELECT
    * 
FROM
    Products
WHERE
    SOUNDEX(ProductName) = SOUNDEX('beer')

This will return products whose names sound similar to "beer", i.e. whose SOUNDEX code is the same.

UPDATE:

SELECT
    * 
FROM
    Products
WHERE
    DIFFERENCE(ProductName, 'beer') IN (3, 4)

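This would also return products with similar names. In SQL Server, DIFFERENCE compares the SOUNDEX codes of its two arguments and returns a value from 0 to 4, where 4 is the closest match.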
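To show why "beer" matches similar-sounding names, here is a rough Python sketch of the standard American Soundex coding. It is an illustration only: the function name `soundex` is chosen for the example, and a given database's built-in SOUNDEX may differ in edge cases (for instance in how it treats h and w), so do not assume the outputs match your database exactly.

```python
def soundex(word):
    """Standard American Soundex: first letter plus three digits.
    A sketch for illustration; database implementations can differ."""
    groups = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
              ("l", "4"), ("mn", "5"), ("r", "6"))
    codes = {ch: digit for letters, digit in groups for ch in letters}

    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    previous = codes.get(letters[0], "")  # the first letter's code counts too
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = codes.get(ch, "")  # vowels (and y) have no code
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code is kept
    return (result + "000")[:4]  # pad with zeros, keep four characters


for name in ("beer", "bear", "Robert", "Rupert"):
    print(name, soundex(name))
```

Words that share a code are treated as sounding alike, which is what the equality test in the first query relies on.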

-Pavel

Pavel Nikolov
Interesting, I just tried it and the results look OK. Is this an ANSI-standard UDF?
csetzkorn
Regarding SOUNDEX: a shortcoming of all phonetic algorithms is that they focus on a specific language (Soundex on English; the Kölner Verfahren was developed for German, etc.). Second, similar doesn't mean exactly the same function output, so you probably need a distance measure for the Soundex results, too.
Sascha
+2  A: 

As others said, it's generally out of scope for an RDBMS. Use Lucene.Net (possibly via NHibernate.Search) or Solr (possibly via SolrNet) instead. Solr even comes with spell checking out of the box, which you can use to easily implement "did you mean" functionality.

Mauricio Scheffer