I have the following model and instance:

class Bashable(models.Model):
    name = models.CharField(max_length=100)

>>> foo = Bashable.objects.create(name=u"piñata")

Now I want to search for objects using ASCII characters rather than Unicode, something like this:

>>> Bashable.objects.filter(name__lookslike="pinata")

Is there a way in Django to do this sort of approximate string matching, using ASCII stand-ins for the Unicode characters in the database?

Here is a related question, but for Apple's Core Data.

+1  A: 

The first answer to this question shows how to use a strip_accents function to achieve what you want. It's not part of Django, and it isn't a Python built-in either; it's a short Python function, typically written on top of the standard library's unicodedata module.
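For reference, here is a minimal sketch of what such a function usually looks like. It assumes the common approach of decomposing each character with unicodedata and dropping the combining marks; the linked answer may differ in detail.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed."""
    # NFKD decomposition splits "ñ" into "n" plus a combining tilde (U+0303).
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the base characters; combining() is non-zero for accent marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents(u"piñata"))
```

Letters that have no decomposition (for example "ø" or "ß") pass through unchanged, so this is not a full transliteration to ASCII.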

Pace
+1  A: 

Try searching against a "de-accented" list of names if the initial search fails. Here's a PHP remove_accents function that could easily be translated into Python: remove_accents()

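# remove_accents() is assumed to be your Python port of the PHP function above.
def find_bashables(query):
    # Try an exact match first.
    r = Bashable.objects.filter(name=query)
    if not r:
        # Fall back to comparing against de-accented names in Python.
        accented = Bashable.objects.values('id', 'name')
        match_ids = [obj['id'] for obj in accented
                     if query in remove_accents(obj['name'])]
        # Return a QuerySet here too, so both branches have the same type.
        r = Bashable.objects.filter(id__in=match_ids)
    return r

r = find_bashables(u"pinata")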

And here's a Stack Overflow entry on fuzzy string matching in Python: #682367
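As a rough illustration of combining the two ideas, the sketch below folds accents and then uses difflib from the standard library for the fuzzy comparison. The helper name closest_names and the 0.8 cutoff are my own assumptions, not something taken from the linked entry.

```python
import difflib
import unicodedata


def strip_accents(text):
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def closest_names(query, names, cutoff=0.8):
    """Return the original names whose de-accented form is close to query."""
    # Map each folded (de-accented, lowercased) name back to its original.
    folded = {strip_accents(name).lower(): name for name in names}
    hits = difflib.get_close_matches(
        strip_accents(query).lower(), list(folded), n=3, cutoff=cutoff
    )
    return [folded[hit] for hit in hits]


names = [u"piñata", u"banana", u"piano"]
print(closest_names(u"pinatta", names))
```

Like the fallback above, this compares in Python rather than in the database, so it only suits tables small enough to load the names into memory.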

ariddell
This looks like a reasonable solution. It would be nice to be able to do this more efficiently at the database level in a single query, but it looks as though PostgreSQL at least doesn't support it (haven't checked the others). Pace's solution[1] points to the implementation of accent stripping. [1] http://stackoverflow.com/questions/2480159/django-approximate-matching-of-unicode-strings-with-ascii-equivalents/2480313#2480313