ansaurus

Question

Django: Slug in Vietnamese

Answer 1

+2 A:

See http://stackoverflow.com/questions/702337/how-to-make-django-slugify-work-properly-with-unicode-strings

Tordek 2009-10-22 04:55:06

I mean before we save to the database, i.e. when we add a record in django-admin.

Tran Tuan Anh 2009-10-22 05:51:59

... What's wrong with calling `unicodedata.normalize()` before inserting into the database?

Ignacio Vazquez-Abrams 2010-01-15 15:29:17
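Here is a minimal, Django-free sketch of the approach this comment suggests, assuming Python 3. `ascii_slug` is a hypothetical name, and the final strip/hyphenate steps are an assumption about how you want the slug shaped. NFKD splits each accented letter into a base letter plus combining marks, and encoding to ASCII with `"ignore"` throws the marks away:

```python
import re
import unicodedata


def ascii_slug(text):
    """Hypothetical helper: a rough slug built from NFKD + ASCII 'ignore'."""
    # NFKD turns e.g. "ữ" into "u" + combining horn + combining tilde
    decomposed = unicodedata.normalize("NFKD", text)
    # The combining marks are non-ASCII, so errors="ignore" drops them
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Remove punctuation, then collapse whitespace/hyphen runs into "-"
    ascii_text = re.sub(r"[^\w\s-]", "", ascii_text).strip().lower()
    return re.sub(r"[-\s]+", "-", ascii_text)


print(ascii_slug("những viên kẹo"))  # nhung-vien-keo
print(ascii_slug("đường"))           # uong
```

As the second call shows, a letter with no Unicode decomposition, such as "đ", is silently dropped rather than converted to "d".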

Answer 2

+1 A:

@Anh: You should write a new filter or tag to do that.

Navaro 2010-01-15 14:47:41

Answer 3

+2 A:

[edit]

I take it back: Django's django.template.defaultfilters.slugify() already does what you want, using unicodedata.normalize() and .encode('ascii', 'ignore'). Just pass your string to slugify:

from django.template.defaultfilters import slugify
print(slugify(u"những-viên-kẹo"))  # nhung-vien-keo

To do this automatically, add this to the .save() method in your models:

from django.db import models
from django.template.defaultfilters import slugify

class MyModel(models.Model):
    title = models.CharField(max_length=255)
    slug  = models.SlugField(blank=True)

    def save(self, *args, **kwargs):
        # Only fill in the slug when it is empty, so an existing or
        # hand-edited slug is kept
        if not self.slug:
            self.slug = slugify(self.title)
        super(MyModel, self).save(*args, **kwargs)

The solution I wrote earlier (below) would still be useful for languages whose transliteration needs extra characters, e.g. German ü -> ue, ß -> ss, etc.
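To illustrate the German case, here is a minimal sketch assuming Python 3. `GERMAN_MAP` and `transliterate` are hypothetical names, and the map is only an example, not a complete transliteration table. The idea is to apply a translation dict with multi-character replacements first, then let normalization strip whatever accents remain:

```python
import unicodedata

# Hypothetical pre-map: letters whose usual German transliteration is more
# than "drop the accent" (translate() accepts multi-character replacements)
GERMAN_MAP = {
    ord("ä"): "ae", ord("ö"): "oe", ord("ü"): "ue",
    ord("Ä"): "Ae", ord("Ö"): "Oe", ord("Ü"): "Ue",
    ord("ß"): "ss",
}


def transliterate(text, premap):
    # Compose first so "ü" is a single code point that matches the map key
    text = unicodedata.normalize("NFC", text).translate(premap)
    # Then strip any remaining accents: NFKD splits them off as combining
    # marks, and encoding with errors="ignore" drops everything non-ASCII
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(transliterate("Grüße aus Köln", GERMAN_MAP))  # Gruesse aus Koeln
print(transliterate("Grüße aus Köln", {}))          # Grue aus Koln
```

Without the pre-map, "ü" only loses its umlaut and "ß", which has no decomposition, disappears entirely.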

[original post]

Python allows you to use a translation dict to map characters to a replacement string.

A simple version for your case would be:

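# Replacement values are unicode strings (u'...'), which unicode.translate()
# requires in Python 2; identity entries such as n -> n are optional,
# since unmapped characters are left unchanged
vietnamese_map = {
    ord(u'ư'): u'u',
    ord(u'ơ'): u'o',
    ord(u'á'): u'a',
    ord(u'n'): u'n',
    ord(u'h'): u'h',
    ord(u'ữ'): u'u',
    ord(u'g'): u'g',
    ord(u'v'): u'v',
    ord(u'i'): u'i',
    ord(u'ê'): u'e',
    ord(u'k'): u'k',
    ord(u'ẹ'): u'e',
    ord(u'o'): u'o',
}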

And then you can call:

print(u"những-viên-kẹo".translate(vietnamese_map))

To get:

nhung-vien-keo

For more advanced use (i.e. a dynamic dict), see e.g. http://effbot.org/zone/unicode-convert.htm
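The "dynamic dict" idea can be sketched as a dict subclass that computes each entry the first time translate() asks for it. This is a minimal Python 3 sketch of the general technique, not the code from the linked page; `UnaccentedMap` is a hypothetical name, and stripping every combining mark after NFKD is an assumption about what "unaccented" should mean:

```python
import unicodedata


class UnaccentedMap(dict):
    """Hypothetical translate() table that fills itself in on demand.

    str.translate() looks each code point up as table[ord(ch)]; because
    this is a dict subclass, misses go to __missing__, which computes the
    replacement once and caches it.
    """

    def __missing__(self, codepoint):
        decomposed = unicodedata.normalize("NFKD", chr(codepoint))
        # Keep only the base characters, dropping combining accent marks
        value = "".join(c for c in decomposed if not unicodedata.combining(c))
        self[codepoint] = value
        return value


# Seed entries for letters NFKD cannot split, such as the stroked D
table = UnaccentedMap({ord("Đ"): "D", ord("đ"): "d"})

print("những-viên-kẹo".translate(table))  # nhung-vien-keo
print("Đường".translate(table))           # Duong
```

The seed entries matter: a plain `UnaccentedMap()` turns "Đường" into "Đuong", because NFKD has no decomposition for the stroked D.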

Note that the above only shows what the map needs to look like; it's not a particularly convenient way of entering the data. A more convenient way to build the same map is something like:

_map = u"ưu ơo áa nn hh ữu gg vv ii êe kk ẹe oo"
# Take the above string and generate a translation dict
vietnamese_map = dict((ord(m[0]), m[1:]) for m in _map.split())
print(u"những-viên-kẹo".translate(vietnamese_map))

Will Hardy 2010-01-15 15:22:26

Answer 4

+1 A:

You can try normalizing it in Python:

http://pyright.blogspot.com/2009/11/unicode-normalization-python-3x-unicode.html

This saves you from retyping the Vietnamese alphabet (a á ớ bờ cờ dờ đờ ...) and from missing other special Latin characters: just run a normalization function and test that everything works. Remember to test the letter "đ", because I've run into the problem that normalization does not turn Đ into D.

Good luck :P

DucDigital 2010-02-06 15:05:21

haha, by now we can use the normalization library in Django 1.2.x. But I still see a problem with the character Đ :D

Tran Tuan Anh 2010-08-04 17:09:45

The only problem is the letter Đ; just replace it with D. Anh, just replace every Đ in the whole string with D and it'll be fine :P

DucDigital 2010-08-07 08:19:51
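The fix suggested in this comment can be sketched as follows, assuming Python 3; `vietnamese_to_ascii` is a hypothetical name, and handling lowercase "đ" alongside "Đ" is an addition to what the comment says:

```python
import unicodedata


def vietnamese_to_ascii(text):
    # Hypothetical helper. Đ/đ (D with stroke) have no Unicode
    # decomposition, so NFKD leaves them alone and the ASCII "ignore"
    # step would delete them outright; replace them by hand first
    text = text.replace("Đ", "D").replace("đ", "d")
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(vietnamese_to_ascii("Đường đi"))  # Duong di
```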
