I'm working with a sqlite database, using python/django. I would need to have my results, which contain german umlauts (ä,ö,ü) to be correctly sorted (not with the umlauts at the end). Reading different articles on the web I'm not even sure if it is possible or not.
So any advise/instructions on that are appreciated. I already studied the docs for create_collation
etc. but I couldn't find any helpful examples for "beginners". Furthermore, if it is possible I'd like to know how to apply the necessary modifications on already existing tables!
views:
49answers:
2A similar question was asked on here 1 year ago.
The answer may be overkill for you, as stated by the OP of that question. I do, however, recommend James Tauber's Unicode Collation Algorithm.
An example is right on his webpage:
from pyuca import Collator
c = Collator("allkeys.txt")
sorted_words = sorted(words, key=c.sort_key)
So any advise/instructions on that are appreciated. I already studied the docs for create_collation etc. but I couldn't find any helpful examples for "beginners".
To create a collation with sqlite3
, you need a function that works like C's strcmp
.
def stricmp(str1, str2):
str1 = str1.lower()
str2 = str2.lower()
if str1 == str2:
return 0
elif str1 < str2:
return -1
else:
return 1
db = sqlite3.connect(':memory:')
# SQLite's default NOCASE collation is ASCII-only
# Override it with a (mostly) Unicode-aware version
db.create_collation('NOCASE', stricmp)
Note that although this collation will correctly handle 'ü' == 'Ü'
, it will still have 'ü' > 'v'
, because the letters still sort in Unicode code point order after case folding. Writing a German-friendly collation function is left as an exercise to the reader. Or better, to the author of an existing Unicode library.
Furthermore, if it is possible I'd like to know how to apply the necessary modifications on already existing tables!
You only need to modify the DB if you have an index that uses a collation you've overridden. Drop
that index and re-create
it.
Note that any column with a UNIQUE
(or PRIMARY KEY
) constraint will have an implicit index.