Is there a way to add an encoding alias in Python? Some sites on the web use the encoding 'windows-1251' but declare their charset as win-1251, so I would like win-1251 to be an alias for windows-1251.

+2  A: 
>>> import encodings
>>> encodings.aliases.aliases['win_1251'] = 'cp1251'
>>> print '\xcc\xce\xd1K\xc2\xc0'.decode('win-1251')
MOCKBA

That said, I would personally consider this monkey-patching and would use my own conversion table instead, though I can't give any good arguments for that position. :)
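
For comparison, a minimal sketch of the "own conversion table" idea mentioned above. It is written for Python 3 rather than the Python 2 session shown, and the names CHARSET_FIXES, resolve_charset and decode_page are hypothetical, made up for illustration. It leaves Python's codec registry untouched and only translates the nonstandard label before decoding.

```python
import codecs

# Hypothetical table: nonstandard charset labels seen on the web,
# mapped to names that Python's codec registry already knows.
CHARSET_FIXES = {
    "win-1251": "windows-1251",
}

def resolve_charset(label):
    """Return a codec name Python can look up for a declared charset label."""
    key = label.strip().lower()
    name = CHARSET_FIXES.get(key, key)
    codecs.lookup(name)  # raises LookupError if the name is still unknown
    return name

def decode_page(raw, label):
    """Decode raw bytes using the (possibly nonstandard) declared charset."""
    return raw.decode(resolve_charset(label))

# Same bytes as in the session above, with a sloppy label.
text = decode_page(b"\xcc\xce\xd1K\xc2\xc0", "Win-1251")
```

The trade-off is that every place that decodes has to go through the helper, which is why this is not always feasible.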

Lennart Regebro
Alex did provide a good argument for that position above. :-) I think the official way is too much work, and I would still simply provide my own conversion list, but that is not always feasible.
Lennart Regebro
+3  A: 

The encodings module is not well documented, so I'd use codecs instead, which is documented:

import codecs

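def encalias(oldname, newname):
  # Look up the existing codec and clone it under the new name.
  old = codecs.lookup(oldname)
  new = codecs.CodecInfo(old.encode, old.decode,
                         streamreader=old.streamreader,
                         streamwriter=old.streamwriter,
                         incrementalencoder=old.incrementalencoder,
                         incrementaldecoder=old.incrementaldecoder,
                         name=newname)
  # codecs.lookup() lowercases the requested name before calling
  # search functions, so compare against the lowercased alias.
  alias = newname.lower()
  def searcher(aname):
    if aname == alias:
      return new
    return None
  # From now on, lookups of the alias return the cloned codec.
  codecs.register(searcher)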

This is for Python 2.6; the interface is different in earlier versions.

If you don't mind relying on a specific version's undocumented internals, @Lennart's aliasing approach is OK too, of course, and indeed simpler than this. ;-) But I suspect (as he appears to) that this one is more maintainable.

Alex Martelli
Great point, Alex! Do not use a module that does not have great documentation.
Masi