views:

29

answers:

2

I've noticed that I cannot use all Unicode characters in my Python source code.

While

def 价(何):

is perfectly all right (albeit nonsensical [probably?]),

def N(N₀, t, λ) -> 'N(t)':

this isn't allowed (the subscript zero that is).

I also can't use some other characters, most of which I recognise as something other than letters (mathematical operators, for example). I always thought that if I just stuck to the rules I know, i.e. composing names from letters and numbers, with a letter as the first character, all would be okay. Now, the subscript zero is clearly a 'number', so my impression was wrong.

I know I should avoid using special characters. However, the function definition above (the exponential decay one that is) seems to me perfectly reasonable - because it will never change, and it so elegantly conveys all the information needed for another programmer to use it.

My question, therefore: exactly which characters are allowed and which aren't? And where?

Edit
All right, I seem not to have been clear enough. I am using Python 3, so there is no need to declare the encoding of the source file. I thought that was apparent from the fact that my Chinese function definition works.

My question concerns why some characters are allowed there, while others aren't. The subscript zero raises an error, "invalid character in identifier", but the blackboard bold zero works. Both are equally special, I'd say.

I'd like to know whether there are general rules that apply beyond my particular situation; there must be. It seems that my error is not an accident.

Edit 2:

The answer, courtesy of Beau Martínez, who pointed me to the language reference, where I should have looked in the first place:

http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html

It appears that the characters that are allowed were all deliberately chosen.

A: 

Tell Python what the proper encoding is:

http://www.python.org/dev/peps/pep-0263/

Either...

# -*- coding: utf-8 -*-

or

# coding=utf-8
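
For what it's worth, Python 3 already assumes UTF-8 when no declaration is present, so the comment only matters for other encodings. A minimal sketch using the standard library's `tokenize.detect_encoding` shows which encoding would be picked up from a source file; the sample sources and the helper name `declared_encoding` are made up for illustration.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding detected for this source (hypothetical helper)."""
    # detect_encoding reads at most two lines, looking for a BOM or a PEP 263 comment.
    encoding, _consumed_lines = tokenize.detect_encoding(
        io.BytesIO(source_bytes).readline
    )
    return encoding


# No declaration at all: the default applies.
no_cookie = declared_encoding(b"x = 1\n")

# PEP 263 style declaration on the first line.
with_cookie = declared_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n")

print(no_cookie)
print(with_cookie)
```

The first call should report the UTF-8 default; the second reports whatever name the standard library uses for Latin-1.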

As for which characters are actually allowed in variable names, the restriction is typically letters, digits, and underscores.

The "subscript zero" is not actually a digit. It's, well, a subscript.
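
To make that distinction concrete, here is a small sketch that asks the standard `unicodedata` module for the Unicode general category of three zeros mentioned in the question. The dictionary labels are my own; the category codes come from the Unicode database shipped with Python.

```python
import unicodedata

# Three "zeros": plain ASCII, subscript (U+2080), double-struck (U+1D7D8).
zeros = {
    "ascii zero": "0",
    "subscript zero": "\u2080",
    "double-struck zero": "\U0001D7D8",
}

# Look up the general category of each character.
categories = {label: unicodedata.category(ch) for label, ch in zeros.items()}

for label, category in categories.items():
    print(f"{label}: {category}")
# ascii zero: Nd
# subscript zero: No
# double-struck zero: Nd
```

`Nd` means "decimal digit", while `No` means "other number", so the subscript zero is a number-like symbol but not a digit in the Unicode sense.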

Amber
Sorry, I should have mentioned that I'm using Python 3.
stefano palazzo
+2  A: 

As per the language reference, Python 3 allows a large variety of characters as identifiers.

That subscript zero looks like a number, but Python does not treat it as a digit. In identifiers, only characters in the Unicode category Nd (decimal digits, such as 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9) count as digits. The subscript zero is in the category No (other number), which is not among the categories permitted in identifiers, so it is rejected, unlike a letter such as the Greek Phi.
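
A quick way to test a candidate name without writing a whole function is `str.isidentifier()`. The sketch below is only an illustration, with sample names taken from or modelled on the question. It also shows that Python normalizes identifiers with NFKC while parsing, so a double-struck zero ends up as a plain `0`.

```python
import unicodedata

# Candidate names; the escapes spell out the characters from the question.
samples = {
    "plain": "N0",
    "subscript": "N\u2080",          # N followed by subscript zero
    "double_struck": "N\U0001D7D8",  # N followed by double-struck zero
    "greek": "\u03bb",               # lambda
    "chinese": "\u4ef7",             # the function name from the question
}

results = {label: name.isidentifier() for label, name in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")

# Identifiers are NFKC-normalized, so the double-struck name is stored as "N0".
namespace = {}
exec("N\U0001D7D8 = 42", namespace)
normalized = unicodedata.normalize("NFKC", "N\U0001D7D8")
print(normalized, namespace[normalized])
```

Only the subscript variant should come back as `False`; the assignment through the double-struck name is then readable under the plain name `N0`.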

Importantly, how easily can you type those characters with your keyboard? I don't want to pull up the character map every time I have to call your functions, for example. Calling it "maximum_decay_rate" or something much more intuitive to any user, not just a Physics major, makes your code more readable.

If you say it isn't allowed, it's probably because you haven't specified the character encoding for your source file. It can be specified by putting # -*- coding: utf-8 -*- (or whichever encoding you use) at the beginning of your source file.

Beau Martínez
As you can see, I was only using 'special' characters where they would never have to be typed by anybody, unless the universe changes its mind about exponential decay. :-) But yes, it should probably be called decay(). I've updated the question now.
stefano palazzo