views:

76

answers:

4

In python, I need to generate a dict that maps a letter to a pre-defined "one-hot" representation of that letter. By way of illustration, the dict should look like this:

{ 'A': '1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0',
  'B': '0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0', # ...
}

There is one bit (represented as a character) per letter of the alphabet. Hence each string will contain 25 zeros and one 1. The position of the 1 is determined by the position of the corresponding letter in the alphabet.

I came up with some code that generates this:

# Character set is explicitly specified for fine grained control
_letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
n = len(_letters)
one_hot = [' '.join(['0']*a + ['1'] + ['0']*b)
            for a, b in zip(range(n), range(n-1, -1, -1))]
outputs = dict(zip(_letters, one_hot))

Can anyone think of a more efficient/cleaner/more pythonic way to do the same thing?

A: 

That seems pretty clear, concise, and Pythonic to me.

bcat
+1  A: 
one_hot = [' '.join(['0']*a + ['1'] + ['0']*b)
            for a, b in zip(range(n), range(n-1, -1, -1))]
outputs = dict(zip(_letters, one_hot))

In particular, there's a lot of code packed into these two lines. You might try the Introduce Explaining Variable refactoring. Or maybe an extract method.

Here's one example:

def single_onehot(a, b):
    return ' '.join(['0']*a + ['1'] + ['0']*b)

range_zip = zip(range(n), range(n-1, -1, -1))
one_hot = [ single_onehot(a, b) for a, b in range_zip]
outputs = dict(zip(_letters, one_hot))

Although you might disagree with my naming.

Jason Baker
+4  A: 

i find this to be more readable:

from string import ascii_uppercase

one_hot = {}
for i, l in enumerate(ascii_uppercase):
    bits = ['0']*26; bits[i] = '1'
    one_hot[l] = ' '.join(bits)

if you need a more general alphabet, just enumerate over a string of the characters, and replace ['0']*26 with ['0']*len(alphabet).

Autoplectic
Very readable. +1
Ionuț G. Stan
Nice. I think I tried an ugly attempt at functional programming. Your version is much more readable though.
Whisty
The use of `ascii_uppercase` is a nice touch (I didn't even know it existed.), but I'm not sure this is more readable than the original code. The list comprehension is terse, but it's also very expressive. Using an explicit loop makes the code's purpose less clear in my eyes.
bcat
+2  A: 

In Python 2.5 and up you can use the conditional operator:

from string import ascii_uppercase

one_hot = {}
for i, c in enumerate(ascii_uppercase):
    one_hot[c] = ' '.join('1' if j == i else '0' for j in range(26))
Laurence Gonsalves