This is the first time I've come across this. I just printed a list, and each element seems to have a u in front of it, i.e.

[u'hello', u'hi', u'hey']

What does it mean and why would a list have this in front of each element?

As I don't know how common this is, if you'd like to see how I came across it, I'll happily edit the post.

+5  A: 

The u just means that the following string is a unicode string (as opposed to a plain byte string, which is what Python 2's str is). It has nothing to do with the list that happens to contain the (unicode) strings.
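To make that concrete, here is a minimal sketch using the list from the question. It assumes nothing beyond the built-ins; the variable names are only illustrative. Printing the whole list shows the repr() of each element (which is where Python 2 displays the u), while printing an element on its own shows just the text.

```python
# The u prefix belongs to each string literal, not to the list.
words = [u'hello', u'hi', u'hey']

# Printing the list shows the repr() of every element. On Python 2 that
# repr includes the u prefix; on Python 3 it does not.
print(words)

# Printing an element on its own shows only its text, with no prefix.
for word in words:
    print(word)

# The prefix is not part of the string's contents.
first_length = len(words[0])
print(first_length)
```

So the u is only part of how the interpreter displays the object, not something stored in the string.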

Rasmus Kaj
+3  A: 

I believe the u' prefix creates a unicode string instead of a regular ascii one.

toasteroven
+15  A: 

It's an indication of a unicode string, similar to r'' for a raw string.

>>> type(u'abc')
<type 'unicode'>
>>> r'ab\c'
'ab\\c'
SilentGhost
Ah, I thought r'' meant something to do with a regular expression?
day_trader
It's generally used for regular expressions so we can write things like `r'/[ \t]+/'` instead of `'/[ \\t]+/'` (note the double backslash - you don't have to escape things in raw strings unless you're escaping the closing quote).
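A rough sketch of the same point, assuming the standard re module and a made-up sample string. The surrounding slashes from the pattern above are left out here, since Python's re does not use them as delimiters.

```python
import re

# In a raw literal the backslash is kept as a literal character,
# so these two literals describe the same pattern text.
raw_pattern = r'[ \t]+'
escaped_pattern = '[ \\t]+'
same_pattern = (raw_pattern == escaped_pattern)

# Hypothetical sample input: words separated by spaces and tabs.
parts = re.split(raw_pattern, 'hello \t hi\they')

print(same_pattern)
print(parts)
```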
Samir Talwar
it's often used in regex to avoid all the escaping backslashes
SilentGhost
I see. If I iterate through a unicode list and check if some string is 'in' the list, will that recognise the string? I'm currently checking each element to see if it matches a certain string and it keeps escaping every time. Is this because it's Unicode?
day_trader
r and u are a bit different. u indicates the type of the string, whereas r (or ur, if you want to use raw unicode literals) makes a normal str (or unicode, if u and r are both used) but one that is parsed differently at compile time. `>>> repr(r'foo')` gives `"'foo'"`, while `>>> repr(u'foo')` gives `"u'foo'"`. Notice how the r goes away (it only changes how backslashes are treated) and the u does not (because it makes an object of a different type).
Mike Graham
If your string is a unicode string that uses only ascii characters (as in your example), the `in` operation converts the strings implicitly and you'll get `True`: `'abc' in [u'abc']` results in `True`. If your unicode string uses characters outside of the ascii charset, the same test against a byte string naturally gives `False`.
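A small sketch of both cases, with illustrative names and an example word that is not from the question. The non-ascii case is written with explicit UTF-8 bytes, so it behaves the same way on Python 3; on Python 2 that comparison also emits a UnicodeWarning.

```python
greetings = [u'hello', u'hi', u'hey']

# Plain ascii text: the membership test succeeds.
ascii_found = 'hi' in greetings

# A non-ascii example: u'caf\xe9' is "cafe" with an accented e.
accented = [u'caf\xe9']
utf8_bytes = u'caf\xe9'.encode('utf-8')

# Undecoded bytes never compare equal to the unicode string.
bytes_found = utf8_bytes in accented

# Decoding the bytes first makes the comparison work.
decoded_found = utf8_bytes.decode('utf-8') in accented

print(ascii_found, bytes_found, decoded_found)
```

So when a lookup like this fails, a likely fix is to decode the incoming bytes to unicode before comparing.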
SilentGhost
+9  A: 

Unicode.

mculp