ansaurus

Question

Answer 1

A:

Do you really mean u'String'?

In any event, can't you just do str(string) to get a string rather than a unicode-string? (This should be different for Python 3, for which all strings are unicode.)

Andrew Jaffe 2009-03-01 11:01:20

I should have been clearer. I am using str() but still getting output like below when I print. [u'ABC'] [u'DEF'] [u'GHI'] [u'JKL'] The data is stripped as text from a webpage, then inserted into a database (Google Appstore), then retrieved and printed.

gnuchu 2009-03-01 11:09:13

Answer 2

+1 A:

Use dir or type on the 'string' to find out what it is. I suspect that it's one of BeautifulSoup's tag objects, which prints like a string but really isn't one. Otherwise, it's inside a list and you need to convert each string separately.
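A minimal sketch of that check, assuming the value really is a one-element list as the output in the earlier comment suggests. The names describe and value are made up for illustration and are not from the original post:

```python
def describe(obj):
    """Return the name of obj's type, e.g. 'list', 'unicode' or 'str'."""
    return type(obj).__name__


# Hypothetical stand-in for whatever came back from the scraper/database.
value = [u'ABC']

print(describe(value))     # type of the container
print(describe(value[0]))  # type of the first element

# If it is a list (or tuple), handle each element separately.
if isinstance(value, (list, tuple)):
    for item in value:
        print(item)
```

On Python 2 the element should report as unicode, and on Python 3 as str. A BeautifulSoup object would show up under its own class name instead.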

In any case, why are you objecting to using Unicode? Any specific reason?

sykora 2009-03-01 11:14:19

I've been looking at BeautifulSoup for the last few days. I couldn't figure out how gnuchu would get u['string'] rather than [u'String']. His comment to Andrew Jaffe seems to prove it is a list.

batbrat 2009-03-01 11:54:02

+1 on teaching him to fish instead of catching a fish and giving it to him.

batbrat 2009-03-01 11:54:54

Answer 3

+6 A:

[u'ABC'] would be a one-element list of unicode strings. Beautiful Soup always produces Unicode. So you need to convert the list to a single unicode string, and then convert that to ASCII.

I don't know exactly how you got the one-element lists; the contents member would be a list of strings and tags, which is apparently not what you have. Assuming that you really always get a list with a single element, and that your text is really only ASCII, you would use this:

 soup[0].encode("ascii")

However, please double-check that your data is really ASCII. This is pretty rare. Much more likely it's latin-1 or utf-8.

 soup[0].encode("latin-1")


 soup[0].encode("utf-8")

Or you can ask Beautiful Soup what the original encoding was and get the text back in that encoding:

 soup[0].encode(soup.originalEncoding)
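
To illustrate why the choice of codec matters, here is a small sketch using a made-up sample string (not data from the question) that contains one non-ASCII character:

```python
# Hypothetical sample text: "cafe" with an accented final letter (U+00E9).
text = u'caf\xe9'

utf8_bytes = text.encode('utf-8')      # the accented letter takes two bytes
latin1_bytes = text.encode('latin-1')  # the accented letter takes one byte

try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    # ASCII has no representation for the accented letter.
    ascii_failed = True

print(len(utf8_bytes), len(latin1_bytes), ascii_failed)
```

If the ASCII attempt fails like this on real data, that is a sign the text is not plain ASCII and one of the other encodings is needed.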

oefe 2009-03-01 11:22:11

Brilliant. Thanks. Apologies for the typo.

gnuchu 2009-03-01 12:15:38

You don't actually have to do the encoding: the OP is only seeing the string's repr because that's how elements are shown when you print a list. Printing soup[0] will be enough to show the str instead of the repr, that is, the contents of the string without the quotes and the unicode prefix.
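
A quick way to see this for yourself, sketched with a hypothetical list named items standing in for the real data:

```python
# Hypothetical one-element list like the ones in the question.
items = [u'ABC']

# Converting a list to text uses repr() on each element,
# so the quotes (and, on Python 2, the u prefix) appear.
as_list = str(items)
print(as_list)

# Printing the element itself shows only its contents.
element = items[0]
print(element)
```

On Python 2 the first print should show the u prefix inside the brackets. Python 3 drops the prefix, but the quotes still come from repr.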

ironfroggy 2009-03-01 13:36:57

Answer 4

+2 A:

You probably have a list containing one unicode string. The repr of this is [u'String'].

You can convert this to a list of byte strings using any variation of the following:

# Functional style.
print map(lambda x: x.encode('ascii'), my_list)

# List comprehension.
print [x.encode('ascii') for x in my_list]

# Interesting if my_list may be a tuple (or another container type) rather than a list.
print type(my_list)(x.encode('ascii') for x in my_list)

# What do I care about the brackets anyway?
print ', '.join(repr(x.encode('ascii')) for x in my_list)

# That's actually not a good way of doing it.
print ' '.join(repr(x).lstrip('u')[1:-1] for x in my_list)
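
If the goal is only to print the contents without brackets, quotes or prefix, joining the strings directly may be simpler than going through repr. A sketch with a made-up two-element list, which should behave the same on Python 2 and 3:

```python
# Hypothetical data; the real list would come from the scraper.
my_list = [u'ABC', u'DEF']

# Join the unicode strings themselves, so no repr() is involved.
joined = u', '.join(my_list)
print(joined)
```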

ddaa 2009-03-01 11:40:24


Python string prints as u['String']
