I'm running the following code on a list of strings to return a list of its words:

words = [re.split('\\s+', line) for line in lines]

However, I end up getting something like:

[['import', 're', ''], ['', ''], ['def', 'word_count(filename):', ''], ...]

As opposed to the desired:

['import', 're', '', '', '', 'def', 'word_count(filename):', '', ...]

How can I unpack the lists re.split('\\s+', line) produces in the above list comprehension? Naïvely, I tried using * but that doesn't work.

(I'm looking for a simple and Pythonic way of doing this; I was tempted to write a function, but I'm sure the language already accommodates this.)

A: 

You can always do this:

words = []
for line in lines:
    words.extend(re.split('\\s+', line))

It's not nearly as elegant as a one-liner list comprehension, but it gets the job done.
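For comparison, the same flattening can also be written as a single comprehension with two `for` clauses. This is only a minimal sketch: the sample `lines` list below is made up for illustration and stands in for whatever the question's `lines` actually contains.

```python
import re

# Hypothetical sample input; in the question, `lines` comes from elsewhere.
lines = ["import re\n", "\n", "def word_count(filename):\n"]

# The clauses read in the same order as nested for statements:
# outer loop over lines first, inner loop over each line's words second.
words = [word for line in lines for word in re.split(r'\s+', line)]
print(words)
```

The empty strings are kept here, as in the desired output shown in the question.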

perimosocordiae
+1  A: 

The reason you get a list of lists is that re.split() returns a list, which is then 'appended' to the list comprehension's output.

It's unclear why you are splitting line by line (perhaps it's just a simplified example), but if you can get the full content (all lines) as a single string, you can just do:

words = re.split(r'\s+', lines)

If lines is the product of:

open('filename').readlines()

use

open('filename').read()

instead.
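As a rough illustration of splitting the whole content at once, the sketch below uses a hard-coded string, `text`, as a hypothetical stand-in for what `read()` would return. Note that consecutive newlines collapse into one separator, so the result has fewer empty strings than the per-line approach.

```python
import re

# Hypothetical stand-in for the string returned by open('filename').read().
text = "import re\n\ndef word_count(filename):\n"

# One split over the whole text already yields a flat list.
words = re.split(r'\s+', text)

# The trailing newline still leaves one empty string at the end.
# str.split() with no arguments splits on whitespace runs and drops empty strings.
clean_words = text.split()

print(words)
print(clean_words)
```

If the empty strings are not wanted at all, plain `str.split()` may be enough and avoids the regular expression entirely.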

Unode
Using Python 3 man! No more readlines(), and everything's Unicode.
Beau Martínez
@Beau, readlines() works just fine in Python 3.
Iceman
Also, `re.split` doesn't take a list argument (I already tried that).
Beau Martínez
@Kevin True; however, I'm using `list(file)`.
Beau Martínez
@Beau, looking at your example I couldn't think of anything else other than something coming from a file or file-like type. Hence reading it as a string (as stated above) would be feasible.
Unode
@Beau lines = file.read() would give you the string
Unode
+2  A: 
>>> import re
>>> from itertools import chain
>>> lines = ["hello world", "second line", "third line"]
>>> words = chain(*[re.split(r'\s+', line) for line in lines])

This will give you an iterator that can be used for looping through all words:

>>> for word in words:
...    print(word)
... 
hello
world
second
line
third
line

Creating a list instead of an iterator is just a matter of wrapping the iterator in a list call:

>>> words = list(chain(*[re.split(r'\s+', line) for line in lines]))
Pär Wieslander
Pretty awesome way of doing it, though I'm disappointed Python doesn't allow a less "cluttered" way of doing it. Cheers.
Beau Martínez
Alternatively, you can use `chain.from_iterable` without unpacking the list.
Beau Martínez
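A minimal sketch of the `chain.from_iterable` variant mentioned in the last comment, reusing the sample `lines` from the answer above. It assumes the same input shape, a list of strings.

```python
import re
from itertools import chain

lines = ["hello world", "second line", "third line"]

# from_iterable takes a single iterable of iterables, so a generator
# expression works and no intermediate list of lists is built.
words = list(chain.from_iterable(re.split(r'\s+', line) for line in lines))
print(words)
```

Dropping the `list(...)` call leaves a lazy iterator, as in the original answer.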