How do I split a string into several parts, each containing a given number of words, in Python? For example, turn a 10,000-word string into ten 1,000-word strings. Thanks.

+2  A: 

Under normal circumstances:

>>> a = "dedff fefef fefwff efef"
>>> a.split()
['dedff', 'fefef', 'fefwff', 'efef']
>>> k = a.split()
>>> [" ".join(k[0:2]), " ".join(k[2:4])]
['dedff fefef', 'fefwff efef']
>>> 
pyfunc
No, that splits by every word, not by a chosen number of words
usertest
@user201140: I guess I still have not understood the problem. It would be useful if you could illustrate it.
pyfunc
The main point was using split. The rest of the algorithm is a piece of cake. I'm voting for pyfunc.
OMG_peanuts
+2  A: 

This might work:

def splitter(n, s):
    pieces = s.split()
    return (" ".join(pieces[i:i+n]) for i in xrange(0, len(pieces), n))

for piece in splitter(1000, really_long_string):
    print piece

This will yield ten 1000-word strings from a 10000-word string, as you asked. Note that you could also use the itertools grouper recipe, but that would involve passing 1000 references to the iterator over your words, which I think is expensive.
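
For comparison, here is a rough sketch of the grouper-style approach mentioned above, written for Python 3 (so `zip_longest` and the `print` function rather than the Python 2 spellings used elsewhere in this thread). The name `grouper_split` is made up for illustration. The recipe repeats a single iterator `n` times, so each output tuple pulls `n` consecutive words from it.

```python
from itertools import zip_longest

def grouper_split(text, n):
    """Yield strings of up to n words each, using the itertools grouper idea."""
    words = text.split()
    # [iter(words)] * n is a list holding n references to ONE iterator,
    # so zip_longest draws n consecutive words for every tuple it builds.
    for group in zip_longest(*[iter(words)] * n):
        # The last tuple is padded with None; drop the padding before joining.
        yield " ".join(w for w in group if w is not None)

print(list(grouper_split("a b c d e", 2)))
```

Whether this is faster or slower than slicing the list is not something I have measured; the slicing version is probably easier to read.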

Also note that this will replace all whitespace with spaces. If this isn't acceptable, you'll need something else.
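
If the original whitespace matters, one possible "something else" is to record where each word starts and ends and slice the original string, instead of splitting and re-joining. This is only a sketch in Python 3; `split_preserving_whitespace` is a hypothetical name, and it assumes that "word" means a run of non-whitespace characters, the same as `str.split()`.

```python
import re

def split_preserving_whitespace(text, n):
    """Yield chunks of n whitespace-delimited words, sliced from the original text."""
    # Record the (start, end) span of every run of non-whitespace characters.
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    for i in range(0, len(spans), n):
        group = spans[i:i + n]
        # Slice from the start of the first word to the end of the last one,
        # so tabs and newlines between those words are kept as they were.
        yield text[group[0][0]:group[-1][1]]

print(list(split_preserving_whitespace("a\tb  c\nd e", 2)))
```

Whitespace inside a chunk is kept exactly; whitespace between two chunks is dropped.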

aaronasterling
@Aaron: missing right parens?
Manoj Govindan
Good eye, as always.
aaronasterling
A: 

Perhaps something like this:

>>> s = "aa bb cc dd ee ff gg hh ii jj kk ll mm nn oo pp qq rr ss tt uu vv"
>>> chunks = s.split()
>>> per_line = 5
>>> for i in range(0, len(chunks), per_line):
...     print " ".join(chunks[i:i + per_line])
... 
aa bb cc dd ee
ff gg hh ii jj
kk ll mm nn oo
pp qq rr ss tt
uu vv
plundra
A: 

This might help:

s = "blah blah .................."
l = []
# Note: this slices every 1000 characters, not every 1000 words.
for i in xrange(0, len(s), 1000):
    l.append(s[i:i+1000])
jknair
+1  A: 

Try this:

s = 'a b c d e f g h i j k l'
n = 3

def group_words(s, n):
    words = s.split()
    for i in xrange(0, len(words), n):
        yield ' '.join(words[i:i+n])

list(group_words(s,n))
['a b c', 'd e f', 'g h i', 'j k l']
eumiro
Note that this is equivalent to Aaron's generator expression. I find this a bit more readable.
Glenn Maynard
A: 
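values = []
# Step through the string 1000 characters at a time (characters, not words).
# Stepping with range avoids a trailing empty chunk when the length
# is an exact multiple of 1000.
for start in range(0, len(whole_str), 1000):
    values.append(whole_str[start:start+1000])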
FallenAngel
Please test your code.
Glenn Maynard
Just a typing error, writing apppend instead of append... I guess that is not worth a downvote. You could point out my typing mistake more politely, too.
FallenAngel
A: 

If you're comfortable using regular expressions, you could also try this:

import re

def split_by_number_of_words(input, number_of_words):
    regexp = re.compile(r'((?:\w+\W+){0,%d}\w+)' % (number_of_words - 1))
    return regexp.findall(input)

s = ' '.join(str(n) for n in range(1, 101)) # "1 2 3 ... 100"
for words in split_by_number_of_words(s, 10):
    print words
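
One thing to watch for: the regular expression counts `\w+` runs as words, so it does not always agree with `str.split()`. The comparison below is a small Python 3 sketch; `regex_chunks` and `split_chunks` are names made up for illustration, the first reusing the same pattern as above and the second using the split-and-join approach from the other answers.

```python
import re

def regex_chunks(text, n):
    """Chunk by runs of word characters, using the same pattern as above."""
    pattern = re.compile(r'((?:\w+\W+){0,%d}\w+)' % (n - 1))
    return pattern.findall(text)

def split_chunks(text, n):
    """Chunk by whitespace-delimited words, re-joined with single spaces."""
    words = text.split()
    return [' '.join(words[i:i + n]) for i in range(0, len(words), n)]

sample = "don't stop now"
# The apostrophe is not a word character, so the regex sees "don" and "t"
# as two separate words, while split() treats "don't" as one.
print(regex_chunks(sample, 2))
print(split_chunks(sample, 2))
```

On the other hand, the regex version keeps the original separators inside each chunk, whereas split-and-join replaces them with single spaces.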
Steef