How do I split a string into several parts, each containing a given number of words, in Python? For example, turn a 10,000-word string into ten 1,000-word strings. Thanks.
+2
A:
Under normal circumstances:
>>> a = "dedff fefef fefwff efef"
>>> a.split()
['dedff', 'fefef', 'fefwff', 'efef']
>>> k = a.split()
>>> [" ".join(k[0:2]), " ".join(k[2:4])]
['dedff fefef', 'fefwff efef']
>>>
pyfunc
2010-10-05 07:41:47
No, that splits by every word, not by a chosen number of words
usertest
2010-10-05 07:42:54
@user201140: I guess I still have not understood the problem. It would be useful if you could illustrate it.
pyfunc
2010-10-05 07:45:36
The main point was using split. The following algo is cake. I'm voting for pyfunc.
OMG_peanuts
2010-10-05 07:47:14
+2
A:
This might work:
def splitter(n, s):
    pieces = s.split()
    return (" ".join(pieces[i:i+n]) for i in xrange(0, len(pieces), n))

for piece in splitter(1000, really_long_string):
    print piece
This will yield ten 1000-word strings from a 10000-word string, as you ask. Note that you can also use the itertools grouper recipe, but that would involve making 1000 copies of the iterator for your string: expensive, I think.
Also note that this will replace every run of whitespace (spaces, tabs, newlines) with a single space. If this isn't acceptable, you'll need something else.
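If the original whitespace has to survive, one option is to locate the words with a regular expression and slice the original string instead of rejoining. This is only a sketch in Python 3 syntax; the name split_keep_whitespace and the sample string are made up for illustration, and "word" here means any run of non-whitespace characters, the same as str.split().

```python
import re

def split_keep_whitespace(text, n):
    """Yield slices of text that each hold n whitespace-delimited words."""
    # Start offset of every word (a run of non-whitespace characters).
    starts = [m.start() for m in re.finditer(r'\S+', text)]
    for i in range(0, len(starts), n):
        # A chunk runs from its first word up to the first word of the next chunk,
        # or to the end of the text for the last chunk.
        end = starts[i + n] if i + n < len(starts) else len(text)
        yield text[starts[i]:end]

sample = "one  two\nthree\tfour five"
parts = list(split_keep_whitespace(sample, 2))
print(parts)
```

Each chunk keeps the whitespace that followed its words, so joining the chunks with an empty string gives back the original text (minus any whitespace before the very first word).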
aaronasterling
2010-10-05 07:45:29
A:
Perhaps something like this:
>>> s = "aa bb cc dd ee ff gg hh ii jj kk ll mm nn oo pp qq rr ss tt uu vv"
>>> chunks = s.split()
>>> per_line = 5
>>> for i in range(0, len(chunks), per_line):
...     print " ".join(chunks[i:i + per_line])
...
aa bb cc dd ee
ff gg hh ii jj
kk ll mm nn oo
pp qq rr ss tt
uu vv
plundra
2010-10-05 07:48:16
A:
This might help:
s = "blah blah .................."
l = []
# Note: this slices every 1000 characters, not every 1000 words.
for i in xrange(0, len(s), 1000):
    l.append(s[i:i+1000])
jknair
2010-10-05 07:52:52
+1
A:
Try this:
s = 'a b c d e f g h i j k l'
n = 3

def group_words(s, n):
    words = s.split()
    for i in xrange(0, len(words), n):
        yield ' '.join(words[i:i+n])

list(group_words(s, n))
# returns ['a b c', 'd e f', 'g h i', 'j k l']
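The question is phrased in terms of the number of parts (ten strings), while the answers here take the number of words per part. If only the number of parts is known, the chunk size can be worked out first. A rough sketch in Python 3 syntax; split_into_parts is a made-up name, and it assumes the last part is allowed to be shorter than the others.

```python
def split_into_parts(text, parts):
    """Split text into at most `parts` strings of roughly equal word count."""
    words = text.split()
    # Ceiling division: rounding up keeps the number of chunks at or below `parts`.
    per_part = max(1, -(-len(words) // parts))
    return [" ".join(words[i:i + per_part])
            for i in range(0, len(words), per_part)]

text = " ".join("w%d" % i for i in range(10000))
result = split_into_parts(text, 10)
print(len(result), len(result[0].split()))
```

When the word count does not divide evenly, the leftover words end up in a shorter final part rather than an extra one.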
eumiro
2010-10-05 07:58:05
Note that this is equivalent to Aaron's generator expression. I find this a bit more readable.
Glenn Maynard
2010-10-05 08:30:53
A:
# Note: this slices whole_str every 1000 characters, not every 1000 words.
start = 0
values = []
for x in range((len(whole_str) // 1000) + 1):
    values.append(whole_str[start:start+1000])
    start += 1000
FallenAngel
2010-10-05 07:58:23
It was just a typo: I wrote apppend instead of append. I guess that is not worth a downvote. You could also point out my typing mistake more politely.
FallenAngel
2010-10-05 08:32:00
A:
If you're comfortable using regular expressions, you could also try this:
import re

def split_by_number_of_words(input, number_of_words):
    regexp = re.compile(r'((?:\w+\W+){0,%d}\w+)' % (number_of_words - 1))
    return regexp.findall(input)

s = ' '.join(str(n) for n in range(1, 101))  # "1 2 3 ... 100"
for words in split_by_number_of_words(s, 10):
    print words
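One thing to keep in mind with this pattern: \w+ treats apostrophes and hyphens as word breaks, so it can count words differently from str.split(). The sketch below (Python 3 syntax) compares the \w+/\W+ pattern with a \S+/\s+ variant that counts whitespace-delimited words; chunks_by_pattern and the sample text are invented for the comparison, not part of the answer above.

```python
import re

def chunks_by_pattern(text, n, word, gap):
    """Return chunks of up to n words, where `word` and `gap` are regex fragments."""
    # Up to n-1 "word + gap" pairs followed by one final word.
    pattern = re.compile(r'(?:%s%s){0,%d}%s' % (word, gap, n - 1, word))
    return pattern.findall(text)

text = "don't stop me-now please"
by_word_chars = chunks_by_pattern(text, 2, r'\w+', r'\W+')
by_whitespace = chunks_by_pattern(text, 2, r'\S+', r'\s+')
print(by_word_chars)
print(by_whitespace)
```

With \w+ the apostrophe in "don't" and the hyphen in "me-now" split those tokens, so the chunks drift away from what a split()-based answer would give; the \S+ variant keeps them whole.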
Steef
2010-10-05 08:19:33