ansaurus

Question

Is there a way to split a string by every nth separator in Python?

Answer 1

A:

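l = 'this-is-a-string'.split('-')   # split on the hyphen, not on whitespace
nl = []
ss = ""
c = 0
for s in l:
    c += 1
    if c % 2 == 1:
        ss = s                          # first word of a pair: remember it
    else:
        nl.append("%s-%s" % (ss, s))    # second word: store the joined pair
        ss = ""
if ss:
    nl.append(ss)                       # leftover word when the count is odd

print(nl)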

SpliFF 2009-10-25 20:05:52

What's n? I get a name error as it's not defined.

Gnuffo1 2009-10-25 20:10:34

sorry, i misread your question first time and rewrote it, n was a leftover from previous. Now it gives a list of strings.

SpliFF 2009-10-25 20:15:08

This is very complicated (long to read/decipher), compared to many of the other solutions proposed here…

EOL 2009-10-25 21:44:06

rubbish. it's actually much easier to decipher. length is largely irrelevant and it could be shortened by making it less readable. It should have good performance since the loop only has a simple test condition to deal with. Also it has the most flexibility for handling other processing inside the loop. Also the winning answer will crash on a string with an odd number of hyphens. Iter and list ops might be pythonic but that doesn't necessarily make them 'better'.

SpliFF 2009-10-26 11:36:38

Answer 2

A:

EDIT: The original code I posted didn't work. This version does:

I don't think you can split on every other one, but you could split on every - and join every pair.

chunks = []
content = "this-is-a-string"
split_string = content.split('-')

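# Step over every second index, including the last one,
# so that a trailing unpaired word is not dropped.
for i in range(0, len(split_string), 2):
    if i < len(split_string) - 1:
        chunks.append("-".join([split_string[i], split_string[i+1]]))
    else:
        chunks.append(split_string[i])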

EmFi 2009-10-25 20:13:20

This doesn't work, I get `["-", "-", "-", "-"]`

Jed Smith 2009-10-25 20:19:56

This does not work. The output consists of a list of 1 character strings containing a hyphen.

recursive 2009-10-25 20:20:37

@Jed His idea is good, you could write the implementation on your own.

ReDAeR 2009-10-25 20:22:45

Yeah. Splice didn't work the way I thought it did, I've fixed the implementation.

EmFi 2009-10-25 20:26:46

Downvote removed. You might as well just do split_string[i:i+2] rather than creating a list literal, since you know the size already.

recursive 2009-10-25 22:30:24

Answer 3

+14 A:

Here’s another solution:

span = 2
words = "this-is-a-string".split("-")
print ["-".join(words[i:i+span]) for i in range(0, len(words), span)]

Gumbo 2009-10-25 20:13:39

Thanks, Nick D.

Gumbo 2009-10-25 20:25:59

Why the down vote? What’s wrong with this answer?

Gumbo 2009-10-25 21:10:41

This seems the simplest one that works for a variable length between separations.

Gnuffo1 2009-10-25 23:42:27
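
The slice-and-join idea from Answer 3 could be wrapped in a small helper. This is only a sketch, written for Python 3 (the thread itself uses Python 2 print statements); the name `join_every` and its parameters are made up for illustration and do not appear in the original posts.

```python
def join_every(text, sep="-", span=2):
    """Split text on sep, then rejoin the pieces in groups of span."""
    if span < 1:
        raise ValueError("span must be at least 1")
    words = text.split(sep)
    # Slicing past the end of a list is safe, so a short final group is kept.
    return [sep.join(words[i:i + span]) for i in range(0, len(words), span)]


print(join_every("this-is-a-string"))          # ['this-is', 'a-string']
print(join_every("this-is-a-longer-string"))   # ['this-is', 'a-longer', 'string']
```

Because the last slice is simply shorter, an odd number of words needs no special case.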

Answer 4

+8 A:

Regular expressions handle this easily:

import re
s = "aaaa-aa-bbbb-bb-c-ccccc-d-ddddd"
print re.findall("[^-]+-[^-]+", s)

Output:

['aaaa-aa', 'bbbb-bb', 'c-ccccc', 'd-ddddd']

Update for Nick D:

n = 3
print re.findall("-".join(["[^-]+"] * n), s)

Output:

['aaaa-aa-bbbb', 'bb-c-ccccc']

recursive 2009-10-25 20:14:56

Probably the most elegant solution which is still readable, the rest are stretching it.

Jed Smith 2009-10-25 20:23:00

good answer but it's only for every 2nd separator.

Nick D 2009-10-25 20:33:24

… and only for an even number of words.

Gumbo 2009-10-25 20:37:02

Nick: Not so. See my update. Gumbo: Also not so. A simple change to the regex will handle that case as well, if desired.

recursive 2009-10-25 20:40:02

@recursive, ok but I don't see the `d-ddddd` in the output ;-)

Nick D 2009-10-25 20:44:28

sorry, have to -1, too complicated, uses regex,

hasen j 2009-10-25 20:56:01
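
One possible reading of the "simple change to the regex" mentioned in the comments is to make the trailing fields optional, so that a short final group such as `d-ddddd` is still returned. The sketch below is a guess at that change rather than recursive's own code; it is written for Python 3, and the helper name `findall_groups` is hypothetical.

```python
import re


def findall_groups(text, n):
    """Return the hyphen-separated fields of text in groups of n, keeping a short tail."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # One field, followed greedily by up to n-1 further "-field" repetitions.
    pattern = r"[^-]+(?:-[^-]+){0,%d}" % (n - 1)
    return re.findall(pattern, text)


s = "aaaa-aa-bbbb-bb-c-ccccc-d-ddddd"
print(findall_groups(s, 3))  # ['aaaa-aa-bbbb', 'bb-c-ccccc', 'd-ddddd']
```

Note that this pattern skips empty fields, so a string with consecutive hyphens is grouped differently than with `str.split`.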

Answer 5

+15 A:

>>> s="a-b-c-d-e-f-g-h-i-j-k-l"         # use zip(*[i]*n)
>>> i=iter(s.split('-'))                # for the nth case    
>>> map("-".join,zip(i,i))    
['a-b', 'c-d', 'e-f', 'g-h', 'i-j', 'k-l']

>>> i=iter(s.split('-'))
>>> map("-".join,zip(*[i]*3))
['a-b-c', 'd-e-f', 'g-h-i', 'j-k-l']
>>> i=iter(s.split('-'))
>>> map("-".join,zip(*[i]*4))
['a-b-c-d', 'e-f-g-h', 'i-j-k-l']

Sometimes itertools.izip is faster, as you can see in the timings below.

>>> from itertools import izip
>>> s="a-b-c-d-e-f-g-h-i-j-k-l"
>>> i=iter(s.split("-"))
>>> ["-".join(x) for x in izip(i,i)]
['a-b', 'c-d', 'e-f', 'g-h', 'i-j', 'k-l']

Here is a version that sort of works with an odd number of parts, depending on what output you desire in that case. You might prefer to trim the '-' off the end of the last element with .rstrip('-'), for example.

>>> from itertools import izip_longest
>>> s="a-b-c-d-e-f-g-h-i-j-k-l-m"
>>> i=iter(s.split('-'))
>>> map("-".join,izip_longest(i,i,fillvalue=""))
['a-b', 'c-d', 'e-f', 'g-h', 'i-j', 'k-l', 'm-']
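
As an alternative to trimming the trailing '-', the fill values can be filtered out before joining. The following is a rough sketch for Python 3, where `izip_longest` is named `zip_longest`; the function `group_fields` and the sentinel are assumptions made for this example, not part of the post above.

```python
from itertools import zip_longest


def group_fields(text, n, sep="-"):
    """Group the sep-separated fields of text n at a time, keeping a short tail."""
    if n < 1:
        raise ValueError("n must be at least 1")
    fields = iter(text.split(sep))
    missing = object()  # sentinel that cannot collide with a real field
    # The same iterator repeated n times makes zip_longest pull n items per tuple.
    groups = zip_longest(*[fields] * n, fillvalue=missing)
    return [sep.join(f for f in group if f is not missing) for group in groups]


print(group_fields("a-b-c-d-e-f-g-h-i-j-k-l-m", 2))
# ['a-b', 'c-d', 'e-f', 'g-h', 'i-j', 'k-l', 'm']
```

Using a unique sentinel instead of "" means genuinely empty fields in the input are not confused with padding.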

Here are some timings

$ python -m timeit -s 'import re;r=re.compile("[^-]+-[^-]+");s="a-b-c-d-e-f-g-h-i-j-k-l"' 'r.findall(s)'
100000 loops, best of 3: 4.31 usec per loop

$ python -m timeit -s 'from itertools import izip;s="a-b-c-d-e-f-g-h-i-j-k-l"' 'i=iter(s.split("-"));["-".join(x) for x in izip(i,i)]'
100000 loops, best of 3: 5.41 usec per loop

$ python -m timeit -s 's="a-b-c-d-e-f-g-h-i-j-k-l"' 'i=iter(s.split("-"));["-".join(x) for x in zip(i,i)]'
100000 loops, best of 3: 7.3 usec per loop

$ python -m timeit -s 's="a-b-c-d-e-f-g-h-i-j-k-l"' 't=s.split("-");["-".join(t[i:i+2]) for i in range(0, len(t), 2)]'
100000 loops, best of 3: 7.49 usec per loop

$ python -m timeit -s 's="a-b-c-d-e-f-g-h-i-j-k-l"' '["-".join([x,y]) for x,y in zip(s.split("-")[::2], s.split("-")[1::2])]'
100000 loops, best of 3: 9.51 usec per loop
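
The numbers above depend on the machine and interpreter, so it may be worth re-running the comparison locally. Below is a minimal sketch that does this from a script with the standard `timeit` module on Python 3 (where the built-in `zip` is already lazy, so there is no separate `izip` case). The function names and the `measure` helper are invented for illustration.

```python
import re
import timeit

S = "a-b-c-d-e-f-g-h-i-j-k-l"
PAIR = re.compile("[^-]+-[^-]+")


def with_regex():
    return PAIR.findall(S)


def with_zip():
    i = iter(S.split("-"))
    return ["-".join(x) for x in zip(i, i)]


def with_slices():
    t = S.split("-")
    return ["-".join(t[i:i + 2]) for i in range(0, len(t), 2)]


CANDIDATES = {"regex": with_regex, "zip": with_zip, "slices": with_slices}


def measure(number=10000, repeat=3):
    """Return the best per-call time in microseconds for each candidate."""
    results = {}
    for name, func in CANDIDATES.items():
        best = min(timeit.repeat(func, number=number, repeat=repeat))
        results[name] = best / number * 1e6
    return results


for name, usec in sorted(measure().items(), key=lambda kv: kv[1]):
    print("%-7s %.2f usec per call" % (name, usec))
```

All three candidates return the same list for this input, so the comparison is like for like; the ranking may well differ from the 2009 figures.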

gnibbler 2009-10-25 20:27:23

+1 Nice, clean solution...

ChristopheD 2009-10-25 20:37:24

Wow, that's great!

unutbu 2009-10-25 20:39:04

pythonic elegance

elzapp 2009-10-25 20:41:17

You’re using the wrong code for my proposal. I’m operating on the words and not the string. `python -m timeit -s 's="a-b-c-d-e-f-g-h-i-j-k-l".split("-")' '["-".join(s[i:i+2]) for i in range(0, len(s), 2)]'`

Gumbo 2009-10-25 21:01:19

Nicely done, but fails for an odd number of elements. It shouldn't be too hard to overcome though.

RedGlyph 2009-10-25 21:10:58

@Gumbo, sorry, I fixed it to match your comment, I've just moved the `split()` out of the setup clause and used `t` as a temporary variable

gnibbler 2009-10-25 21:39:21

Answer 6

A:

I think several of the already given solutions are good enough, but just for fun, I did this version:

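def twosplit(s, sep):
  # Find the first two occurrences of sep; cut at the second one and recurse.
  first = s.find(sep)
  if first >= 0:
    second = s.find(sep, first + 1)
    if second >= 0:
      # Skip the whole separator, so multi-character separators work too.
      return [s[0:second]] + twosplit(s[second + len(sep):], sep)
    else:
      return [s]
  else:
    return [s]

print(twosplit("this-is-a-string", "-"))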

elzapp 2009-10-25 20:39:51
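
The same `find`-based scan can be generalised to every nth separator without recursion, which avoids hitting the recursion limit on long strings. This is just one way to sketch it, in Python 3; `split_every_nth` is a hypothetical name, not something from the answer above.

```python
def split_every_nth(s, sep, n):
    """Cut s at every n-th occurrence of sep, scanning with str.find."""
    if n < 1 or not sep:
        raise ValueError("n must be >= 1 and sep must be non-empty")
    parts = []
    start = 0   # where the current chunk begins
    pos = 0     # where the next search begins
    seen = 0    # separators seen since the last cut
    while True:
        hit = s.find(sep, pos)
        if hit < 0:
            break
        seen += 1
        if seen == n:
            parts.append(s[start:hit])
            start = hit + len(sep)
            seen = 0
        pos = hit + len(sep)
    parts.append(s[start:])  # whatever follows the last cut
    return parts


print(split_every_nth("this-is-a-string", "-", 2))  # ['this-is', 'a-string']
```

Unlike the split-and-join answers, this never builds the intermediate list of single words.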
