ansaurus

Question

How do I do what strtok() does in C, in Python?

Answer 1

+22 A:

How about this:

A = '1, 2,,3,4  '
B = [int(x) for x in A.split(',') if x.strip()]

x.strip() trims whitespace from the string, which will make it empty if the string is all whitespace. An empty string is "false" in a boolean context, so it's filtered by the if part of the list comprehension.
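
The same idea can be wrapped in a small reusable function. This is only a sketch: the name `parse_ints` and its `sep` parameter are made up for illustration and are not part of the original answer.

```python
def parse_ints(text, sep=','):
    """Split text on sep, skip blank fields, and convert the rest to int."""
    # field.strip() is '' (falsy) for empty or whitespace-only fields,
    # so those fields are dropped before int() ever sees them.
    return [int(field) for field in text.split(sep) if field.strip()]


A = '1, 2,,3,4  '
B = parse_ints(A)
print(B)
```

Unlike strtok(), this splits on one separator string rather than on a set of delimiter characters.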

Dave Ray 2009-01-18 23:09:34

-1? What? How does this not do exactly what he asked for?

Dave Ray 2009-01-18 23:16:24

strip() is overkill here, converting to int already takes care of the whitespace... better filter out the invalid substrings.

Algorias 2009-01-19 02:00:19

+1 Without the test, it'll fail for e.g. a = "1, 2, , 3, 4"

Ryan Ginstrom 2009-01-19 02:26:41

Answer 2

A:

I'd guess regular expressions are the way to go: http://docs.python.org/library/re.html
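
The answer gives no code, so here is one possible reading of it, assuming the goal is to pull every integer out of the string. The helper name `find_ints` and the pattern are illustrative choices, not something the answer specifies.

```python
import re


def find_ints(text):
    """Return every run of digits (with an optional leading minus) as an int."""
    # re.findall returns all non-overlapping matches as strings.
    return [int(token) for token in re.findall(r'-?\d+', text)]


print(find_ints('1,,2,3,\n,4\n'))
```

Note that this approach silently skips anything that is not a digit run, so '1,2,a' yields two numbers and '1.5' is read as two separate integers.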

Simon Groenewolt 2009-01-18 23:12:18

Seems overkill in this case...

Algorias 2009-01-19 01:41:56

Answer 3

A:

This will work, and never raise an exception, if all the numbers are ints. The isdigit() call is false if there's a decimal point in the string.

>>> nums = ['1,,2,3,\n,4\n', '1,2,3,4', ',1,2,3,4,\t\n', '\n\t,1,2,3,,4\n']
>>> for n in nums:
...     [ int(i.strip()) for i in n.split(',') if i.strip() and i.strip().isdigit() ]
... 
[1, 2, 3, 4]
[1, 2, 3, 4]
[1, 2, 3, 4]
[1, 2, 3, 4]

runeh 2009-01-18 23:41:52

The isdigit check is not necessary for the test cases provided, but it does add extra robustness.

Carl Meyer 2009-01-19 00:50:14

what if he considers '1,2,a' to be an error?

hasen j 2009-01-19 01:53:59

The first i.strip() is redundant.

Ryan Ginstrom 2009-01-19 02:48:52

You're performing i.strip() three times per element. Yikes.

Triptych 2009-01-19 03:33:46

Answer 4

+1 A:

How about this?

>>> a = "1,2,,3,4,"
>>> map(int,filter(None,a.split(",")))
[1, 2, 3, 4]

filter will remove all false values (i.e. empty strings), which are then mapped to int.

EDIT: I just tested this against the versions posted above, and it seems to be significantly faster: about 15% faster than the strip() one and more than twice as fast as the isdigit() one.

Algorias 2009-01-19 01:49:23

he needs it to filter whitespace

hasen j 2009-01-19 01:54:49

Yes, this will filter out whitespace: >>> int(" 1 ") ==> 1

Algorias 2009-01-19 01:56:50

This fails if any of the empty entries has whitespace, e.g. '\n\t,1,2,3,,4\n'
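
A short sketch of that failure, written for Python 3 (where filter returns an iterator, hence the list() call). The variable names are illustrative.

```python
a = '\n\t,1,2,3,,4\n'

# filter(None, ...) drops only the empty string; '\n\t' is non-empty,
# hence truthy, so it survives the filter.
fields = list(filter(None, a.split(',')))
print(fields)

try:
    numbers = [int(f) for f in fields]
except ValueError as exc:
    # int() tolerates surrounding whitespace but not whitespace alone.
    numbers = None
    print('conversion failed:', exc)
```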

Dave Ray 2009-01-19 02:25:29

Doesn't work when list has elements containing whitespace.

Triptych 2009-01-19 03:33:02

Answer 5

+2 A:

Generally, I try to avoid regular expressions, but if you want to split on a bunch of different things, they work. Try this:

import re
result = [int(x) for x in filter(None, re.split('[,\n\t]', A))]

Nick 2009-01-19 01:54:24

Answer 6

A:

Why not just wrap in a try except block which catches anything not an integer?
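
The answer gives no code; a minimal sketch of the idea might look like the following. The function name `parse_ints_lenient` is hypothetical, and skipping bad fields (rather than reporting them) is an assumption about what is wanted.

```python
def parse_ints_lenient(text, sep=','):
    """Convert each field to int, skipping any field int() rejects."""
    result = []
    for field in text.split(sep):
        try:
            result.append(int(field))
        except ValueError:
            # Empty, whitespace-only, or non-numeric field: ignore it.
            continue
    return result


print(parse_ints_lenient('\n\t,1,2,3,,4\n'))
```

If a field such as 'a' in '1,2,a' should count as an error, re-raise inside the except block instead of continuing.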

Josh Smeaton 2009-01-19 02:50:52

Answer 7

+3 A:

Mmm, functional goodness (with a bit of generator expression thrown in):

a = "1,2,,3,4,"
print map(int, filter(None, (i.strip() for i in a.split(','))))

For full functional joy:

import string
a = "1,2,,3,4,"
print map(int, filter(None, map(string.strip, a.split(','))))
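
These snippets are Python 2, where map and filter return lists. Under Python 3 they return lazy iterators, so the result has to be materialized with list() before printing. A sketch under that assumption, using str.strip in place of the `string` module function:

```python
a = "1,2,,3,4,"

# str.strip is applied to every field, filter(None, ...) drops the
# fields that became empty, and int converts what is left.
result = list(map(int, filter(None, map(str.strip, a.split(',')))))
print(result)
```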

Alec Thomas 2009-01-19 02:54:49

`print map(int, filter(len, map(str.strip, a.split(','))))` Note: str.strip(i) and i.strip() are equivalent, so there is no need for the `string` module. `len` is used for readability.

J.F. Sebastian 2009-01-19 19:35:42

I prefer `string.strip()` as it safely deals with Unicode strings.

Alec Thomas 2009-01-20 01:58:09

@Alec: `map(type(a).strip, a.split(','))` will work both for Unicode and encoded strings.

J.F. Sebastian 2009-01-20 04:16:32

Answer 8

A:

Why accept inferior substitutes that cannot segfault your interpreter? With ctypes you can just call the real thing! :-)

# strtok in Python
from ctypes import c_char_p, cdll

try:
    libc = cdll.LoadLibrary('libc.so.6')
except OSError:  # raised when libc.so.6 cannot be loaded, e.g. on Windows
    libc = cdll.LoadLibrary('msvcrt.dll')

libc.strtok.restype = c_char_p
dat = c_char_p("1,,2,3,4")
sep = c_char_p(",\n\t")
result = [libc.strtok(dat, sep)] + list(iter(lambda: libc.strtok(None, sep), None))
print(result)

joeforker 2009-01-19 16:43:40

added `cdll.LoadLibrary('msvcrt.dll')`

J.F. Sebastian 2009-01-19 20:05:52

When I saw the `while` loop I thought of `tokenize = lambda dat, sep: itertools.chain((strtok(dat, sep),), iter(lambda: strtok(None, sep), None))`. It is funny how similarly programmers' minds work. Add the smiley back, otherwise somebody might think that the code is not a joke.

J.F. Sebastian 2009-01-21 19:44:34
