I have strings containing numbers with their units, e.g. 2GB, 17ft, etc. I would like to separate the number from the unit and create 2 different strings. Sometimes, there is a whitespace between them (e.g. 2 GB) and it's easy to do it using split(' ').

When they are together (e.g. 2GB), I would test every character until I find a letter, instead of a number.

s='17GB'
number=''
unit=''
for c in s:
    if c.isdigit():
        number+=c
    else:
        unit+=c

Is there a better way to do it?

Thanks

+2  A: 

You could use a regular expression to divide the string into groups:

>>> import re
>>> p = re.compile(r'(\d+)\s*(\w+)')
>>> p.match('2GB').groups()
('2', 'GB')
>>> p.match('17 ft').groups()
('17', 'ft')
Jarret Hardie
Matching a more general set of numbers, including "6.2" and "3.4e-27", would require a much more complex regex. Too bad Python does not have a built-in scanf analog.
Christopher Bruns
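
As a rough sketch of what such a more general regex might look like: the pattern below accepts an optional sign, a decimal part, and an optional exponent. The names NUMBER_WITH_UNIT and split_quantity are made up for illustration. It assumes an "e" only counts as an exponent when digits follow it, so a unit that itself starts with "e" plus digits would be misread.

```python
import re

# Hypothetical pattern, not from the thread: optional sign, digits with an
# optional fraction (or a bare fraction such as ".5"), optional exponent.
NUMBER_WITH_UNIT = re.compile(
    r"""\s*
    (?P<number>[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)
    \s*
    (?P<unit>.*)
    """,
    re.VERBOSE,
)

def split_quantity(text):
    """Return (number, unit) as two strings, or raise ValueError."""
    m = NUMBER_WITH_UNIT.match(text)
    if m is None:
        raise ValueError("String %r not in the expected format" % (text,))
    return m.group("number"), m.group("unit").strip()

print(split_quantity("6.2GB"))
print(split_quantity("3.4e-27 kg"))
```

Because the exponent part requires at least one digit after the "e", a string like "2eV" still splits into "2" and "eV".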
A: 

How about using a regular expression?

http://python.org/doc/1.6/lib/module-regsub.html

Ole Media
A: 

You should use regular expressions, grouping together what you want to find out:

import re
s = "17GB"
match = re.match(r"^([1-9][0-9]*)\s*(GB|MB|KB|B)$", s)
if match:
  print "Number: %d, unit: %s" % (int(match.group(1)), match.group(2))

Change the regex according to what you want to parse. If you're unfamiliar with regular expressions, here's a great tutorial site.

AndiDog
+3  A: 

tokenize can help:

>>> import StringIO
>>> import tokenize
>>> s = StringIO.StringIO('27GB')
>>> for token in tokenize.generate_tokens(s.readline):
...   print token
... 
(2, '27', (1, 0), (1, 2), '27GB')
(1, 'GB', (1, 2), (1, 4), '27GB')
(0, '', (2, 0), (2, 0), '')
Ignacio Vazquez-Abrams
A: 

For this task, I would definitely use a regular expression:

import re
there = re.compile(r'\s*(\d+)\s*(\S+)')
thematch = there.match(s)
if thematch:
  number, unit = thematch.groups()
else:
  raise ValueError('String %r not in the expected format' % s)

In the RE pattern, \s means "whitespace", \d means "digit", \S means "non-whitespace"; * means "0 or more of the preceding", + means "1 or more of the preceding", and the parentheses enclose "capturing groups", which are then returned by the groups() call on the match object. (thematch is None if the given string doesn't correspond to the pattern: optional whitespace, then one or more digits, then optional whitespace, then one or more non-whitespace characters.)

Alex Martelli
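
A small sketch of how the pattern described above behaves on a few inputs; the sample strings are made up for illustration and the variable name pattern is an arbitrary choice.

```python
import re

# Same pattern as described above: optional whitespace, digits,
# optional whitespace, then one or more non-whitespace characters.
pattern = re.compile(r'\s*(\d+)\s*(\S+)')

for text in ('17GB', '  2 GB', 'GB17', '17'):
    m = pattern.match(text)
    # groups() returns the two captured pieces; m is None when nothing matches.
    print(repr(text), '->', m.groups() if m else None)
```

One thing worth knowing: since the unit group needs at least one non-whitespace character, a bare '17' still matches by backtracking and comes back as ('1', '7'). If unit-less input is possible, the second group probably wants to be (\S*) instead.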
A: 

A regular expression.

import re

m = re.match(r'\s*(?P<n>[-+]?[.0-9]+)\s*(?P<u>.*)', s)
if m is None:
  raise ValueError("not a number with units")
number = m.group("n")
unit = m.group("u")

This will give you a number (integer or fixed point; too hard to disambiguate scientific notation's "e" from a unit prefix) with an optional sign, followed by the units, with optional whitespace.

You can use re.compile() if you're going to be doing a lot of matches.

Mike DeSimone
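
For the many-matches case, one possible arrangement is to compile the pattern once at module level and wrap the matching in a helper. This is only a sketch: QUANTITY and parse_quantity are hypothetical names, and it assumes the digit class is meant to repeat (hence the "+").

```python
import re

# Compiled once and reused for every call. Assumes the numeric part is one
# or more characters from [.0-9], optionally preceded by a sign.
QUANTITY = re.compile(r'\s*(?P<n>[-+]?[.0-9]+)\s*(?P<u>.*)')

def parse_quantity(s):
    """Split s into (number, unit) strings, or raise ValueError."""
    m = QUANTITY.match(s)
    if m is None:
        raise ValueError("not a number with units: %r" % (s,))
    return m.group("n"), m.group("u")

sizes = [parse_quantity(s) for s in ("17GB", "-2.5 ft", "+3 MB")]
print(sizes)
```

Note that the character class does not check that the number is well formed, so something like "1.2.3GB" would still be accepted as "1.2.3".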
+2  A: 
s='17GB'
for i,c in enumerate(s):
    if not c.isdigit():
        break
number=int(s[:i])
unit=s[i:]
gnibbler
-1 s='17 GB' gives unit=' GB', i.e. a space in front of the unit. unit needs an lstrip and then you'll have the same answer as mine.
pwdyson
Now that I've re-read the question, I see the case with the space is handled with split(), not with this code. I tried to take the -1 back, but it wouldn't let me.
pwdyson
+2  A: 

You can break out of the loop when you find the first non-digit character

for i,c in enumerate(s):
    if not c.isdigit():
        break
number = s[:i]
unit = s[i:].lstrip()

If you have negative and decimals:

numeric = '0123456789-.'
for i,c in enumerate(s):
    if c not in numeric:
        break
number = s[:i]
unit = s[i:].lstrip()
pwdyson
>>> '.'.isdigit() -> False
Paul McGuire
yes, it works for positive integers. '-'.isdigit() is also False.
pwdyson
added code for negatives and decimals
pwdyson
+1  A: 
>>> s="17GB"
>>> ind=map(str.isalpha,s).index(True)
>>> num,suffix=s[:ind],s[ind:]
>>> print num+":"+suffix
17:GB
ghostdog74
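
On Python 3, map() returns an iterator, which has no index() method, so the snippet above only works as written on Python 2. A rough Python 3 equivalent of the same idea (find the position of the first letter) might look like this:

```python
s = "17GB"

# Position of the first alphabetic character; next() raises StopIteration
# if the string contains no letter at all.
ind = next(i for i, c in enumerate(s) if c.isalpha())
num, suffix = s[:ind], s[ind:]
print(num + ":" + suffix)
```

As with the original, a space between number and unit ends up at the end of num, so a strip() would be needed for inputs like "17 GB".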