ansaurus

Question

Answer 1

+3 A:

result = rex.sub(' ', string) # this produces a string with tons of whitespace padding
result = rex.sub('', result) # this reduces all those spaces

That's because of a typo: the second call uses rex where it should use rex_s. Also, you need to substitute a single space back in rather than an empty string; otherwise any multiple-space gap collapses to no gap at all instead of a single-space gap.

result = rex.sub(' ', string) # this produces a string with tons of whitespace padding
result = rex_s.sub(' ', result) # this reduces all those spaces
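For reference, the corrected two-step version can be sketched end to end. The pattern definitions are not shown in this thread, so the `rex` and `rex_s` patterns below are assumptions, as are the name `text` and the sample input; they are chosen only to illustrate the idea.

```python
import re

# Assumed patterns; the original definitions are not shown in this thread.
rex = re.compile(r'[^A-Za-z0-9]')   # any single non-alphanumeric character
rex_s = re.compile(r'\s+')          # any run of whitespace

text = "foo -- bar,  baz!"          # hypothetical sample input

padded = rex.sub(' ', text)         # each non-alphanumeric character becomes one space
result = rex_s.sub(' ', padded)     # each run of whitespace collapses to one space

print(repr(padded))
print(repr(result))
```

Note that with these assumed patterns the trailing `!` still leaves one trailing space in `result`; a final `.strip()` would remove it.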

Amber 2009-08-13 22:15:08

Good catch on the typo part. I noticed it moments after posting, but you beat me to it before I got the chance to edit. OMG

nbv4 2009-08-13 22:22:43

I'd suggest taking a look at Alex's code: it's a much more concise way of approaching the problem, and it also handles punctuation/whitespace at the end of the string nicely.

Amber 2009-08-13 22:29:28

Answer 2

+10 A:

Here's a single-step approach (but the uppercasing actually uses a string method -- much simpler!-):

import re

rex = re.compile(r'\W+')  # one or more consecutive non-word characters
result = rex.sub(' ', strarg).upper()

where strarg is the string argument (don't use names that shadow builtins or standard library modules, please ... pretty please?-)
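To see why shadowing matters, here is a small sketch (the function names and sample values are made up for illustration). Inside a function whose parameter is named `string`, the standard `string` module is no longer reachable by that name:

```python
import string  # the standard library module

def letters_only_shadowed(string):
    # The parameter `string` hides the module, so this attribute lookup
    # happens on a str object and raises AttributeError.
    return ''.join(c for c in string if c in string.ascii_letters)

def letters_only(strarg):
    # With a non-shadowing name, the module is still accessible.
    return ''.join(c for c in strarg if c in string.ascii_letters)

try:
    letters_only_shadowed("a-b")
except AttributeError as exc:
    print("shadowed version failed:", exc)

print(letters_only("a-b"))
```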

Alex Martelli 2009-08-13 22:15:59

I agree that this would definitely be the simpler way to approach the problem.

Amber 2009-08-13 22:19:09

I assume that should be "string" and not "result" in the argument of rex.sub? Or is this only replacing part of the user's code?

Brooks Moses 2009-08-14 00:37:27

@Brooks, you're right -- I'm SO averse to shadowing builtin and standard module names, that anything BUT string flew off my fingertips. Let me edit to fix, and thanks!

Alex Martelli 2009-08-14 01:44:21

Answer 3

+1 A:

Do you have to use regular expressions? Do you feel you must do it in one line?

>>> import string
>>> s = "stuff   .  // : /// more-stuff .. .. ...$%$% stuff -> DD"
>>> s2 = ''.join(c for c in s if c in string.ascii_letters + ' ')
>>> ' '.join(s2.split())
'stuff morestuff stuff DD'
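The same non-regex idea can be packaged as a small function. This is only a sketch: the name `letters_and_spaces` is made up, and it relies on `string.ascii_letters`, so only ASCII letters are kept.

```python
import string

def letters_and_spaces(s):
    """Keep ASCII letters and spaces, then collapse runs of whitespace."""
    allowed = set(string.ascii_letters + ' ')
    kept = ''.join(c for c in s if c in allowed)
    # split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so join() yields single-space gaps.
    return ' '.join(kept.split())

print(letters_and_spaces("stuff   .  // : /// more-stuff .. .. ...$%$% stuff -> DD"))
```

Because punctuation is deleted rather than replaced with a space, a hyphenated word such as "more-stuff" comes out as "morestuff", which may or may not be what you want.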

John Fouhy 2009-08-13 22:38:39

Answer 4

+3 A:

import re

s = "$$$aa1bb2 cc-dd ee_ff ggg."
re.sub(r'\W+', ' ', s).upper()
# ' AA1BB2 CC DD EE_FF GGG '

Is _ punctuation?

re.sub(r'[_\W]+', ' ', s).upper()
# ' AA1BB2 CC DD EE FF GGG '

Don't want the leading and trailing space?

re.sub(r'[_\W]+', ' ', s).strip().upper()
# 'AA1BB2 CC DD EE FF GGG'
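Putting the last variant into a reusable function might look like the sketch below. The function name `normalize` is hypothetical, and the choice to treat `_` as punctuation and to strip the ends is just one reasonable reading of the requirements.

```python
import re

def normalize(text):
    """Replace runs of punctuation/underscores with one space, trim, uppercase."""
    # [_\W]+ matches one or more characters that are either an underscore
    # or a non-word character (anything outside letters, digits, and _).
    return re.sub(r'[_\W]+', ' ', text).strip().upper()

print(repr(normalize("$$$aa1bb2 cc-dd ee_ff ggg.")))
print(repr(normalize("...")))
```

One thing to be aware of: in Python 3, `\W` applied to a `str` is Unicode-aware by default, so accented letters count as word characters. Passing `flags=re.ASCII` would restrict word characters to ASCII letters, digits, and the underscore if that is the behaviour you need.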

John Machin 2009-08-14 00:23:59

+1 for bypassing the `re.compile(...)` step. If I could, I'd give you another +1 for also raising the question of how to handle '_' and leading/trailing whitespace, as these things are so often overlooked with regexes.

too much php 2009-08-14 01:55:18

collapsing whitespace in a string
