I'm using the following regex to capture a fixed width "description" field that is always 50 characters long:

(?<description>.{50})

My problem is that the descriptions sometimes contain a lot of whitespace, e.g.

"FLUID        COMPRESSOR                          "

Can somebody provide a regex that:

  1. Trims all whitespace off the end
  2. Collapses any whitespace in between words to a single space
+1  A: 

Is there a particular reason you are asking for a regular expression? They may not be the best tool for this task.

A replacement like

 s/[ \t]+/ /g

should compress the internal whitespace (it will also compress leading and trailing whitespace, but it doesn't sound like that is a problem), and

s/[ \t]+$//

will take care of the trailing whitespace. [I'm using sedish syntax here. You didn't say what flavor you prefer.]
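For readers not working in sed, the same two substitutions can be sketched in Python. This is only an illustration, assuming Python's `re` module; the function name `normalize_description` is hypothetical and does not come from the question.

```python
import re

def normalize_description(field: str) -> str:
    """Collapse runs of spaces/tabs, then drop trailing whitespace."""
    # First step: every run of spaces or tabs becomes a single space.
    collapsed = re.sub(r"[ \t]+", " ", field)
    # Second step: strip whatever spaces or tabs are left at the end.
    return re.sub(r"[ \t]+$", "", collapsed)

print(repr(normalize_description("FLUID        COMPRESSOR                          ")))
# prints 'FLUID COMPRESSOR'
```

Note that leading whitespace is collapsed to one space rather than removed, which matches the caveat above.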


Offhand, I don't see a way to do it in a single expression.

dmckee
I'm using this inside of a larger regular expression, from http://stackoverflow.com/questions/162727/read-fixed-width-record-from-text-file
Chris Karcher
+8  A: 

Replace two or more spaces with one space:

s/  +/ /g

Edit: for any whitespace (not just spaces) you can use \s if you're using a Perl-compatible regex library, and the curly-brace syntax for the number of occurrences, e.g.

s/\s\s+/ /g

or

s/\s{2,}/ /g

Edit #2: forgot the /g global suffix, thanks JL
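A quick way to compare these variants is to run them over the sample value from the question. The sketch below assumes Python's `re` module rather than Perl, where `re.sub` replaces every match and so plays the role of the /g suffix. Each pattern still leaves one space behind when the input ends in a run of whitespace.

```python
import re

sample = "FLUID        COMPRESSOR                          "

# The three patterns from this answer, written as Python regexes.
patterns = [r"  +", r"\s\s+", r"\s{2,}"]

# re.sub replaces all non-overlapping matches, like the /g suffix.
results = [re.sub(p, " ", sample) for p in patterns]

for p, r in zip(patterns, results):
    print(f"{p!r:10} -> {r!r}")
```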

sk
Or even just s/\s+/ /g -- it occasionally maps a single space to another single space, but it hardly matters. But the global suffix does matter, of course.
Jonathan Leffler
Unfortunately all proposed regexes leave one space at the end if it was there in the initial string.
Alexander Prokofyev
Good point, but is there a single regex that can do both?
sk
+1  A: 

Perl variants: 1) s/\s+$//; to trim the end, then 2) s/\s+/ /g; to collapse the remaining runs.

+4  A: 
str = Regex.Replace(str, " +( |$)", "$1");
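The same pattern can be tried outside .NET. Here is a minimal sketch in Python, assuming the `re` module and writing the backreference as `\1` where .NET uses `$1`. The helper name `squeeze` is made up for this example.

```python
import re

def squeeze(text: str) -> str:
    # " +" consumes a run of spaces; the group keeps either the final space
    # of an interior run or nothing at all when the run reaches the end.
    return re.sub(r" +( |$)", r"\1", text)

print(repr(squeeze("FLUID        COMPRESSOR                          ")))
# prints 'FLUID COMPRESSOR'
```

A single space between words is left alone because the pattern needs at least two spaces, or one space followed by the end of the string, to match. The pattern handles only the space character and does not strip leading spaces completely.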
Alan Moore
Bravo! This regex correctly processes spaces between words and at the end of string.
Alexander Prokofyev
Same thing I was going to suggest. :)
MizardX
+1  A: 

Since compressing whitespace and trimming whitespace around the edges are conceptually different operations, I like doing it in two steps:

re.sub(r"\s+", " ", str.strip())

Not the most efficient, but quite readable.

rodarmor
A: 

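/(^\s+|\s+(\s|$))/g, replacing each match with $2. The first alternative matches whitespace at the beginning; the second matches runs in the middle or at the end. (\s already includes \t, so [\s\t] is redundant.)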
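To see how the alternation behaves, here is a rough Python port. It assumes the `re` module, writes the character class as plain `\s` (which already includes tabs), and relies on Python 3.5 or later substituting an empty string for a group that did not take part in the match. `tidy` is a hypothetical name.

```python
import re

def tidy(text: str) -> str:
    # First alternative: leading whitespace (group 2 is unset, so it is removed).
    # Second alternative: a whitespace run followed by one more whitespace
    # character (kept via group 2) or by the end of the string (kept as "").
    return re.sub(r"(^\s+|\s+(\s|$))", r"\2", text)

print(repr(tidy("  FLUID        COMPRESSOR     ")))
# prints 'FLUID COMPRESSOR'
```

Because the replacement reuses the captured character, an interior run that ends in a tab collapses to a tab rather than a space.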

anonymous