I'm trying to use sed to replace whitespace within a string. For example, given the line:

var test = 'Some test text here.';

I want to get:

var test = 'Sometesttexthere.';

I've tried using (\x27 matches the '):

sed 's|\x27\([^\x27[:space:]]*\)[[:space:]]|\x27\1|g'

but that just gives

var test = 'Sometest text here.';

Any ideas?

A: 

Your command line has two problems:

  • First, there's a missing \ after [^.

  • Second, even though you use the g modifier, only the first space is removed. Why? Because g replaces successive non-overlapping matches within the same line: after each match, the search resumes right behind it and never re-scans the line from the beginning. A re-scan is exactly what is required here, because every match has to start at the initial ' of the string literal.
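The second point can be reproduced in isolation. The following is only a minimal sketch: the variable names `line` and `single_pass` are made up for illustration, and the pattern uses a literal ' inside double quotes instead of \x27 so that it does not depend on that escape being supported.

```shell
# Sample line from the question.
line="var test = 'Some test text here.';"

# One pass with the g flag: after the first match, the scan resumes behind the
# replaced text, so the opening ' is never seen again on this line.
single_pass=$(printf '%s\n' "$line" | sed "s|'\([^'[:space:]]*\)[[:space:]]|'\1|g")

printf '%s\n' "$single_pass"
# prints: var test = 'Sometest text here.';
```

Only the space directly reachable from the opening ' is removed, which is the result reported in the question.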

The obvious way to solve this problem is a loop, implemented with a conditional jump: t Label branches to :Label, but only if an s command has made a substitution since the current line was read or since the last t was taken.

This is easiest with a sed script (and you don't have to escape the '), like so:

:a
s|'\([^'[:space:]]*\)[[:space:]]|'\1|
ta
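One possible way to try the script is to save it to a file and pass that file to sed with -f. This is a sketch rather than the only way to invoke it; the variable names `loop_script` and `looped` are illustrative, and `mktemp` is assumed to be available on your system.

```shell
# Write the three-line script to a temporary file (the name is arbitrary).
loop_script=$(mktemp)
cat > "$loop_script" <<'EOF'
:a
s|'\([^'[:space:]]*\)[[:space:]]|'\1|
ta
EOF

# Each successful substitution makes t jump back to :a, so the line is
# re-scanned from the opening ' until no whitespace is left to remove.
looped=$(printf '%s\n' "var test = 'Some test text here.';" | sed -f "$loop_script")
rm -f "$loop_script"

printf '%s\n' "$looped"
# prints: var test = 'Sometesttexthere.';
```

The quoted here-document delimiter ('EOF') keeps the shell from touching the backslashes and quotes in the script.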

But it can also be done on the command prompt. The exact syntax may depend on your sed flavour; for mine (super-sed on Windows) it is invoked like so:

sed -e ":a" -e "s|\x27\([^\x27[:space:]]*\)[[:space:]]|\x27\1|;ta"

You need two separate script expressions, because the label :a extends until the end of an expression.

Christian Semrau
Using GNU `sed`, a semicolon ends a label.
Dennis Williamson
@Dennis: Indeed, now that I tried, it also works with super-sed. Seems like I used another version of sed before, which did not support it, and never bothered to check again.
Christian Semrau
This solution does work exactly as needed. As you noted in your comment above, it isn't the most efficient. That is relevant for my input lengths, but it works well enough for my application. Thanks.
blazeprogrammer
A: 

This is a much more complex sed script, but it works without a loop. You know, just for the sake of variety:

sed 'h;s/[^\x27]*\x27\(.*\)/\n\x27\1/;s/ //g;x;s/\([^\x27]*\).*/\1/;G;s/\n//g'

It makes a copy of the string, splits one (which will become the second half) at the first single quote discarding the first half, replaces all the spaces in the second half, swaps the copies, splits the other one discarding the second half, merges them back together and removes the newlines used for the splitting and the one added by the G command.
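The same steps can be laid out one command per line with comments. This is a sketch that assumes GNU sed (the \n in the replacement text is not portable); it is written as a script file so that ' can be used literally instead of \x27, and the names `split_script` and `joined` are illustrative only.

```shell
split_script=$(mktemp)
cat > "$split_script" <<'EOF'
# Keep a copy of the whole line in the hold space.
h
# Drop everything before the first ' and mark the cut with a newline.
s/[^']*'\(.*\)/\n'\1/
# Remove the spaces from what is left (the second half).
s/ //g
# Swap: the pattern space holds the original line again,
# the hold space holds the cleaned second half.
x
# Keep only the part before the first '.
s/\([^']*\).*/\1/
# Append the hold space; G puts a newline in front of it.
G
# Delete both newlines to join the two halves.
s/\n//g
EOF

joined=$(printf '%s\n' "var test = 'Some test text here.';" | sed -f "$split_script")
rm -f "$split_script"

printf '%s\n' "$joined"
# prints: var test = 'Sometesttexthere.';
```

Every command runs exactly once per line, which is why this variant needs no loop.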

Edit:

In order to select particular lines to operate on, you can use some selection criteria. Here I've specified that the line must contain an equal sign and at least two single quotes:

sed '/.*=.*\x27.*\x27.*/ {h;s/[^\x27]*\x27\(.*\)/\n\x27\1/;s/ //g;x;s/\([^\x27]*\).*/\1/;G;s/\n//g}'

You could use whatever regex works best to include and exclude appropriately for your needs.
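As a quick check of the address form, the sketch below feeds it one line without quotes and one line with a string literal. It assumes GNU sed (for \x27 and for \n in the replacement); the variable name `filtered` is illustrative only.

```shell
# Two input lines: the first has no single quotes, the second has a literal.
filtered=$(printf '%s\n' "var x = 2;" "var test = 'Some test text here.';" |
  sed '/.*=.*\x27.*\x27.*/ {h;s/[^\x27]*\x27\(.*\)/\n\x27\1/;s/ //g;x;s/\([^\x27]*\).*/\1/;G;s/\n//g}')

printf '%s\n' "$filtered"
# prints:
# var x = 2;
# var test = 'Sometesttexthere.';
```

The first line does not match the address, so the block in braces is skipped and the line is printed unchanged.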

Dennis Williamson
Good solution. It seems to be O(n) instead of my O(n^2) [Shlemiel the Painter solution](http://en.wikipedia.org/wiki/Schlemiel_the_Painter%27s_algorithm), but this fact is probably not relevant for the input lengths at hand.
Christian Semrau
Your sed (regex) skills very clearly surpass mine. Very good solution. One thing I guess I should have mentioned is that not every line has an '. Your solution works perfectly for the lines that do. Those that don't have a ' end up getting copied though. For example, a line with var x = 2; will end up as var x = 2;var x = 2;
blazeprogrammer
For anyone interested, this solution is much faster than the accepted solution, with the main difference being that this one doesn't work exactly as needed (my fault, not the responder's). On my test file, this solution takes .06s while the accepted solution takes 19.3s.
blazeprogrammer
@blazeprogrammer: See my edit.
Dennis Williamson