ansaurus

Question

How can I condense stand-alone characters in Perl?

Answer 1

+1 A:

$str =~ s/\b([A-Z])\s+(?=[A-Z]\b)/$1/g;

KennyTM 2010-05-17 18:09:37
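
To see what the substitution above does, here is a minimal sketch. The helper name `condense` and the sample strings are illustrative assumptions, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the answer.
sub condense {
    my ($str) = @_;
    # Drop the whitespace after a stand-alone capital letter when the
    # next word is also a stand-alone capital letter. The lookahead
    # does not consume that letter, so it can start the next match.
    $str =~ s/\b([A-Z])\s+(?=[A-Z]\b)/$1/g;
    return $str;
}

print condense("FOR I M A TEST"), "\n";   # FOR IMA TEST
print condense("THIS IS A TEST"), "\n";   # THIS IS A TEST (unchanged)
```

Because the second letter is only looked at, not consumed, runs of three or more single letters collapse in one pass.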

I saw the "\1" was changed to "$1". Both versions appear to work...so what is the difference?

brydgesk 2010-05-17 18:16:38

The word boundary assertion (`\b`) might not be what you want here. If the string `"A B C!"` should become `"AB C!"` you will need to use something else. Also, if `"A B C1"` should become `"ABC1"` then you will need to use something else.

Chas. Owens 2010-05-17 18:18:27
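
The two edge cases in the comment above can be checked directly. This is only a sketch of how the `\b` version treats them; the helper name `condense_wb` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: the \b-based substitution from the answer.
sub condense_wb {
    my ($str) = @_;
    $str =~ s/\b([A-Z])\s+(?=[A-Z]\b)/$1/g;
    return $str;
}

# "!" is a non-word character, so \b matches between "C" and "!"
# and C counts as stand-alone.
print condense_wb("A B C!"), "\n";   # ABC!

# "1" is a word character, so there is no \b between "C" and "1"
# and C is treated as part of a longer word.
print condense_wb("A B C1"), "\n";   # AB C1
```

If either result is the opposite of what you want, the boundary test has to be expressed in terms other than `\b`.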

@brydgesk read the output of `perl -Mdiagnostics -e '$" =~ s/(a)/\1/'` Basically it is a style and consistency issue (e.g. `\10` likely doesn't mean what you think it does, but `$10` does).

Chas. Owens 2010-05-17 18:23:32
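
A small sketch of the point about `\1` versus `$1` in the replacement part. The sample strings are assumptions chosen for illustration. As far as I know, a form like `\10` in a replacement is read as an octal escape rather than "group 1 followed by 0", which is why the braces form is shown.

```perl
use strict;
use warnings;

my $with_dollar = "a-b";
$with_dollar =~ s/(a)/$1$1/;          # aa-b

my $with_backslash = "a-b";
{
    # Silence the "\1 better written as $1" warning for this demo only.
    no warnings 'syntax';
    $with_backslash =~ s/(a)/\1\1/;   # aa-b, same result
}

# When a digit must follow the group, braces keep it unambiguous.
my $braces = "a-b";
$braces =~ s/(a)/${1}0/;              # a0-b

print "$with_dollar\n$with_backslash\n$braces\n";
```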

I think \b should be fine at the beginning, but at the end I'd like to look for either whitespace or end of string/line.

brydgesk 2010-05-17 18:23:54
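
One way to write the variant described in the comment above, keeping `\b` at the start but requiring whitespace or end of string after the second letter. This is a sketch under that reading of the comment; the helper name `condense_ws` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the second letter must be followed by
# whitespace or by the end of the string, not just a word boundary.
sub condense_ws {
    my ($str) = @_;
    $str =~ s/\b([A-Z])\s+(?=[A-Z](?:\s|\z))/$1/g;
    return $str;
}

print condense_ws("A B C!"), "\n";           # AB C!
print condense_ws("FOR I M A TEST"), "\n";   # FOR IMA TEST
```

With this version a letter followed by punctuation, as in `C!`, is no longer treated as stand-alone.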

@bry: Oops. That was just because I was testing the regex in Python which doesn't accept the `$1`.

KennyTM 2010-05-17 18:35:19

Answer 2

+1 A:

The reason it's not working is that you have leading and trailing spaces in your regex. Once " A B C " becomes " AB C ", the B no longer has a leading space - the A is there.

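The simplest solution would be to take those out and use s/([A-Z]) (?=[A-Z])/$1/g, which should fulfill the stated requirements. (The second letter is matched with a lookahead so that it is not consumed and can start the next match; with s/([A-Z]) ([A-Z])/\1\2/g, "A B C" would only become "AB C".) It would also turn all-caps phrases into a single block of letters (e.g., "THIS IS A TEST" -> "THISISATEST"), which may not be acceptable to you.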

If you need to only collapse single capital letters and not groups of them (e.g., "FOR I M A TEST" -> "FOR IMA TEST", not "FORIMATEST"), then I don't think that's possible with a single regex. You'd have to do it in two passes, one to mark which spaces to collapse and the second to actually remove the marks (e.g., "FOR I M A TEST" -> "FOR I^M^A TEST" -> "FOR IMA TEST") because you otherwise can't distinguish between a pair of uppercase letters which were originally paired and one which was originally space-separated but has already been collapsed.

Dave Sherohman 2010-05-18 09:28:30
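
The mark-then-remove idea can be sketched as follows. It assumes the marker `^` never occurs in the input, and it uses lookarounds to decide which spaces to mark, which is my choice rather than something the answer specifies. The helper name `two_pass` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper implementing the two passes described above.
sub two_pass {
    my ($str) = @_;
    # Pass 1: replace the space between two single capital letters
    # with a marker. Assumption: "^" does not appear in the input.
    $str =~ s/(?<![A-Z])([A-Z]) (?=[A-Z](?![A-Z]))/$1^/g;
    my $marked = $str;
    # Pass 2: remove the markers.
    $str =~ s/\^//g;
    return ($marked, $str);
}

my ($marked, $final) = two_pass("FOR I M A TEST");
print "$marked\n";   # FOR I^M^A TEST
print "$final\n";    # FOR IMA TEST
```

Note that once lookarounds are used for the marking pass, that pass already identifies every space to drop, so the marker mainly serves to make the intermediate state visible.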
