
Hello, I'm in the process of updating a program that fixes subtitles.

Until now I've gotten away without using regular expressions, but the latest problem might benefit from them. (I've already solved it without regular expressions, but the method is very unoptimized and slows my program down significantly.)

TL;DR:

I'm trying to make the following work:

I want all instances of "! .", "!." and "! . " to become "!",

unless the dot is followed by another dot, in which case I want all instances of "!..", "! ..", "! . . " and "!. ." to become "!...".

I've tried this code:

the_str = Regex.Replace(the_str, "\\! \\. [^.]", "\\! [^.]");

That comes close to the first part of what I want, but I can't make the [^.] in the replacement string stand for the same character that [^.] matched in the original string. Please help!

I'm interested in both C# and PHP implementations...
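For reference, here is a minimal C# sketch of two ways the attempt above could be repaired (the class and method names are hypothetical). It keeps the attempt's exact spacing ("! . " followed by a non-dot), so it does not cover the other variants listed in the question. A character class such as [^.] only means something in the pattern; in a .NET replacement string only $-substitutions are special, so "\\! [^.]" is inserted as literal text.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helpers showing two repairs of
// Regex.Replace(the_str, "\\! \\. [^.]", "\\! [^.]").
public static class AttemptFix
{
    // Capture the non-dot character in group #1 and put it back with $1.
    public static string WithCapture(string s) =>
        Regex.Replace(s, @"! \. ([^.])", "! $1");

    // Only *look* at the next characters with a positive lookahead,
    // so they are never consumed and never need to be re-inserted.
    public static string WithLookahead(string s) =>
        Regex.Replace(s, @"! \.(?= [^.])", "!");
}
```

Both turn "Wow! . Then" into "Wow! Then". The lookahead version is probably the safer choice: because it never consumes the following character, it also catches back-to-back occurrences such as "! . ! . X" (giving "! ! X"), which the capture version leaves half-fixed ("! ! . X").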

A:
$str = preg_replace('/!(?:\s*\.){2,3}/', '!...', $str);
$str = preg_replace('/!\s*\.(?!\s*\.)/', '!', $str);

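This does the work in two PCREs. You could probably merge them into one with some magic, but it wouldn't be readable anymore. The first PCRE handles "!...", the second handles "!"; the negative lookahead (?!\s*\.) keeps the second from touching a dot that has another dot after it. Both are quite straightforward.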
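Since the question also asks for C#, here is a sketch of the same two passes in .NET, whose regex engine supports the non-capturing group, the {2,3} quantifier and the negative lookahead used above. The class and member names are made up for illustration. The order of the passes matters: the "!..." pass has to run first.

```csharp
using System.Text.RegularExpressions;

// Hypothetical C# port of the two PCREs (names are illustrative).
public static class TwoPass
{
    // Pass 1: "!" followed by two or three dots, each optionally
    // preceded by whitespace, becomes "!...".
    static readonly Regex Ellipsis = new Regex(@"!(?:\s*\.){2,3}");

    // Pass 2: "!" followed by a single dot becomes "!". The negative
    // lookahead rejects a dot that has another dot after it, so the
    // "!..." produced by pass 1 is left alone.
    static readonly Regex SingleDot = new Regex(@"!\s*\.(?!\s*\.)");

    public static string Fix(string s)
    {
        s = Ellipsis.Replace(s, "!...");
        return SingleDot.Replace(s, "!");
    }
}
```

For example, TwoPass.Fix("Hmm! . . so") returns "Hmm!... so". A lone dot followed by a space keeps that space: "Hey! . Next" becomes "Hey! Next".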

nikic
Isn't your second `preg_replace` gonna match the replacements you made in the first? Never mind, I see why it won't now.
BBonifield
Already fixed that ;) but yeah, the first version would have.
nikic
Thanks, that worked like a charm!
funkybomber
A: 

C#

s = Regex.Replace(s, @"!\s?\.\s?(\.?)\s?", "!$1$1$1");

PHP

$s = preg_replace('/!\s?\.\s?(\.?)\s?/', '!$1$1$1', $s);

The first dot is consumed but not captured; you're effectively throwing that one away. Group #1 captures the second dot if there is one, or an empty string if not. In either case, plugging it into the replacement string three times yields the desired result.
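To see group #1 at work, here is a small C# sketch (hypothetical names) that reports what the group captured and applies the replacement. One side effect worth knowing for subtitle text: the trailing \s? also swallows a space that follows a lone dot, so "Stop! . Go" comes out as "Stop!Go".

```csharp
using System.Text.RegularExpressions;

// Illustrative check of how group #1 drives the "!$1$1$1" replacement.
public static class GroupDemo
{
    const string Pattern = @"!\s?\.\s?(\.?)\s?";

    // What group #1 captured in the first match: "" or ".".
    public static string SecondDot(string s) =>
        Regex.Match(s, Pattern).Groups[1].Value;

    // "" repeated three times adds nothing; "." repeated gives "...".
    public static string Fix(string s) =>
        Regex.Replace(s, Pattern, "!$1$1$1");
}
```

GroupDemo.SecondDot("!.") returns an empty string, while GroupDemo.SecondDot("! . . ") returns ".", so the replacement yields "!" and "!..." respectively.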

I used \s instead of literal spaces to make it more obvious what I was doing, and added the ? quantifier to make the spaces optional. If you really need to restrict it to actual space characters (not tabs, newlines, etc.), you can change them back to spaces. If you want to allow more than one space at a time, you can change ? to * where appropriate, e.g.:

@"!\s*\.\s*(\.?)\s*"
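Here is a sketch comparing the original pattern, the \s* version above, and a spaces-only variant with * quantifiers. That last pattern is a combination of the two suggestions and is not part of the answer; the class name is hypothetical. With ?, two spaces between the dots are too many, so "!.  ." is only partly fixed. With \s*, a newline after the dot is swallowed, which would merge two subtitle lines; literal spaces avoid that.

```csharp
using System.Text.RegularExpressions;

// Hypothetical comparison of the variants discussed in the answer.
public static class VariantDemo
{
    // Original: at most one whitespace character in each gap.
    public static string Optional(string s) =>
        Regex.Replace(s, @"!\s?\.\s?(\.?)\s?", "!$1$1$1");

    // Any amount of whitespace, including tabs and newlines.
    public static string AnyWhitespace(string s) =>
        Regex.Replace(s, @"!\s*\.\s*(\.?)\s*", "!$1$1$1");

    // Assumed variant: any number of literal spaces only, so line
    // breaks in multi-line subtitles are left untouched.
    public static string SpacesOnly(string s) =>
        Regex.Replace(s, @"! *\. *(\.?) *", "!$1$1$1");
}
```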

Also, notice the use of C#'s verbatim string literals (the @ prefix), the antidote for backslashitis. ;)
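To make the backslashitis point concrete, here are both spellings of the answer's pattern as C# constants (hypothetical names); they produce the identical string. Inside a verbatim literal the only character that still needs escaping is the double quote, written as "".

```csharp
// The same regex pattern written both ways.
public static class LiteralDemo
{
    // Regular literal: every regex backslash has to be doubled for C#.
    public const string Escaped = "!\\s?\\.\\s?(\\.?)\\s?";

    // Verbatim literal: backslashes are taken as-is, so the pattern
    // reads exactly as the regex engine sees it.
    public const string Verbatim = @"!\s?\.\s?(\.?)\s?";
}
```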

Alan Moore