Is there a way using regex to replace characters in a string based on position?

For instance, one of the rewrite rules in a project I'm working on is: "replace o with ö if o is the next-to-last vowel and even-numbered (counting left to right)".

So, an example is:

heabatoik would become heabatöik (o is the next to last vowel, as well as the 4th vowel)

habatoik would not change (o is the next to last vowel, but is the 3rd vowel)

Is this possible using preg_replace in PHP?

+1  A: 

You can use preg_match_all to split the string into vowel/non-vowel parts and process that.

e.g. something like

preg_match_all("/([aeiou])|([^aeiou]+)/",
    $in,
    $out, PREG_PATTERN_ORDER);

Depending on your specific needs, you may need to modify the placement of ()*+? in the regex.
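
As a rough sketch of how the split parts could then be processed: walk the chunks, note which ones are vowels, and rewrite the next-to-last vowel if it is an o in an even position. The function name replaceNextToLastO is made up for illustration, and the code assumes lowercase ASCII input.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Assumes lowercase ASCII input; vowels are a, e, i, o, u.
function replaceNextToLastO(string $in): string
{
    // Each match is either a single vowel or a run of non-vowels.
    preg_match_all('/[aeiou]|[^aeiou]+/', $in, $out, PREG_PATTERN_ORDER);
    $chunks = $out[0];

    // Indexes (into $chunks) of the chunks that are vowels.
    $vowelIdx = [];
    foreach ($chunks as $i => $chunk) {
        if (preg_match('/^[aeiou]$/', $chunk)) {
            $vowelIdx[] = $i;
        }
    }

    $n = count($vowelIdx);
    // The next-to-last vowel is vowel number $n - 1 (counting from 1).
    if ($n >= 2 && ($n - 1) % 2 == 0 && $chunks[$vowelIdx[$n - 2]] === 'o') {
        $chunks[$vowelIdx[$n - 2]] = 'ö';
    }

    return implode('', $chunks);
}

echo replaceNextToLastO('heabatoik'), "\n"; // heabatöik
echo replaceNextToLastO('habatoik'), "\n";  // habatoik
```

Because the chunks are rejoined with implode, everything outside the replaced vowel is passed through untouched.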

David Schmitt
+6  A: 

Starting with the beginning of the subject string, you want to match 2n + 1 vowels followed by an o, but only if the o is followed by exactly one more vowel:

$str = preg_replace(
  '/^((?:(?:[^aeiou]*[aeiou]){2})*)' .  # 2n vowels, n >= 0
    '([^aeiou]*[aeiou][^aeiou]*)' .     # odd-numbered vowel
    'o' .                               # even-numbered vowel is o
    '(?=[^aeiou]*[aeiou][^aeiou]*$)/',  # exactly one more vowel
  '$1$2ö',
  'heaeafesebatoik');

To do the same but for an odd-numbered o, match 2n leading vowels rather than 2n + 1:

$str = preg_replace(
  '/^((?:(?:[^aeiou]*[aeiou]){2})*)' .  # 2n vowels, n >= 0
    '([^aeiou]*)' .                     # followed by non-vowels
    'o' .                               # odd-numbered vowel is o
    '(?=[^aeiou]*[aeiou][^aeiou]*$)/',  # exactly one more vowel
  '$1$2ö',
  'habatoik');

If one doesn't match, then it performs no replacement, so it's safe to run them in sequence if that's what you're trying to do.
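
The two patterns differ only in whether one extra leading vowel is consumed, so they could be folded into a single helper. This is a minimal sketch under that reading; the function name umlautNextToLastO and the $even flag are invented here for illustration, and lowercase ASCII vowels are assumed.

```php
<?php
// Hypothetical wrapper around the two patterns above; the name and the
// $even flag are illustrative only. Assumes lowercase ASCII vowels.
function umlautNextToLastO(string $str, bool $even): string
{
    $pairs = '((?:(?:[^aeiou]*[aeiou]){2})*)';   // 2n vowels, n >= 0
    $lead  = $even
        ? '([^aeiou]*[aeiou][^aeiou]*)'          // one more vowel: o is even-numbered
        : '([^aeiou]*)';                         // no extra vowel: o is odd-numbered
    $tail  = '(?=[^aeiou]*[aeiou][^aeiou]*$)';   // exactly one more vowel

    // preg_replace returns null on a regex error; fall back to the input.
    return preg_replace('/^' . $pairs . $lead . 'o' . $tail . '/', '$1$2ö', $str) ?? $str;
}

echo umlautNextToLastO('heabatoik', true), "\n";   // heabatöik
echo umlautNextToLastO('habatoik', true), "\n";    // habatoik (o is the 3rd vowel)
echo umlautNextToLastO('habatoik', false), "\n";   // habatöik
```

One thing worth checking against your own rules: once an o has become ö it no longer matches [aeiou], so applying both variants back to back can shift which vowel counts as next to last. For example, batokosoik becomes batokosöik after the even pass, and a following odd pass then turns it into batokösöik.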

Greg Bacon
Why a `+` at that last `[^aeiou]+` and not a `*`?
Bart Kiers
@Bart Good catch!
Greg Bacon
I assumed *I* overlooked something! :)
Bart Kiers
+1 for style but seriously I would never want to maintain that
kemp
Thanks, could you tell me why the first part of the string gets cut off if the input string is longer? `heaeafesebatoik` gives me `fesebatöik`
Chris
@Chris Bugs fixed!
Greg Bacon
Thank you, what could I do if I wanted to do it for odd numbered, without conflicting with the even numbered regex?
Chris
@Chris See updated answer.
Greg Bacon
+1  A: 

I'd like to expand on Schmitt's answer. (I don't have enough points to add a comment; I'm not trying to steal his thunder.) I would use the PREG_OFFSET_CAPTURE flag, as it returns not only the vowels but also their locations. This is my solution:

const LETTER = 0;   // index of the matched text within each match
const LOCATION = 1; // index of the byte offset within each match
$string = 'heabatoik';

preg_match_all('/[aeiou]/', $string, $out, PREG_OFFSET_CAPTURE);
$vowels = $out[0]; // list of [letter, offset] pairs, one per vowel

$count = count($vowels);

// if the second-last vowel is an o
// and its position among the vowels ($count - 1, counting from 1) is even
if ($count >= 2 &&
    $vowels[$count - 2][LETTER] == 'o' &&
    ($count - 1) % 2 == 0)
       $string = substr_replace($string, 'ö', $vowels[$count - 2][LOCATION], 1);

note:

preg_match_all('/[aeiou]/', 'heabatoik', $out, PREG_OFFSET_CAPTURE);
print_r($out);
Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => e
                    [1] => 1
                )

            [1] => Array
                (
                    [0] => a
                    [1] => 2
                )

            [2] => Array
                (
                    [0] => a
                    [1] => 4
                )

            [3] => Array
                (
                    [0] => o
                    [1] => 6
                )

            [4] => Array
                (
                    [0] => i
                    [1] => 7
                )
        )
)
A: 

This is how I would do it:

$str = 'heabatoik';

$vowels = preg_replace('#[^aeiou]+#i', '', $str);
$length = strlen($vowels);
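if ( $length >= 2 && $length % 2 && $vowels[$length - 2] == 'o' ) {
    // replace the o that is followed by exactly one more vowel;
    // (?i) keeps the lookahead case-insensitive, like the pattern above
    $str = preg_replace('#o(?=(?i)[^aeiou]*[aeiou][^aeiou]*$)#', 'ö', $str);
}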
kemp