I'm trying to wrap words and word sequences from a given list using preg_replace. It almost works, but it fails in some cases and I can't figure out why.

For instance I do this:

    // sort by descending length
    usort($this->_keywords, function($a,$b){return(strlen($a)<strlen($b));});

    // wrapper is -%string%-
    foreach ($this->_keywords as $keyword) {
        $value = preg_replace('/((?!-)' . $keyword . '(?!-))/i', str_replace('%string%', '\1', $this->_wrapper), $value);
    }

From this keyword list:

  • lorem
  • ipsum
  • sit amet
  • null
  • sed
  • sed enim

I'd like the result to be:

-Lorem- -ipsum- dolor -sit amet-, consectetur adipiscing elit. Phasellus rhoncus venenatis orci sed porta. Sed non dolor eros. Suspendisse a massa -sit amet- nulla egestas facilisis. Cras fringilla, leo ac ullamcorper semper, urna eros pretium lectus, nec rhoncus ligula risus eu velit. Nulla eu dapibus magna. Sed vehicula tristique lacinia. Maecenas tincidunt metus at urna consequat nec congue libero iaculis. Nulla facilisi. Phasellus -sed- sem ut risus mattis accumsan eu -sed enim-. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Suspendisse id est velit, eu cursus quam. Vivamus lacinia euismod pretium.

Any ideas?

+1  A: 

The easiest way is to use preg_replace_callback() with a pattern that matches either a word that has already been wrapped or one of the keywords. When the match is an already wrapped word, just return it unmodified. No need for problematic look-arounds.

    function compare_length($a, $b) {
        // usort() expects an integer; a negative value sorts $a first,
        // so longer keywords end up before shorter ones.
        return strlen($b) - strlen($a);
    }

    function build_regex($keywords) {
        usort($keywords, 'compare_length');
        $pieces = array('/(?<wrapped>-[\w\s]*-)|(?<keyword>');
        // Loop over the keywords (not over $pieces) and escape each one.
        for ($i = 0; $i < count($keywords); $i++) {
            if ($i > 0) $pieces []= '|';
            $pieces []= preg_quote($keywords[$i], '/');
        }
        // Case-insensitive, like the pattern in the question.
        $pieces []= ')/i';
        return implode("", $pieces);
    }

    function wrap_callback($match) {
        // Already wrapped: return it unmodified.
        if (!empty($match['wrapped'])) {
            return $match['wrapped'];
        }
        return "-{$match['keyword']}-";
    }

    function wrap($text, $keywords) {
        $regex = build_regex($keywords);
        return preg_replace_callback($regex, 'wrap_callback', $text);
    }
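
As a rough usage sketch of the single-pass idea above: the function name `wrap_keywords_sketch` and the sample text are hypothetical, not from the original post. This condensed version shows that sorting longest-first lets `sed enim` win over `sed`, and that text which is already wrapped passes through unchanged.

```php
<?php
// Hypothetical condensed helper illustrating the single-pass approach.
function wrap_keywords_sketch($text, array $keywords) {
    // Longest keywords first, so "sed enim" is tried before "sed".
    usort($keywords, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    // Escape regex metacharacters in each keyword.
    $quoted = array_map(function ($k) {
        return preg_quote($k, '/');
    }, $keywords);
    $regex = '/(?<wrapped>-[\w\s]*-)|(?<keyword>' . implode('|', $quoted) . ')/i';

    return preg_replace_callback($regex, function ($m) {
        // Leave anything that is already wrapped untouched.
        if (!empty($m['wrapped'])) {
            return $m['wrapped'];
        }
        return '-' . $m['keyword'] . '-';
    }, $text);
}

// Made-up sample input, loosely based on the text in the question.
echo wrap_keywords_sketch(
    'Lorem ipsum, -sit amet- eu sed enim.',
    array('lorem', 'sed', 'sed enim', 'sit amet')
), "\n";
// prints: -Lorem- ipsum, -sit amet- eu -sed enim-.
```

Because all keywords are tried in one pass, text that has just been wrapped is never rescanned, which is why no look-around is needed here.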
MizardX
A: 

I finally solved my problem by using the \b metacharacter, which matches a word boundary.

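    public function filter($value)
    {
        // Sort by descending length; usort() expects an integer from the comparator.
        usort($this->_keywords, function($a,$b){return strlen($b)-strlen($a);});

        foreach ($this->_keywords as $keyword) {
            $value = preg_replace(
                // preg_quote() keeps regex metacharacters in a keyword literal.
                '/((?<!-)('.preg_quote($keyword, '/').'\b)(?!\-))/i',
                // \2 is the keyword as matched; the pattern has no third group.
                str_replace('%string%', '\2', $this->_wrapper),
                $value
            );
        }

        return $value;
    }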
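
To illustrate what the `\b` boundary changes, here is a small sketch comparing the pattern with and without it for the keyword `null`. The sample string and variable names are made up for illustration.

```php
<?php
// Hypothetical sample text; "Nulla" contains the keyword "null".
$text = 'Nulla eu null magna';

// Without \b the keyword also matches inside the longer word "Nulla".
$without = preg_replace('/(?<!-)(null)(?!-)/i', '-\1-', $text);

// With a trailing \b the match must end at a word boundary.
$with = preg_replace('/(?<!-)(null\b)(?!-)/i', '-\1-', $text);

echo $without, "\n"; // -Null-a eu -null- magna
echo $with, "\n";    // Nulla eu -null- magna
```

A trailing `\b` only constrains where the match ends. A leading `\b` would additionally stop matches that begin in the middle of a word; whether that is wanted depends on the keyword list.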
John