PHP's wordwrap() function doesn't work correctly for multi-byte strings like UTF-8.

There are a few examples of multi-byte-safe functions in the comments, but each of them seems to fail on some of my test data.

The function should take the exact same parameters as wordwrap().

Specifically, it should:

  • cut mid-word if $cut = true, don't cut mid-word otherwise
  • not insert extra spaces in words if $break = ' '
  • also work for $break = "\n"
  • work for ASCII, and all valid UTF-8
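
To make the failure concrete, here is a small sketch of what wordwrap() does to a multi-byte word when $cut = true. The sample string is my own assumption, the input is taken to be UTF-8, and preg_match() with the /u modifier is used only as a validity check.

```php
<?php
// "äöüß": four characters, but eight bytes in UTF-8.
$word = "\xC3\xA4\xC3\xB6\xC3\xBC\xC3\x9F";

// Ask for lines of at most 3 "characters", cutting long words.
$lines = explode("\n", wordwrap($word, 3, "\n", true));

foreach ($lines as $i => $line) {
    // preg_match() with the /u modifier returns false for invalid UTF-8.
    $valid = preg_match('//u', $line) === 1;
    printf("line %d: %d bytes, valid UTF-8: %s\n", $i, strlen($line), $valid ? 'yes' : 'no');
}
```

The width is applied to bytes, so the first cut lands in the middle of a two-byte sequence and the resulting chunk is no longer valid UTF-8.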
A: 

Here's my own attempt at a function that passed a few of my own tests, though I can't promise it's 100% perfect, so please post a better one if you see a problem.

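/**
 * Multi-byte safe version of wordwrap()
 * Seems to me like wordwrap() is only broken on UTF-8 strings when $cut = true
 * @return string
 */
function wrap($str, $len = 75, $break = " ", $cut = true) {
    $len = (int) $len;

    // empty() would also reject the string "0", so compare against "" instead
    if ((string) $str === "")
        return "";

    // Without cutting, breaks only ever replace spaces, so defer to wordwrap()
    if (!$cut)
        return wordwrap($str, $len, $break);

    // Match $len characters that are not part of $break, but only when the run
    // continues, so no break is added at the end of the string or before an
    // existing break. The second argument to preg_quote() escapes the "/" delimiter.
    $class = '[^'.preg_quote($break, '/').']';
    $pattern = '/('.$class.'{'.$len.'})(?='.$class.')/u';

    return preg_replace($pattern, "\${1}".$break, $str);
}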
philfreo
`wordwrap()` wraps only at a space character when `$cut` is `false`. This is why it works for UTF-8, which is designed to be backwards-compatible: characters not defined in ASCII are all encoded with the highest bit set, so none of their bytes can collide with an ASCII character such as the space.
Archimedix
Can you clarify? `wordwrap()` doesn't work for UTF-8, for example. I'm not sure what you mean by "wraps only at a space..."
philfreo
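
Both points in the comments above can be checked with a short sketch. The sample string is an assumption of mine, and mb_strlen() needs the mbstring extension. With $cut = false no UTF-8 sequence is split, but the width is still compared against bytes rather than characters.

```php
<?php
// "ää ää": five characters, nine bytes in UTF-8.
$text = "\xC3\xA4\xC3\xA4 \xC3\xA4\xC3\xA4";

$chars = mb_strlen($text, 'UTF-8'); // counts characters
$bytes = strlen($text);             // counts bytes

// With $cut = false the break can only replace a space, so no UTF-8
// sequence is split, but the width of 5 is measured in bytes.
$wrapped = wordwrap($text, 5, "\n", false);

printf("%d characters, %d bytes\n", $chars, $bytes);
var_dump($wrapped === $text); // false: wrapped although it fits in 5 characters
```

So the output stays valid UTF-8, yet a line that fits the requested width in characters is still broken at the space.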
A: 

This one seems to work well...

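/**
 * Cuts every piece between existing $break strings into chunks of at most
 * $width characters. Nothing is changed when $cut is false.
 */
function mb_wordwrap($str, $width = 75, $break = "\n", $cut = false, $charset = null) {
    if ($charset === null) $charset = mb_internal_encoding();

    $pieces = explode($break, $str);
    $result = array();
    foreach ($pieces as $piece) {
      $current = $piece;
      // Pass $charset to mb_strlen() too, so length and substring use the same encoding
      while ($cut && mb_strlen($current, $charset) > $width) {
        $result[] = mb_substr($current, 0, $width, $charset);
        // A null length means "to the end of the string" (instead of a 2048-character cap)
        $current = mb_substr($current, $width, null, $charset);
      }
      $result[] = $current;
    }
    return implode($break, $result);
}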
philfreo
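
For comparison, here is a minimal sketch of a wrap that also breaks at spaces while counting characters instead of bytes. It is not taken from either answer: utf8_wordwrap() is a hypothetical name, and the sketch assumes UTF-8 input, the mbstring extension, and single spaces as the only word separators. It has only been reasoned through on a few inputs, so treat it as a starting point.

```php
<?php
/**
 * Sketch of a word wrap that counts characters instead of bytes.
 * utf8_wordwrap() is a hypothetical name; assumes UTF-8 input, the mbstring
 * extension, and single spaces as the only word separators.
 */
function utf8_wordwrap($str, $width = 75, $break = "\n", $cut = false)
{
    $width = max(1, (int) $width); // assumption: avoid an endless loop for widths below 1
    $lines = array();

    // Existing breaks always end a line, so wrap each segment separately.
    foreach (explode($break, $str) as $segment) {
        $line = null; // null means no word has been placed on this line yet
        foreach (explode(' ', $segment) as $word) {
            $candidate = ($line === null) ? $word : $line . ' ' . $word;
            if ($line !== null && mb_strlen($candidate, 'UTF-8') > $width) {
                // The word does not fit: close the current line and start a new one.
                $lines[] = $line;
                $line = $word;
            } else {
                $line = $candidate;
            }
            // Optionally cut a word that is longer than a whole line.
            while ($cut && mb_strlen($line, 'UTF-8') > $width) {
                $lines[] = mb_substr($line, 0, $width, 'UTF-8');
                $line = mb_substr($line, $width, null, 'UTF-8');
            }
        }
        $lines[] = $line;
    }

    return implode($break, $lines);
}
```

A quick check is to compare it with wordwrap() on plain ASCII input, where both should agree, and then run it on strings such as "äöüß" with $cut = true to confirm that no character is split.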