
Hi all:

I've convinced my boss to let me do the typesetting work in PHP (version 5.2.8), and this is what I have so far (set the character encoding to Unicode (UTF-8) if the Japanese characters look garbled):

There is a demo page on my personal website.

If you paste the Latin sample paragraph into the textarea and click the button, everything works. You can check the result by pasting it into Notepad (although I haven't yet done anything to hyphenate words that are split across lines).

However, with non-Latin (Asian) characters, nothing is printed at all. No error message is generated; the output is simply empty.
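One plausible reason the failure is silent (an assumption, not confirmed in the thread): if the typeset string contains invalid UTF-8, json_encode() cannot encode it. On PHP 5.5 and later it returns false, and echoing false prints an empty string, so nothing shows up and no error reaches the page. Older versions such as 5.2.8 may behave differently, so treat this as an illustration. The input below is a hypothetical, deliberately truncated character:

```php
<?php
// "日" is three bytes in UTF-8 (E6 97 A5). Keeping only the first two
// simulates a character cut in half by byte-based substr().
$broken = substr("日本語", 0, 2);

$json = json_encode(array('feedback' => $broken));

// On PHP >= 5.5, json_encode() returns false for invalid UTF-8,
// and `echo false` prints nothing, which matches the empty result.
var_dump($json);                                 // bool(false)

// json_last_error() reports why encoding failed.
var_dump(json_last_error() === JSON_ERROR_UTF8); // bool(true)
```

Checking json_last_error() after json_encode() is a cheap way to turn this kind of silent failure into a visible one.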

The following is my code:

<?php
$words = typesetWords($_POST['words']);
echo json_encode(array('feedback' => $words));

function typesetWords($words, $lineLength = 70)
{
 try
 {
  $result = '';
  $paragraphs = explode("\n\n", $words);
  foreach($paragraphs as $paragraph)
  {
   $paragraph = str_replace("\n", "", $paragraph);
   $length = strlen($paragraph);
   $numberOfLines = intval($length / $lineLength);
   $tmp = '';
   if($numberOfLines > 0)
   {
    for($i = 0; $i < $numberOfLines; $i++)
     $tmp .= substr($paragraph, $i * $lineLength, $lineLength)."\n";
    $tmp .= substr($paragraph, -1 * ($length % $lineLength))."\n\n";
    $result .= $tmp;
   }
   else $result .= $paragraph."\n\n";
  }
 }
 catch(Exception $e)
 {
  return $e->getMessage();
 }
 return $result;
}

?>

I tried echoing the form input straight back, and the Japanese sample paragraph displayed without problems. So I suspect one of the PHP string functions is causing the problem, but I can't tell which one or how to fix it.
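One way to narrow down which function is at fault is to check each intermediate string with mb_check_encoding() (part of the mbstring extension, assumed to be enabled). In this sketch, with a hypothetical all-Japanese paragraph, the input is valid UTF-8 but a 70-byte slice from substr() is not, because 70 is not a multiple of the 3-byte character width:

```php
<?php
// Hypothetical input: 30 Japanese characters, 3 bytes each in UTF-8 (90 bytes).
$paragraph = str_repeat("日本語", 10);

// The raw input is well-formed UTF-8.
var_dump(mb_check_encoding($paragraph, 'UTF-8'));  // bool(true)

// substr() counts bytes: 70 bytes = 23 full characters plus 1 stray lead byte.
$firstLine = substr($paragraph, 0, 70);
var_dump(mb_check_encoding($firstLine, 'UTF-8'));  // bool(false)
```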

Many thanks in advance!

A: 

strlen() returns the number of bytes in a string, which equals the number of characters only for single-byte encodings such as ASCII, not for UTF-8. Try mb_strlen() instead.
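To make the difference concrete: strlen() and substr() count bytes, while mb_strlen() and mb_substr() count characters in the given encoding. A small comparison on a hypothetical three-character Japanese string, passing 'UTF-8' explicitly so the result doesn't depend on mbstring's internal encoding setting:

```php
<?php
$s = "日本語"; // 3 characters, 9 bytes in UTF-8

echo strlen($s), "\n";                    // 9 (bytes)
echo mb_strlen($s, 'UTF-8'), "\n";        // 3 (characters)

echo bin2hex(substr($s, 0, 2)), "\n";     // e697 (first two bytes of "日")
echo mb_substr($s, 0, 2, 'UTF-8'), "\n";  // 日本
```

This also shows why switching only strlen() to mb_strlen() is not enough: the substr() calls still slice by bytes.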

JP
Thanks for the hint! I will go and check.
Michael Mao
No, mb_strlen() alone doesn't solve the problem. I still get nothing in the textarea.
Michael Mao
Oh, I got it. After setting mb_internal_encoding() to UTF-8 and replacing all the string functions with their UTF-8-safe counterparts, everything works! Thanks for the hint!
Michael Mao
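The final code wasn't posted in the thread. A minimal sketch of what the fix described in the last comment might look like, assuming the mbstring extension is available: mb_internal_encoding('UTF-8') is set once, and strlen()/substr() are replaced by mb_strlen()/mb_substr(). It also slices the last, partial line by offset instead of with a negative start, which avoids appending the whole paragraph again when its length is an exact multiple of $lineLength (a separate issue with `substr($paragraph, -1 * ($length % $lineLength))`, since a remainder of 0 makes that `substr($paragraph, 0)`). The name typesetWordsUtf8 is hypothetical.

```php
<?php
// Hypothetical UTF-8-safe variant of typesetWords(); requires mbstring.
mb_internal_encoding('UTF-8');

function typesetWordsUtf8($words, $lineLength = 70)
{
    $result = '';
    // explode() and str_replace() are safe on UTF-8 here because the byte
    // "\n" never occurs inside a multibyte UTF-8 sequence.
    foreach (explode("\n\n", $words) as $paragraph) {
        // Join the paragraph into one line, as the original code does.
        $paragraph = str_replace("\n", "", $paragraph);
        // Count characters, not bytes.
        $length = mb_strlen($paragraph);
        // One output line per $lineLength characters; the last may be shorter.
        for ($offset = 0; $offset < $length; $offset += $lineLength) {
            $result .= mb_substr($paragraph, $offset, $lineLength) . "\n";
        }
        // Blank line between paragraphs.
        $result .= "\n";
    }
    return $result;
}
```

It would be called the same way as the original: `echo json_encode(array('feedback' => typesetWordsUtf8($_POST['words'])));`.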