ansaurus

Question

Answer 1

+2 A:

Consider using the built-in wordwrap() instead?

Amber 2009-09-03 10:00:48
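
For reference, here is a minimal sketch of what the built-in wordwrap() does on plain ASCII text. The sample string and the width of 10 are only for illustration.

```php
<?php
// wordwrap() breaks at spaces, aiming for lines of at most $width bytes.
// Words longer than $width are left intact unless the fourth argument is true.
$text = "The quick brown fox";
$wrapped = wordwrap($text, 10, "\n");

// Split on the break string to inspect the resulting lines.
$lines = explode("\n", $wrapped);
print_r($lines);
```

Here the text breaks at the space after "quick", because adding " brown" would push the first line past 10 bytes.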

The problem with wordwrap() is that it can break the line in the middle of a multi-byte UTF-8 character (leaving the string as invalid UTF-8) or in the middle of an HTML element, messing up the markup.

Omry 2009-09-03 10:06:17
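
The UTF-8 failure mode described in the comment above can be reproduced with a small sketch. It assumes the fourth argument of wordwrap() (cut long words) is set to true, which is when wordwrap() cuts at a byte offset instead of at a space. The helper is_valid_utf8() is a hypothetical name introduced here; it relies on preg_match() with the /u modifier failing on invalid UTF-8.

```php
<?php
// Hypothetical helper: preg_match() with /u returns 1 for valid UTF-8
// and false when the subject is not valid UTF-8.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

// Five copies of "é", which is two bytes in UTF-8 (0xC3 0xA9): 10 bytes, no spaces.
$word = str_repeat("\xC3\xA9", 5);

// Force cuts every 3 bytes; wordwrap() counts bytes, not characters.
$pieces = explode("\n", wordwrap($word, 3, "\n", true));

foreach ($pieces as $i => $piece) {
    echo $i, ': ', is_valid_utf8($piece) ? 'valid' : 'INVALID', "\n";
}
```

Because every cut lands inside a two-byte sequence, the pieces are no longer valid UTF-8 even though the input was.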

@omry, see my answer

Dominic Rodger 2009-09-03 10:14:24

Answer 2

A:

I use this function to split strings in FireStats.

You can probably take it out of context and use it pretty easily. Note that it calls some other functions. You can skip the UTF-8 check if you like.

Omry 2009-09-03 10:02:31

Answer 3

+1 A:

Get rid of that complexity: use a DOM parser to extract the plain text.

// Dump contents (without tags) from HTML.
// file_get_html() comes from the PHP Simple HTML DOM Parser library, which must be included first.
$pageText = file_get_html('http://www.google.com/')->plaintext;
echo "Length is: " . strlen($pageText);

karim79 2009-09-03 10:06:16
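
A rough sketch of the same idea using PHP's bundled DOM extension instead of a third-party library. This is an alternative to the answer's approach, not the answer's own code; the sample markup is hypothetical, and the example assumes ASCII input (non-ASCII input typically needs an explicit charset declaration before loadHTML() handles it correctly).

```php
<?php
// Hypothetical sample markup; any HTML string would do.
$html = '<p>Hello <b>world</b>, this is <a href="#">a link</a>.</p>';

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// textContent concatenates all text nodes below the root, dropping the tags.
$plain = $doc->documentElement->textContent;

// With the tags gone, wrapping cannot split an HTML element.
echo wordwrap($plain, 20, "\n"), "\n";
```

Wrapping the extracted text sidesteps the broken-markup problem, at the cost of losing the markup entirely.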

Answer 4

+3 A:

If you're worried about UTF-8 support for wordwrap, then you want this:

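function utf8_wordwrap($str, $width = 75, $break = "\n") // wordwrap() with UTF-8 support
{
    // Split on any run of whitespace (spaces, tabs, newlines).
    $words = preg_split('#\s+#', $str);
    $return = ''; // initialize so the first append does not hit an undefined variable
    $len = 0;
    foreach ($words as $val) {
        $val .= ' ';
        $tmp = mb_strlen($val, 'utf-8'); // length in characters, not bytes
        $len += $tmp;
        if ($len >= $width) {
            // Current line is full: start a new one with this word.
            $return .= $break . $val;
            $len = $tmp;
        }
        else {
            $return .= $val;
        }
    }
    return $return;
}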

Source: PHP Manual Comment

As to your issue with code points: you might want to look at html_entity_decode, which I think converts numeric character references (e.g. &#223;) to the characters they represent. The number itself always refers to a Unicode code point, so 223 is always ß; the charset you pass tells the function which encoding to produce the decoded character in (for example 'UTF-8').

Dominic Rodger 2009-09-03 10:13:04
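
A small sketch of html_entity_decode() on a numeric character reference. The sample string is only for illustration. It decodes the same reference with two different charset arguments to show that the charset controls the bytes that come out.

```php
<?php
// &#223; is the numeric character reference for U+00DF (ß).
$input = 'Stra&#223;e';

// Decode to UTF-8: ß becomes the two bytes 0xC3 0x9F.
$utf8 = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

// Decode to ISO-8859-1: ß becomes the single byte 0xDF.
$latin1 = html_entity_decode($input, ENT_QUOTES, 'ISO-8859-1');

echo strlen($utf8), " bytes as UTF-8\n";        // 7
echo strlen($latin1), " bytes as ISO-8859-1\n"; // 6
```

Note that the trailing semicolon is part of the reference.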

Thanks for the tip on "html_entity_decode". I used that function and included it with what I was working on, and it seems to be working perfectly. Thanks again!

Patrik Johansson 2009-09-03 10:51:18

@Patrik Johansson - glad it worked for you :)

Dominic Rodger 2009-09-03 10:52:09

How to split a long string with PHP?