ansaurus

Question

Answer 1

+1 A:

The mbstring extension (http://php.net/mb_string) is the thing you're looking for.

Just read the characters one by one with mb_substr.
Not until PHP6.
What input exactly? The usual way, in general.

Col. Shrapnel 2010-04-07 08:40:22

Note that the comments section for `mb_split` there includes many examples of how to break a multibyte string up into an array of characters - for example, http://us2.php.net/manual/en/function.mb-split.php#80046

Amber 2010-04-07 08:45:32
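
For readers who do want an array of characters: one common idiom, not necessarily the one used in the linked manual comment, is `preg_split` with the `u` (UTF-8) modifier. This is a minimal sketch that assumes the input is UTF-8; the function name `utf8_to_chars` is made up for illustration.

```php
<?php
// Split a UTF-8 string into an array of characters.
// The "u" modifier makes PCRE treat pattern and subject as UTF-8,
// so the empty pattern matches between code points, not between bytes.
function utf8_to_chars(string $s): array
{
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split returns false when $s is not valid UTF-8
    return $chars === false ? [] : $chars;
}

$chars = utf8_to_chars('žščř');
echo count($chars), "\n";        // 4
echo implode(' ', $chars), "\n"; // ž š č ř
```

Returning an empty array on invalid input is only a choice made for this sketch; real code may prefer to report the error.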

@Dav I don't think he really needs an array.

Col. Shrapnel 2010-04-07 08:47:01

@Col. Shrapnel By input I mean the HTML code to parse. Maybe there is a completely different way to feed the string to a state machine that I am missing :-) ... but mb_substr looks fine (if I know the string encoding, which is not so obvious).

Petr Peller 2010-04-07 09:08:21
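
To illustrate the idea of feeding a state machine with `mb_substr`, here is a rough sketch of a two-state machine that keeps only the text outside `<...>` tags. It is not a real HTML parser. It assumes the mbstring extension is loaded and that the encoding is already known and passed in; the function name `text_outside_tags` and the state names are invented for the example.

```php
<?php
// Hypothetical two-state machine: collects the text outside <...> tags.
// Assumes the input encoding is known (UTF-8 by default) and passed in explicitly.
function text_outside_tags(string $html, string $encoding = 'UTF-8'): string
{
    $state = 'text'; // either 'text' or 'tag'
    $out = '';
    $length = mb_strlen($html, $encoding); // length in characters

    for ($i = 0; $i < $length; $i++) {
        $ch = mb_substr($html, $i, 1, $encoding); // one character, not one byte

        if ($state === 'text') {
            if ($ch === '<') {
                $state = 'tag';   // entering a tag: stop collecting
            } else {
                $out .= $ch;
            }
        } elseif ($ch === '>') {
            $state = 'text';      // leaving a tag: collect again
        }
    }
    return $out;
}

echo text_outside_tags('<p>žščř</p>'), "\n"; // žščř
```

Passing the encoding to every `mb_*` call keeps the sketch independent of `mb_internal_encoding`.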

@Dav Thanks, I was thinking about converting the string into an array of characters, but I don't think it is one of the cleanest solutions. I would feel dirty :-)

Petr Peller 2010-04-07 09:10:42

Answer 2

A:

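mb_internal_encoding("UTF-8"); // make the mb_* functions treat strings as UTF-8

$mb_string = 'žščř';
$l = mb_strlen($mb_string); // length in characters, not bytes

for ($i = 0; $i < $l; $i++) {
    // mb_substr takes a character offset and a character count
    print(mb_substr($mb_string, $i, 1) . "<br/>");
}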

zaf 2010-04-07 08:44:15

Answer 3

A:

Without the mb_* functions, you can still handle a multi-byte encoded string with the standard substring functions, as long as you read in multiples of the number of bytes used to encode each character.

For example, take a UTF-8 string in which every character happens to be encoded in 2 bytes, and suppose you need its first character:

$string = 'žščř'; // 4 characters, 2 bytes each in UTF-8

You have to read both $string[0] and $string[1], so the first character is really the 2-byte substring covering byte indexes 0 and 1.

Note that $string[0] or $string[N] references a single byte of the string (the first byte, or the byte at offset N), not a character.

regards,

andreas 2010-04-07 10:47:17
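
A small sketch of the byte-versus-character point, using only the plain string functions. It assumes the source file itself is saved as UTF-8, so that each of these four characters takes two bytes.

```php
<?php
$string = 'žščř'; // 4 characters, 2 bytes each in UTF-8

echo strlen($string), "\n";       // 8: strlen counts bytes
echo substr($string, 0, 2), "\n"; // ž: bytes 0 and 1 form the first character
echo substr($string, 2, 2), "\n"; // š: bytes 2 and 3 form the second character
echo bin2hex($string[0]), "\n";   // c5: a single byte, not a whole character
```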

Wouldn't it be quite hard to know how many bytes I have to read? This is a trivial example, but in general I don't know what characters are in the input (UTF-8 characters can be 1-4 bytes long).

Petr Peller 2010-04-07 11:03:28

Yes, you have to determine how many bytes each character uses, but the answer may still give you some information on manipulating multi-byte strings with the non-mb_* functions. Hope you find it useful.

andreas 2010-04-07 11:20:24
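
On determining the byte count: in UTF-8 the first byte of each character encodes how long its sequence is, so the length can be read from that byte alone. The following is a sketch under that assumption; the function names `utf8_seq_len` and `utf8_chars_bytewise` are invented here, and the code does not validate continuation bytes.

```php
<?php
// Number of bytes in the UTF-8 sequence that starts with $byte.
// Returns 0 for a byte that cannot start a sequence (continuation or invalid).
function utf8_seq_len(string $byte): int
{
    $b = ord($byte);
    if ($b < 0x80) return 1;            // 0xxxxxxx: ASCII
    if (($b & 0xE0) === 0xC0) return 2; // 110xxxxx
    if (($b & 0xF0) === 0xE0) return 3; // 1110xxxx
    if (($b & 0xF8) === 0xF0) return 4; // 11110xxx
    return 0;
}

// Walk the string with plain substr(), one character per step.
function utf8_chars_bytewise(string $s): array
{
    $chars = [];
    $n = strlen($s);
    for ($i = 0; $i < $n; ) {
        $len = utf8_seq_len($s[$i]);
        if ($len === 0) {
            $len = 1; // sketch only: pass an unexpected byte through as-is
        }
        $chars[] = substr($s, $i, $len);
        $i += $len;
    }
    return $chars;
}

print_r(utf8_chars_bytewise('a€ž')); // 'a' (1 byte), '€' (3 bytes), 'ž' (2 bytes)
```

Input that is not well-formed UTF-8 would need extra checks before this could be relied on.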

Parsing multibyte string in PHP
