I would like to split a text such as this:
过公元年?因为无论你如何选择。简体字危及了对古代文学的研究输入!

using any of these three (or more) characters, ?!。, as the delimiter. I can of course do this with
$lines = preg_split('/[。!?]/u',$body);

However, I want the resulting lines to keep their ending delimiter. Also, a sentence might end with several delimiters in a row, like 啊。。。 or 什么!??!!!!

A: 

In this case, you may want to write the string splitter yourself and keep each run of consecutive delimiters together as a unit. (You can use a state variable that indicates whether you are currently in a text block or in a delimiter block.)
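A minimal sketch of that state-variable idea, not code from the answerer: the function name split_sentences and the default delimiter list are assumptions made for illustration. It walks the string one character at a time and starts a new sentence whenever a text character follows a run of delimiters.

```php
<?php
// Hypothetical helper (name and defaults are illustrative assumptions).
// Splits $text into sentences, keeping each run of trailing delimiters
// attached to the sentence it ends.
function split_sentences(string $text, array $delimiters = ['。', '!', '?']): array
{
    // Break the UTF-8 string into individual characters.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    $sentences = [];
    $current = '';
    $inDelimiters = false; // state: was the previous character a delimiter?

    foreach ($chars as $ch) {
        $isDelimiter = in_array($ch, $delimiters, true);

        // A text character right after a delimiter run closes the sentence.
        if ($inDelimiters && !$isDelimiter) {
            $sentences[] = $current;
            $current = '';
        }

        $current .= $ch;
        $inDelimiters = $isDelimiter;
    }

    // Keep whatever is left, whether or not it ends in a delimiter.
    if ($current !== '') {
        $sentences[] = $current;
    }

    return $sentences;
}

$parts = split_sentences('啊。。。什么!??!!!!好');
```

Trailing text without a closing delimiter (好 in the call above) is kept as its own final element.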

Yin Zhu
A: 

You should use preg_match_all instead of preg_split, i.e.

preg_match_all("/[^?!。]+[?!。]+/u", $text, $res);

See http://www.ideone.com/rN7MB for usage.
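As a rough usage sketch (the variable $lines is an assumption for illustration), the full matches end up in $res[0]:

```php
<?php
// Sample text taken from the question.
$text = '过公元年?因为无论你如何选择。简体字危及了对古代文学的研究输入!';

// Each match is a run of non-delimiters followed by one or more delimiters.
preg_match_all('/[^?!。]+[?!。]+/u', $text, $res);

// $res[0] holds the full matches, one sentence per element.
$lines = $res[0];
```

Note that this pattern requires at least one delimiter at the end of each match, so any trailing text that does not end in a delimiter is left out of the result.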

KennyTM
A: 

Try this:

$lines = preg_split('/(?<=[。!?])(?![。!?])/u',$body);

It splits at a position that's preceded by one of your delimiter characters but not followed by one. It doesn't consume the delimiter, and if there are two or more consecutive delimiters, it only matches after the last one.
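A short sketch of how this behaves on the multi-delimiter endings from the question. The PREG_SPLIT_NO_EMPTY flag is an addition here, not part of the original answer: when the input itself ends in a delimiter, the split position at the very end of the string can produce a trailing empty element, and the flag drops it.

```php
<?php
// Sample input built from the sentence endings mentioned in the question.
$body = '啊。。。什么!??!!!!';

// Split after the last delimiter of each run; the lookarounds consume nothing,
// so every delimiter stays attached to its sentence.
$lines = preg_split('/(?<=[。!?])(?![。!?])/u', $body, -1, PREG_SPLIT_NO_EMPTY);
```

Unlike the preg_match_all approach, text after the last delimiter is still returned as a final element.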

Alan Moore
Works perfect!!
Moak