I've been trying to split a string into sentences using

$string="The Dr. is here!!! I am glad I'm in the U.S.A. for the Dr. quality is great!!!!!!";
preg_match_all('~.*?[?.!]~s',$string,$sentences);
print_r($sentences);

but it doesn't work on abbreviations such as Dr., U.S.A., etc.

Does anyone have any better suggestions?

A: 

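Hmm, maybe try something like $sentences = preg_split('/(?<=[?.!])\s+/', $string); which splits on the whitespace that follows a run of terminal punctuation. It will still split after abbreviations such as Dr., though.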

prodigitalson
A: 

This is almost impossible: as your example shows, the same punctuation characters that end a sentence also appear in abbreviations such as Dr. and U.S.A., so the punctuation alone cannot tell you where a sentence starts or ends.

You have to look at the characters that follow the punctuation to decide whether a new sentence starts there.
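
One way to act on that idea is sketched below. It is only a rough heuristic, not a complete solution: it treats whitespace as a sentence boundary when it comes after ., ? or ! and is followed by an uppercase ASCII letter, and then re-joins chunks that end in a known abbreviation. The function name splitSentences and the abbreviation list are made up for illustration.

```php
<?php
// Hypothetical helper (not from any library): heuristic sentence splitter.
function splitSentences(string $text, array $abbreviations = ['Dr.', 'Mr.', 'Mrs.', 'Ms.', 'U.S.A.']): array
{
    // Candidate boundaries: whitespace preceded by . ? or ! and followed by
    // an uppercase letter, i.e. we inspect what comes after the punctuation.
    $parts = preg_split('/(?<=[.?!])\s+(?=[A-Z])/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    $sentences = [];
    $buffer = '';
    foreach ($parts as $part) {
        $buffer = ($buffer === '') ? $part : $buffer . ' ' . $part;

        // A chunk ending in a known abbreviation (e.g. "I met Dr.") is
        // assumed to be unfinished, so keep accumulating.
        $endsWithAbbreviation = false;
        foreach ($abbreviations as $abbr) {
            if (substr($buffer, -strlen($abbr)) === $abbr) {
                $endsWithAbbreviation = true;
                break;
            }
        }

        if (!$endsWithAbbreviation) {
            $sentences[] = $buffer;
            $buffer = '';
        }
    }

    // Flush whatever is left, e.g. when the text itself ends in an abbreviation.
    if ($buffer !== '') {
        $sentences[] = $buffer;
    }

    return $sentences;
}

$string = "The Dr. is here!!! I am glad I'm in the U.S.A. for the Dr. quality is great!!!!!!";
print_r(splitSentences($string));
```

On the string from the question this yields two sentences, "The Dr. is here!!!" and the remainder. It still guesses wrong when a sentence really does end in an abbreviation and the next one starts with a capital letter ("I live in the U.S.A. He left."), which is exactly the ambiguity described above.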

andreas
Nothing is impossible...
Scott Tyler
Almost impossible... with this approach. Mind the "almost" :) Regards
andreas
+8  A: 

There is no simple solution for that. You need to do some natural language processing (NLP) in your application to recognize each sentence. There is something called OpenNLP, a Java-based NLP toolkit, or the Stanford NLP parser in Ruby. You can probably find something similar for PHP.

Here I found a set of classes for natural language processing in PHP.

Michel Kogan
+1 - and indeed, even a solution that uses NLP is likely to fail when faced with sufficiently informal (i.e. sloppy) writing. If people don't follow the basic rules of punctuation, you are stuffed.
Stephen C