I want to take the description of an RSS feed, stored in $the_content, and cut it off after 2 full sentences (or after 200 words, then up to the end of the next full sentence) using preg_split.

I tried a couple of times, but I'm way off. I know what I want to do, but I can't figure out how to even start.

Thanks!

A:

Properly splitting HTML is very tricky, and not worth doing with regular expressions. If you need to keep the HTML, something like a DOM text iterator will be useful.
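For the keep-the-HTML route, here is a rough sketch of the idea using PHP's DOMDocument and DOMXPath: walk the text nodes in document order, count only text (not markup), and cut the first text node that contains a . ! or ? past the limit. The helper name truncate_html() and its $limit parameter are hypothetical, lengths are counted in bytes for simplicity, and elements after the cut (an <img>, say) stay in place with only their text emptied.

```php
<?php
// Hypothetical helper: truncate_html() keeps the markup, counts only text
// bytes, and cuts at the first ".", "!" or "?" found after $limit bytes.
function truncate_html(string $html, int $limit = 200): string
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // feed HTML is rarely valid
    // The XML encoding hint makes libxml read the input as UTF-8;
    // the wrapper <div> lets us serialize just our fragment afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//body/div')->item(0);
    $seen = 0;    // text bytes before the current node
    $cut = false; // true once the sentence end has been found

    foreach ($xpath->query('.//text()', $root) as $node) { // document order
        if ($cut) {
            $node->data = ''; // empty every text node after the cut
            continue;
        }
        $text = $node->data;
        $start = max(0, $limit - $seen);
        // [.!?] are single ASCII bytes, so a byte search never splits
        // a multi-byte UTF-8 character.
        if ($start < strlen($text)
            && preg_match('/[.!?]/', $text, $m, PREG_OFFSET_CAPTURE, $start)) {
            $node->data = substr($text, 0, $m[0][1] + 1);
            $cut = true;
        }
        $seen += strlen($text);
    }

    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```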

  1. Convert the description to plain text:

    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    
  2. This takes the first 200 characters (200 words is a bit much for a sentence, isn't it?) and then looks for the end of the sentence:

    // s: "." also matches newlines; u: count UTF-8 characters, not bytes
    $text = preg_replace('/^(.{200}.*?[.!?]).*$/su', '$1', $text);
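Put together, the two steps might look like this. It is a minimal sketch: the function name shorten_description() and the $min parameter are made up for illustration, and the s and u pattern modifiers assume the feed text is UTF-8 and may contain line breaks.

```php
<?php
// Hypothetical wrapper around the two steps; $min is the number of
// characters to keep before looking for the end of the sentence.
function shorten_description(string $html, int $min = 200): string
{
    // Step 1: drop tags, then decode entities such as &amp; and &#8217;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');

    // Step 2: keep $min characters, then everything up to the next . ! or ?
    // If the text is shorter or has no later sentence end, nothing matches
    // and the text comes back unchanged. preg_replace() returns null on
    // error (e.g. invalid UTF-8), so fall back to the plain text then.
    return preg_replace('/^(.{' . $min . '}.*?[.!?]).*$/su', '$1', $text) ?? $text;
}
```

Calling shorten_description($the_content) keeps at least 200 characters and stops at the first sentence end after them.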
    

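You could change [.!?] to something more sophisticated, e.g. require a space after the punctuation, or require that there is no other punctuation nearby:

    (?<=[^.!?]{5})[.!?](?=[^.!?]{5})

(?=…) is a positive lookahead assertion: the 5 characters after the punctuation must not be punctuation. (?<=…) is a positive lookbehind assertion, which checks the 5 characters before the current position in the same way. {5} means exactly 5 times.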
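The simpler refinement, requiring whitespace (or the end of the text) after the punctuation, fits into the step 2 pattern as a lookahead. A sketch, with shorten_strict() as a hypothetical name and the same UTF-8 assumption:

```php
<?php
// Hypothetical variant of step 2: a sentence only ends at . ! or ?
// when whitespace or the end of the text follows it.
function shorten_strict(string $text, int $min = 200): string
{
    // (?=\s|$) is a lookahead, so "3.5" or "example.com" does not count
    // as a sentence end.
    $pattern = '/^(.{' . $min . '}.*?[.!?])(?=\s|$).*$/su';
    return preg_replace($pattern, '$1', $text) ?? $text;
}
```

For example, shorten_strict('Version 3.5 is out. Upgrade now.', 5) keeps 'Version 3.5 is out.', while the plain [.!?] pattern would stop at 'Version 3.'.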

I haven't tested it :)

porneL
A: 

Thanks! But it seems that there is an example post-processing script that uses preg_replace (for something entirely different), so I'd like to stick with something of the sort.

So if I use preg_split, how can I take just the text in $the_content, find the end of the second sentence (or of the first sentence that ends after roughly 100 or 200 words), and output everything up to that point? Either one works for me!
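Since preg_split is what's being asked for, here is one way it could be used, as a sketch rather than a tested solution: split the plain text (after strip_tags and html_entity_decode, as in step 1 of the answer) into sentences, then keep sentences until the sentence or word target is met. The name first_sentences() and its $maxSentences and $minWords parameters are hypothetical, and the sentence detection is naive.

```php
<?php
// Sketch only: first_sentences() and its parameters are hypothetical names.
// Naive rule: ".", "!" or "?" followed by whitespace ends a sentence, so
// abbreviations like "e.g. " are split too.
function first_sentences(string $text, int $maxSentences = 2, int $minWords = 0): string
{
    // The lookbehind (?<=[.!?]) splits *after* the punctuation without
    // consuming it, so each piece keeps its own ".", "!" or "?".
    $sentences = preg_split('/(?<=[.!?])\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    if ($sentences === false) {
        return $text; // regex error, e.g. invalid UTF-8 input
    }

    $kept = [];
    $words = 0;
    foreach ($sentences as $sentence) {
        $kept[] = $sentence;
        // Rough word count: runs of non-whitespace characters.
        $words += count(preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY));
        // Stop after $maxSentences sentences, but only once $minWords is
        // reached, so the sentence that crosses the word limit is finished.
        if (count($kept) >= $maxSentences && $words >= $minWords) {
            break;
        }
    }
    // Whitespace between sentences is normalized to a single space.
    return implode(' ', $kept);
}
```

first_sentences($text) gives the first two sentences; first_sentences($text, 1, 200) keeps going until at least 200 words and then finishes the current sentence.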