I'm trying to pull the first paragraph out of Markdown-formatted documents:

This is the first paragraph.

This is the second paragraph.

The answer here gives me a solution that matches the first string ending in a double line break.

Perfect, except some of the texts begin with Markdown-style headers:

### This is an h3 header.

This is the first paragraph.

So I need to:

  • Skip any line that begins with one or more # symbols.
  • Match the first string ending in a double line break.

In other words, return 'This is the first paragraph' in both of the examples above.
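For comparison, the two rules can also be applied line by line instead of in one pattern. This is a different technique from the single-regex approach asked about here. It is a minimal sketch that assumes `\n`, `\r\n` or `\r` line breaks, and `firstParagraph()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: applies the two rules line by line instead of in one regex.
function firstParagraph(string $text): string
{
    // Split on any common line-break style.
    $lines = preg_split('/\r\n|\r|\n/', $text);
    $paragraph = [];

    foreach ($lines as $line) {
        // Rule 1: skip any line that begins with one or more # symbols.
        if (strncmp($line, '#', 1) === 0) {
            continue;
        }
        if (trim($line) === '') {
            // Rule 2: a blank line ends the paragraph once it has started...
            if ($paragraph !== []) {
                break;
            }
            // ...but blank lines before the paragraph are skipped.
            continue;
        }
        $paragraph[] = $line;
    }

    return implode("\n", $paragraph);
}

echo firstParagraph("### This is an h3 header.\n\nThis is the first paragraph."), "\n";
```

The trade-off is more code than a single `preg_match()` call, but each rule is spelled out on its own.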

So far, I've tried many variations on:

"/(?s)(?:(?!\#))((?!(\r?\n){2}).)*+/"

But I can't get it to return the proper match.

Where did I go wrong in my lookaround?

I'm doing this in PHP (preg_match()), if that makes a difference.
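One way to see where the lookaround goes wrong is to print what the pattern actually matches. This minimal sketch assumes PHP's bundled PCRE and uses the header example from above as the `$header` string:

```php
<?php
$header = "### This is an h3 header.\n\nThis is the first paragraph.\n\nThis is the second paragraph.";

// The question's pattern: (?!\#) is a zero-width check at a single offset,
// and nothing anchors the match to the start of a line.
preg_match('/(?s)(?:(?!\#))((?!(\r?\n){2}).)*+/', $header, $m);

var_dump($m[0]); // string(22) " This is an h3 header."
```

Because nothing ties the lookahead to a line start, `preg_match()` just tries later offsets until `(?!\#)` succeeds, which happens at the space after `###`, so the header text itself is returned.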

Thanks!

A:

You could try

"/(?sm)^[^#](?:(?!(?:\r\n|\r|\n){2}).)*/"

I enabled the multiline option by using (?sm) instead of (?s), so that ^ matches at the start of every line, and required each match to start at a line that does not begin with a #. I also used \r\n|\r|\n instead of \r?\n because my testing environment had funny line breaks =)
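A minimal usage sketch of the pattern above with `preg_match()`, assuming the two sample texts from the question with `\n` line breaks. The `trim()` call is an addition of this sketch: on the header example, `[^#]` matches the line break of the blank line after the header, so the raw match starts with `\n`.

```php
<?php
$pattern = '/(?sm)^[^#](?:(?!(?:\r\n|\r|\n){2}).)*/';

$plain  = "This is the first paragraph.\n\nThis is the second paragraph.";
$header = "### This is an h3 header.\n\nThis is the first paragraph.\n\nThis is the second paragraph.";

$results = [];
foreach ([$plain, $header] as $text) {
    if (preg_match($pattern, $text, $m)) {
        // Strip the leading line break that [^#] can consume after a header.
        $results[] = trim($m[0]);
    }
}

print_r($results);
```

Both entries of `$results` come out as `This is the first paragraph.`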

Jens
Yes! Thank you so much! (And now off to learn more about (?sm). =)
Chike
@Chike: Look here: http://www.regular-expressions.info/modifiers.html
Jens