Hello!

I've already got help here for creating a quotation extraction function. Thanks a lot, soulmerge!

Now I'm looking for PHP regular expressions that extract the quoted text and the person quoted. The person should end up in one capture group (substring) and the text in another.

For English texts, soulmerge proposed these regexes:

  • /"(.*?)[,.]?\h*"\h*said\h*(.*?)\./
  • /"(.*?)\h*"(.*)said/
  • /\.\h*(.*)(once)?\h*said[\-]*"(.*?)"/

I would like to "translate" the following German direct speech examples into regexes:

  • "This is a quotation", sagte PERSON ...
  • "This is a quotation!", sagte PERSON ...
  • "This is a quotation?", sagte PERSON ...
  • PERSON sagte: "This is a quotation."
  • PERSON sagte: "This is a quotation!"
  • PERSON sagte: "This is a quotation?"

Can someone help me build suitable regular expressions for these direct speech forms?

I hope you can help me. Thank you very much in advance!

+1  A: 
  • /"(.+)",\s*sagte\s+(.+)/
  • /(.+)\s+sagte:\s*"(.+)"/

Please note that the capture groups are reversed in the second regex: there, group 1 is the person and group 2 is the quotation.
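For illustration, here is a minimal sketch of how the two patterns could be used with preg_match so that the caller does not have to care about the group order. The function name extract_quote, the returned array keys, and the sample name "Anna" are assumptions made up for this example, not part of the original answer.

```php
<?php
// Hypothetical helper (name and return shape are assumptions):
// returns ['person' => ..., 'quote' => ...], or null if nothing matches.
function extract_quote(string $sentence): ?array
{
    // Form 1: "QUOTE", sagte PERSON  (group 1 = quote, group 2 = person)
    if (preg_match('/"(.+)",\s*sagte\s+(.+)/', $sentence, $m)) {
        return ['person' => $m[2], 'quote' => $m[1]];
    }
    // Form 2: PERSON sagte: "QUOTE"  (group 1 = person, group 2 = quote)
    if (preg_match('/(.+)\s+sagte:\s*"(.+)"/', $sentence, $m)) {
        return ['person' => $m[1], 'quote' => $m[2]];
    }
    return null;
}

$first  = extract_quote('"This is a quotation!", sagte Anna');
$second = extract_quote('Anna sagte: "This is a quotation?"');
```

Both calls should yield the same person with the respective quotation, including its final punctuation mark. Keep in mind that (.+) is greedy: in the first form the person group runs to the end of the line, so any text after the name is captured as well, and a line with several quoted passages may match more than intended.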

Yannick M.
Thank you, it works fine!