How do I match a regex where a line break can happen anywhere?

For example, if I am trying to match "thousands of turtle eggs", the pattern should match all of the following cases (and ideally also the cases where a line break falls inside a word).

Scientists have revealed that a mammoth effort to move *thousands of turtle eggs* from beaches around the Gulf of Mexico after the Deepwater Horizon oil spill may have saved almost 15,000 of the reptiles.   

Scientists have revealed that a mammoth effort to move *thousands 
of turtle eggs* from beaches around the Gulf of Mexico after the Deepwater Horizon oil spill may have saved almost 15,000 of the reptiles.

Scientists have revealed that a mammoth effort to move *thousands of 
turtle eggs* from beaches around the Gulf of Mexico after the Deepwater Horizon oil spill may have saved almost 15,000 of the reptiles.

Scientists have revealed that a mammoth effort to move *thousands of turtle 
eggs* from beaches around the Gulf of Mexico after the Deepwater Horizon oil spill may have saved almost 15,000 of the reptiles.
+2  A: 
  /thousands\s+of\s+turtle\s+eggs/

Or use this version to ensure that "thousands" and "eggs" are not part of another word (like ...eggsbath):

  /\bthousands\s+of\s+turtle\s+eggs\b/
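As a rough illustration of why this works, here is a minimal Python sketch (Python is an assumption; the question does not name a language, and the shortened sample strings are made up for the demo). It relies only on the fact that `\s` matches line-break characters as well as spaces.

```python
import re

# \s+ matches any run of whitespace, including "\n" and "\r\n",
# so a line break between words is treated like a space.
pattern = re.compile(r"\bthousands\s+of\s+turtle\s+eggs\b")

# Shortened, hypothetical versions of the sample paragraphs.
samples = [
    "move thousands of turtle eggs from beaches",
    "move thousands \nof turtle eggs from beaches",
    "move thousands of \r\nturtle eggs from beaches",
    "move thousands of turtle \neggs from beaches",
]

matches = [bool(pattern.search(s)) for s in samples]
print(matches)

# The trailing \b rejects "eggs" when it is part of a longer word.
inside_word = pattern.search("thousands of turtle eggsbath")
print(inside_word)
```

Every sample should match, while the "eggsbath" string should not.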
ring0
What about the cases where line break might occur within the words?
rest_day
Just add that possibility between the letters: `/\bt[\r\n]*h[\r\n]*o[\r\n]*...`
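Writing that pattern by hand gets tedious, so it could be generated instead. The following is only a sketch in Python (an assumption, since the thread does not name a language), and `breakable_pattern` is a hypothetical helper name, not part of any library.

```python
import re

def breakable_pattern(phrase):
    """Build a regex for `phrase` that tolerates line breaks anywhere."""
    parts = []
    for word in phrase.split():
        # Allow optional CR/LF characters between the letters of each word.
        parts.append(r"[\r\n]*".join(re.escape(ch) for ch in word))
    # Words are separated by one or more whitespace characters,
    # which covers both spaces and line breaks.
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b")

pattern = breakable_pattern("thousands of turtle eggs")

# A line break inside "thousands" and inside "turtle" still matches.
broken = pattern.search("move thou\nsands of tur\r\ntle\neggs from beaches")
print(bool(broken))
```

Note that this only tolerates bare line breaks; text that was hyphenated at the break would need extra handling.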
ring0
OK... I was wondering whether there were any other solutions. :)
rest_day
+1  A: 

You can use the "s" flag so that the dot (`.`) also matches line breaks.

If you use the regex "/reptiles..*?sc/gis", it will match from "reptiles." through the "Sc" at the start of the next line.

You can try this link.

This is an online regex editor.

Arun P Johny
A: 

Use the 's' switch. Then ^$ become the start and end of the whole string, and newlines are considered whitespace. So it works as long as you use \s for matching the gaps between words, e.g.:

#thousands\sof\sturtle\seggs#si

Karthick
Sorry, but you're way off. Newlines are always considered whitespace, and `\s` always matches them. What the `s` switch does is allow the **dot** (`.`) to match newlines, which it doesn't do by default. This was originally (and unfortunately) called *single-line* mode, but it's more commonly known as *dot-matches-all* or *DOTALL* mode today. The `s` switch has no effect on the anchors (`^` and `$`). See the "Dot" and "Anchors" sections of this tutorial for details: http://www.regular-expressions.info/tutorial.html
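These points are easy to check. A small sketch in Python, chosen here only for illustration (its `re.DOTALL` and `re.MULTILINE` flags correspond to the `s` and `m` switches), with a made-up two-line string:

```python
import re

text = "reptiles.\nScientists"

# \s matches the newline with no flags at all.
ws_no_flags = re.search(r"reptiles\.\sScientists", text)

# By default the dot does not match "\n" ...
dot_default = re.search(r"reptiles\..Scientists", text)
# ... but with re.DOTALL (the "s" switch) it does.
dot_dotall = re.search(r"reptiles\..Scientists", text, re.DOTALL)

# Anchors are governed by re.MULTILINE (the "m" switch), not by DOTALL.
caret_dotall = re.search(r"^Scientists", text, re.DOTALL)
caret_multiline = re.search(r"^Scientists", text, re.MULTILINE)

print(bool(ws_no_flags), bool(dot_default), bool(dot_dotall))
print(bool(caret_dotall), bool(caret_multiline))
```

Only the dot's behaviour changes with DOTALL; `^` starts matching at the second line only once MULTILINE is set.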
Alan Moore
Thanks. I didn't know that; I just assumed it worked the way I described. I'll look into it. :)
Karthick