I want to get ${1} = Title, ${2} = Open, ${3} = Bla-bla-bla from the following text:

{{Title|Open
Bla-bla-bla 
}}
+3  A: 
$string = "{{Title|Open
Bla-bla-bla 
}}";

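// Group 1: text up to the pipe; group 2: rest of the first line;
// group 3: the next line, without trailing whitespace.
preg_match('/^\{\{([^|]+)\|(.*?)[\r\n]+(.*?)\s*\}\}/', $string, $matches);
print_r($matches);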
joealba
+1  A: 

http://www.gskinner.com/RegExr/ is a useful place to play around with and learn regexes.

styts
+3  A: 

What about something like this:

$str = <<<STR
{{Title|Open
Bla-bla-bla 
}}
STR;

$matches = array();
if (preg_match("/^\{\{([^\|]+)\|([^\n]+)(.*)\}\}$/s", $str, $matches)) {
    var_dump($matches);
}

This will give you:

array
  0 => string '{{Title|Open
Bla-bla-bla 
}}' (length=28)
  1 => string 'Title' (length=5)
  2 => string 'Open' (length=4)
  3 => string '
Bla-bla-bla 
' (length=14)

This means that, after calling trim on $matches[1], $matches[2], and $matches[3], you'll get what you asked for :-)
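A minimal sketch of that trimming step, assuming the same input and pattern as above. The $fields variable is a name introduced here for illustration only:

```php
<?php
// Same input and pattern as in the answer above.
$str = "{{Title|Open\nBla-bla-bla \n}}";

$fields = array();
if (preg_match('/^\{\{([^\|]+)\|([^\n]+)(.*)\}\}$/s', $str, $matches)) {
    // Drop the full match ($matches[0]) and trim each captured group.
    $fields = array_map('trim', array_slice($matches, 1));
}

print_r($fields);
```

With this input, $fields should hold Title, Open and Bla-bla-bla, in that order.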


Explaining the regex:

  • matching from the beginning of the string: ^
  • two { characters, escaped because { can otherwise start a quantifier such as {2,3}
  • anything that's not a |, at least one time: [^\|]+
    • between () so it's captured -- returned as the first part of the result
    • inside a character class, | does not strictly need escaping, but the backslash is harmless.
  • a | character -- that has to be escaped, since an unescaped | means alternation.
  • anything that's not a line-break, at least one time: [^\n]+
    • between () so it's captured too -- second part of the result
  • .* virtually "anything", any number of times
    • between () so it's captured too -- third part of the result
  • and, finally, two } (escaped, too)
  • and an end of string: $

Note that the regex uses the s (dotall) modifier, which lets . match newlines as well; see Pattern Modifiers in the PHP manual.
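To see why the s modifier matters here, the following sketch runs the same pattern with and without it. The variable names ($pattern, $withoutS, $withS) are made up for this illustration:

```php
<?php
$str = "{{Title|Open\nBla-bla-bla \n}}";

// Pattern body from the answer above, without delimiters or modifiers.
$pattern = '\{\{([^\|]+)\|([^\n]+)(.*)\}\}';

// Without s, "." does not match a newline, so (.*) cannot reach the closing braces.
$withoutS = preg_match('/^' . $pattern . '$/', $str);

// With s, "." also matches newlines and the whole block matches.
$withS = preg_match('/^' . $pattern . '$/s', $str);

var_dump($withoutS, $withS);
```

preg_match returns 0 for the first call and 1 for the second, so dropping the modifier makes the match fail on this multi-line input.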

Pascal MARTIN
+1 for explaining the regex in detail!
Adam Raney
A: 

@Bart K. About the last two }, I'd also escape them to prevent bugs.

Simon A. Eugster
I removed my answer since there are already more than enough correct answers in here. But why would not escaping `}` be prone to bugs? In most PCRE implementations I'm familiar with, the `}` is no special character.
Bart Kiers
Hm, it seems I was wrong. I thought escaping would simplify debugging: if you forgot the closing } for an opening {, the regex compiler would give you an error message (unless there were other unescaped }'s in the pattern). But that is not the case, as I just noticed.
Simon A. Eugster
A: 

In Perl:

/\{\{         # literal opening braces
 (.*?)        # any characters except newline (lazy, i.e. as few as possible)
 \|           # literal pipe
 (.*?)        # same as 2 lines above
 \n           # new line
 ([\s\S]*?)   # any character, including new line (lazy)
 \}\}/x;      # literal closing braces

A more precise solution depends on the exact rules you want for extracting your fields.
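For comparison with the PHP answers, here is a rough PHP port of the pattern above, using the x modifier so the comments can stay inside the regex. This is only a sketch: the $str sample and variable names are assumptions, and the comments must not contain the / delimiter.

```php
<?php
$str = "{{Title|Open\nBla-bla-bla \n}}";

$pattern = '/\{\{      # literal opening braces
    (.*?)              # any characters except newline (lazy)
    \|                 # literal pipe
    (.*?)              # same again, up to the end of the line
    \n                 # newline
    ([\s\S]*?)         # any character, including newline (lazy)
    \}\}               # literal closing braces
    /x';

$m = array();
if (preg_match($pattern, $str, $m)) {
    // Group 3 still carries the trailing space and newline, so trim it.
    $m[3] = trim($m[3]);
}

print_r($m);
```

Here groups 1 to 3 come out as Title, Open and Bla-bla-bla once the third group has been trimmed.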

codeholic