ansaurus

Question

How do I match one letter or many in a PHP preg_split style regex

Answer 1

+9 A:

In your case, it's better to use preg_match with its additional parameter and parentheses:

preg_match("#((?:<|&lt;)%)([\s]*(?:[^ø]*)[\s]*?)(%(?:>|&gt;))#i",$markup, $out);
print_r($out);

Array
(
    [0] => <% your stuff %>
    [1] => <%
    [2] => your stuff
    [3] => %>
)

By the way, check out this online tool for debugging PHP regexps; it's very useful!

http://regex.larsolavtorvik.com/

EDIT : I hacked the regexp a bit so it's faster. Tested it, it works :-)

Now let's explain all that stuff :

preg_match stores everything it captures in the variable passed as the third parameter (here $out).
If preg_match matches something, the full match is stored in $out[0].
Anything inside () but not (?:) in the pattern is stored in the following entries of $out ($out[1], $out[2], and so on).
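
To make the difference between () and (?:) concrete, here is a minimal sketch. The sample string and the variable names $capturing and $nonCapturing are made up for illustration; only preg_match itself comes from the answer.

```php
<?php
// Hypothetical sample input, not taken from the question.
$subject = '&lt;% x %&gt;';

// Plain parentheses: the alternative that matched is stored in index 1.
preg_match('#(<|&lt;)%#', $subject, $capturing);

// (?:...) only groups the alternatives; nothing beyond index 0 is stored.
preg_match('#(?:<|&lt;)%#', $subject, $nonCapturing);

print_r($capturing);     // two entries: the full match and the captured group
print_r($nonCapturing);  // one entry: the full match only
```

Both calls match the same text; only the number of entries in the result array changes.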

The pattern in detail:

#((?:<|&lt;)%)([\s]*(?:[^ø]*)[\s]*?)(%(?:>|&gt;))#i can be viewed as ((?:<|&lt;)%) + ([\s]*(?:[^ø]*)[\s]*?) + (%(?:>|&gt;)).

((?:<|&lt;)%) captures < or &lt;, then %.
(%(?:>|&gt;)) captures %, then > or &gt;.
([\s]*(?:[^ø]*)[\s]*?) means 0 or more whitespace characters, then 0 or more characters that are not the ø symbol, then 0 or more whitespace characters.

Why do we use [^ø] instead of . ? Because . is time consuming: the regexp engine checks it against all the existing characters, whereas [^ø] just checks that the character is not ø. A negated class also matches newlines, which . does not do by default. ø is rare in most markup (it is a letter in Danish and Norwegian rather than a currency symbol), but if you care, you can replace it with chr(7), which is the bell character and will almost certainly never be typed in a web page.

EDIT2: I just read your edit about capturing all the matches. In that case, you'll use preg_match_all the same way.
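
A rough sketch of what that could look like with preg_match_all. This is a variant, not the exact pattern above: it swaps [^ø] for [^\x07] (the bell-character idea mentioned earlier), moves the whitespace outside the middle group, and makes the quantifier lazy (*?), because a greedy class would run on to the last closing delimiter when the input holds several tags. The sample $markup and the PREG_SET_ORDER flag are my own assumptions.

```php
<?php
// Hypothetical input with two tags, one of them written with HTML entities.
$markup = 'a <% first %> b &lt;% second %&gt; c';

// Lazy variant of the pattern: each match stops at the nearest closing delimiter.
$pattern = '#((?:<|&lt;)%)\s*([^\x07]*?)\s*(%(?:>|&gt;))#i';

// PREG_SET_ORDER groups the results per match instead of per capture group.
$count = preg_match_all($pattern, $markup, $out, PREG_SET_ORDER);

foreach ($out as $match) {
    // $match[1] = opening delimiter, $match[2] = content, $match[3] = closing delimiter
    echo $match[1], ' | ', $match[2], ' | ', $match[3], "\n";
}
```

Without PREG_SET_ORDER the same data comes back grouped per capture group, as in the other answers here.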

e-satis 2008-09-19 18:25:04

Answer 2

A:

One possible solution is to use the extra parens, like so, but to discard those captures in the results, so you actually only use half of the total results.

this regex

$matches = preg_split("/(<|&lt;)%[\s]*(.*?)[\s]*%(>|&gt;)/i",$markup,-1,(PREG_SPLIT_NO_EMPTY  |  PREG_SPLIT_DELIM_CAPTURE));

for input

Hi my name is <h1>Issac</h1><% some stuff %>here&lt;% more stuff %&gt;

output would be

Array(
 [0]=>Hi my name is <h1>Issac</h1>
 [1]=><
 [2]=>some stuff
 [3]=>>
 [4]=>here
 [5]=>&lt;
 [6]=>more stuff
 [7]=>&gt;
)

Which would give the desired results if I only used the even-numbered elements.
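
Picking out the even-numbered elements could be sketched like this. The loop and the $wanted variable are illustrative additions; the regex, flags and input are the ones shown above.

```php
<?php
$markup = 'Hi my name is <h1>Issac</h1><% some stuff %>here&lt;% more stuff %&gt;';

$parts = preg_split(
    "/(<|&lt;)%[\s]*(.*?)[\s]*%(>|&gt;)/i",
    $markup,
    -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);

// Keep indexes 0, 2, 4, ...: the plain text and the tag contents.
$wanted = [];
foreach ($parts as $index => $part) {
    if ($index % 2 === 0) {
        $wanted[] = $part;
    }
}

print_r($wanted);
```

One caveat: PREG_SPLIT_NO_EMPTY drops empty pieces, so if the markup starts with a tag the first element is the delimiter and the even/odd positions shift.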

Issac Kelly 2008-09-19 18:27:24

Answer 3

+1 A:

Why are you using preg_split if what you really want is what matches inside the parentheses? Seems like it would be simpler to just use preg_match.

It's often an issue with regex that parens are used both for grouping your logic and for capturing patterns.

According to the PHP doc on regex syntax,

The fact that plain parentheses fulfil two functions is not always helpful. There are often times when a grouping subpattern is required without a capturing requirement. If an opening parenthesis is followed by "?:", the subpattern does not do any capturing, and is not counted when computing the number of any subsequent capturing subpatterns.
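
As a small illustration of that passage (my own sketch, reusing the sample input from the preg_split answer above): if the delimiter alternatives are written with (?:), only the tag content is captured, so preg_split with PREG_SPLIT_DELIM_CAPTURE returns just the text and the contents.

```php
<?php
// Sample input borrowed from the preg_split answer; the pattern is an assumed variant.
$markup = 'Hi my name is <h1>Issac</h1><% some stuff %>here&lt;% more stuff %&gt;';

// (?:<|&lt;) and (?:>|&gt;) group without capturing; (.*?) is the only capture.
$pattern = '/(?:<|&lt;)%\s*(.*?)\s*%(?:>|&gt;)/i';

$parts = preg_split($pattern, $markup, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

print_r($parts);
```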

Tegan Mulholland 2008-09-19 18:28:32

Answer 4

+2 A:

<?php
$code = 'Here is a <% test %> and &lt;% another test %&gt; for you';
preg_match_all('/(<|&lt;)%\s*(.*?)\s*%(>|&gt;)/', $code, $matches);
print_r($matches[2]);
?>

Result:

Array
(
    [0] => test
    [1] => another test
)

_Lasar 2008-09-19 18:32:48

Answer 5

+1 A:

If you want to match, give preg_match_all a shot with a regular expression like this:

preg_match_all('/((\<\%)(\s)(.*?)(\s)(\%>))/i', '<% wtf %> <% sadfdsafds %>', $result);

This results in a match of just about everything under the sun. You can add/remove parens to match more/less:

Array
(
    [0] => Array
        (
            [0] => <% wtf %>
            [1] => <% sadfdsafds %>
        )

    [1] => Array
        (
            [0] => <% wtf %>
            [1] => <% sadfdsafds %>
        )

    [2] => Array
        (
            [0] => <%
            [1] => <%
        )

    [3] => Array
        (
            [0] =>  
            [1] =>  
        )

    [4] => Array
        (
            [0] => wtf
            [1] => sadfdsafds
        )

    [5] => Array
        (
            [0] =>  
            [1] =>  
        )

    [6] => Array
        (
            [0] => %>
            [1] => %>
        )

)
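
For example, removing every pair of parens except the one around the content might look like this (a sketch under that assumption, using the same test string):

```php
<?php
// Only the tag content is captured; the delimiters and single whitespace are matched but not stored.
preg_match_all('/<%\s(.*?)\s%>/', '<% wtf %> <% sadfdsafds %>', $result);

print_r($result[1]);
```

Here $result has only two entries: $result[0] with the full matches and $result[1] with the contents.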

2008-09-19 18:41:24
