Hi,

I'm using

preg_match_all('/<\?(.*)\?>/', $bigString, $matches, PREG_OFFSET_CAPTURE);

to find the contents of everything between <? and ?>

Now I'd like to find everything that is NOT between <? and ?>

I'm trying with

preg_match_all('/^(<\?(.*)\?>)/', $bigString, $nonmatches, PREG_OFFSET_CAPTURE);

but that doesn't seem to work...

A: 

A non-regex approach:

$str=<<<EOF
1 some words
1 some more words
<?
blah blah
blah blah
?>
2 some words
2 some words <?
jdf
sdf ?>
asdf
sdfs
EOF;

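// Split on the closing tag; each chunk then holds "outside text <? inside text".
$s = explode('?>', $str);
foreach ($s as $v) {
    $m = strpos($v, '<?');
    if ($m !== false) {
        // Everything before the opening tag lies outside the <? ... ?> block.
        print substr($v, 0, $m) . "\n";
    }
}
// The last chunk follows the final ?> and is entirely outside.
print end($s);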

Output:

$ php test.php
1 some words
1 some more words


2 some words
2 some words

asdf
sdfs
ghostdog74
Hi, this approach works, but a few details are not solved. The text after the last ?> is not found; I can easily solve this myself with an extra if. But I'd also like to have the starting position of the found string in the original string. In my example in the original question I use PREG_OFFSET_CAPTURE to achieve this. With the non-regex approach, can I somehow get the starting position of the found string?
murze
You can always do a `preg_match` on `<?` with offset capture.
ghostdog74
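For the starting positions asked about above, one possible sketch stays with plain string functions: walk the string with `strpos` and its offset argument, and record where each outside chunk begins. The function name `outside_parts_with_offsets` and the `array(text, offset)` return shape are assumptions made for illustration, not part of the original answer.

```php
<?php

// Hypothetical helper (name and return shape are illustrative only).
// Returns a list of array(text, offset) pairs for everything outside <? ... ?>.
function outside_parts_with_offsets($str, $open = '<?', $close = '?>')
{
    $parts = array();
    $pos = 0;
    $len = strlen($str);

    while ($pos < $len) {
        $start = strpos($str, $open, $pos);
        if ($start === false) {
            // No more opening tags: the rest of the string is outside.
            $parts[] = array(substr($str, $pos), $pos);
            break;
        }
        if ($start > $pos) {
            // Non-empty text between the previous close and this open tag.
            $parts[] = array(substr($str, $pos, $start - $pos), $pos);
        }
        $end = strpos($str, $close, $start + strlen($open));
        if ($end === false) {
            // Unclosed tag: treat everything up to the end as inside.
            break;
        }
        $pos = $end + strlen($close);
    }

    return $parts;
}

foreach (outside_parts_with_offsets('one<? foo ?>two<? bar ?>three') as $part) {
    echo $part[1], ': ', $part[0], "\n";
}
```

For this input the loop prints `0: one`, `12: two` and `24: three`. Empty chunks (for example between two adjacent tags) are skipped in this sketch; drop the `$start > $pos` check if they should be kept.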
A: 

There are multiple approaches to this. One way is to match the parts you want to exclude, record their offsets and lengths, and cut those spans out of the original string; what is left are the parts outside the tags.

Here is a function as an example:

<?php

function match_all_except ($pattern, $string)
{
    preg_match_all($pattern, $string, $match, PREG_OFFSET_CAPTURE);

    $parts = array();
    $pos = 0;

    foreach ($match[0] as $info)
    {
        $parts[] = substr($string, $pos, $info[1] - $pos);
        $pos = $info[1] + strlen($info[0]);
    }

    $parts[] = substr($string, $pos);

    return $parts;
}

$string = 'one<? foo ?>two<? bar ?>three';
$parts = match_all_except('/<\?.*?\?>/s', $string);

// Will output "one, two, three, "
foreach ($parts as $outside)
{
    echo "$outside, ";
}

?>
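
If the starting position of each outside part is also needed, as with PREG_OFFSET_CAPTURE in the question, one option is `preg_split` with the `PREG_SPLIT_OFFSET_CAPTURE` flag, which returns every piece together with its offset in the original string. This is a sketch of an alternative rather than a change to the function above; the variable names are illustrative.

```php
<?php

$string = 'one<? foo ?>two<? bar ?>three';

// Split on the tags; each element is array(text, offset within $string).
$parts = preg_split('/<\?.*?\?>/s', $string, -1, PREG_SPLIT_OFFSET_CAPTURE);

foreach ($parts as $part) {
    list($text, $offset) = $part;
    echo "$offset: $text\n";
}
```

For this input it prints `0: one`, `12: two` and `24: three`. Empty pieces (for instance when the string starts with a tag) are kept by default; combining the flag with `PREG_SPLIT_NO_EMPTY` drops them.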

Alternatively, you can use the regular expression `/\G(?=.)((?:(?!<\?).)*)(?:<\?((?!\?>).)*(\?>|$)|$)/s` with preg_match_all to capture all the parts outside the tags in the first subpattern. It may have its own difficulties, though, if the tags are not evenly matched in the document.

For example,

<?php

$string = 'one<? foo ?>two<? bar ?>three';
preg_match_all('/\G(?=.)((?:(?!<\?).)*)(?:<\?((?!\?>).)*(\?>|$)|$)/s', $string, $match);

// Will output "one, two, three, "
foreach ($match[1] as $outside)
{
    echo "$outside, ";
}

?>
Rithiur
Hi, the regex solution gives an internal server error on my server, but the function solution works perfectly for me! Thanks!
murze