Would somebody care to help me out with a preg_match_all Regex?

I need to extract from a block that looks like this:

(arbitrary data)
alt=BAUSTEIN^550^^transparent^transparent^null^null^(...base64 encoded data...) ^
(arbitrary data)
alt=BAUSTEIN^550^^transparent^transparent^null^null^(...base64 encoded data...) ^

all base64 encoded blocks. The rule is: There is always alt=BAUSTEIN followed by six columns of arbitrary data delimited by ^. The base64 encoded column is also delimited by ^

My current feeble attempt contains a lot of ([^\^].*) and won't match anything. Pointers much appreciated.

+2  A: 

Try this:

alt=BAUSTEIN(?:\^.*?){6}\^(?<base64>.*?)\^
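
To use this from PHP, the pattern needs delimiters before it can be passed to preg_match_all. A minimal sketch, assuming `~` as the delimiter and made-up base64 payloads (both are illustrative choices, not taken from the question):

```php
<?php
// Hypothetical sample input; the base64 payloads are invented for illustration.
$input = "(arbitrary data)\n"
       . "alt=BAUSTEIN^550^^transparent^transparent^null^null^SGVsbG8=^\n"
       . "(arbitrary data)\n"
       . "alt=BAUSTEIN^550^^transparent^transparent^null^null^V29ybGQ=^\n";

// "~" is used as the delimiter so the pattern itself needs no extra escaping.
// Single quotes keep the backslashes intact for the regex engine.
$pattern = '~alt=BAUSTEIN(?:\^.*?){6}\^(?<base64>.*?)\^~';

preg_match_all($pattern, $input, $matches);

// $matches['base64'] holds the named group, one entry per match.
print_r($matches['base64']);
```

If the real data has whitespace before the closing `^`, as in the question's sample, calling trim() on each result should remove it.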
Rubens Farias
Cheers Rubens, I found this regex the most elegant, but somehow I couldn't get it to work and I don't know why. The first bracketed expression is beautiful, though, something to keep in mind.
Pekka
@Pekka: It's probably the .Net named capture (`?<base64>`).
Alix Axel
You're right, @Alix; you can strip that `?<base64>` if you want, but I just read that PHP >= 5.2.2 supports it: http://php.net/manual/en/function.preg-match-all.php
Rubens Farias
Nope, I stripped the base64 and replaced it by `(.*?)` (good to know that it's a valid expression though, thanks!) Still doesn't match (the result is empty). Strange, I can't see why.
Pekka
can you please update your question with a full example?
Rubens Farias
A: 

try this

$regex = "@^alt=BAUSTEIN\^{2}[a-zA-Z]{1}\^[a-zA-Z]{1}\^(.*)@";
streetparade
"six columns of arbitrary data"; a 0 in any will break your regex
Rubens Farias
+1  A: 

I don't understand your example very well, but would this do it?

alt=BAUSTEIN\^+(.+?)\^+(.+?)\^+(.+?)\^+(.+?)\^+(.+?)\^+(.+?)\^+

Or a more refined one:

^alt=BAUSTEIN\^+(.+?)\^+(.+?)\^+(.+?)\^+(.+?)\^+(.+?)\^+([0-9a-zA-Z+/=]+)\^+$
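
A sketch of how the first pattern might be called from PHP, assuming `~` as the delimiter and hypothetical base64 payloads. Because `\^+` swallows the doubled `^^` around the empty column, only five columns are captured before the payload, so the base64 data lands in group 6:

```php
<?php
// Hypothetical sample input; the base64 payloads are invented for illustration.
$input = "(arbitrary data)\n"
       . "alt=BAUSTEIN^550^^transparent^transparent^null^null^SGVsbG8=^\n"
       . "(arbitrary data)\n"
       . "alt=BAUSTEIN^550^^transparent^transparent^null^null^V29ybGQ=^\n";

// Six lazy groups, each separated by one or more "^" characters.
$pattern = '~alt=BAUSTEIN\^+(.+?)\^+(.+?)\^+(.+?)\^+(.+?)\^+(.+?)\^+(.+?)\^+~';

preg_match_all($pattern, $input, $matches);

// With the default PREG_PATTERN_ORDER, $matches[6] lists group 6 of every match.
$payloads = $matches[6];
print_r($payloads);
```

Note that this relies on every non-empty column containing at least one character, since `(.+?)` cannot match an empty string.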
Alix Axel
The first one did it, thank you.
Pekka
No problem, but Rubens' solution is much simpler: `alt=BAUSTEIN(?:\^.*?){6}\^(.*?)\^`.
Alix Axel
+1  A: 

Here's one way without a regex. Since you have distinct delimiters, you can use a splitting approach.

$str= <<<A
(arbitrary data)
alt=BAUSTEIN^550^^transparent^transparent^null^null^(...base64 encoded data...) ^
(arbitrary data)
alt=BAUSTEIN^550^^transparent^transparent^null^null^(...base64 encoded data...)
A;

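// Split on the delimiter. The chunk containing alt=BAUSTEIN is followed by
// six columns, so the base64 column sits 7 indices further on.
$s = explode("^", $str);
for ($i = 0; $i < count($s); $i++) {
    // Guard with isset() so a truncated trailing record does not raise a notice.
    if (strpos($s[$i], "alt=BAUSTEIN") !== false && isset($s[$i + 7])) {
        print $s[$i + 7] . "\n";
    }
}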
ghostdog74
Cheers Ghostdog, a regex did it for me nicely in this case but this is definitely a valid solution.
Pekka