+1  Q: 

PHP Regexp help

I can't seem to get a handle on what this expression intends to extract:

preg_match("/^(?:[\s\*]*?@([^\*\/]+?)\s(.+))/",$line,$match);

$line is a line from a text file while $match is an array

+6  A: 

Here's an explanation:

^               # match the beginning of the input
(?:             # start non-capture group 1 
  [\s*]*?       #   match any character from the set {'0x09'..'0x0D', '0x20', '*'} and repeat it zero or more times, reluctantly
  @             #   match the character '@'
  (             #   start capture group 1
    [^*/]+?     #     match any character from the set {'0x00'..')', '+'..'.', '0'..'ÿ'} and repeat it one or more times, reluctantly
  )             #   end capture group 1
  \s            #   match a whitespace character: [ \t\n\x0B\f\r]
  (             #   start capture group 2
    .+          #     match any character except line breaks and repeat it one or more times
  )             #   end capture group 2
)               # end non-capture group 1

An example string that the regex would match is this: * * *@abc asd
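
To check the breakdown above, here is a minimal sketch that runs the pattern from the question against that example string. The variable name `$pattern` is only for illustration; the rest is plain `preg_match`.

```php
<?php
// Pattern copied verbatim from the question.
$pattern = "/^(?:[\s\*]*?@([^\*\/]+?)\s(.+))/";

// Example string from the explanation above.
$line = "* * *@abc asd";

if (preg_match($pattern, $line, $match)) {
    // $match[0] is the whole match, $match[1] is capture group 1
    // (the text right after '@'), $match[2] is capture group 2 (the rest).
    var_dump($match);
}
```

Here the leading `* * *` is swallowed by `[\s*]*?`, group 1 ends up as `abc`, and group 2 as `asd`.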

Edit:

I've released a beta version of the parser that was used to generate the explanation above. It can be downloaded here: http://big-o.nl/apps/pcreparser/pcre/PCREParser.html

Bart Kiers
Whoa, do you have a tool to auto-generate that? :-)
Joey
Yes, I wrote a PCRE grammar and used ANTLR to create a PCRE parser/lexer that I used to create such a *regex-explanation*.
Bart Kiers
That's absolutely amazing! Do you have this tool available for public use?
kalengi
Not yet. I recently finished development, but proper documentation (and unit tests) are running behind, as is almost always the case with my pet projects. If you want, I can send you an e-mail when I *do* make it available (should be within a couple of weeks). If so, drop me a line on my throw-away account `prometheuzz AT gmail DOT com`
Bart Kiers
Thanks for the parser!
kalengi
A: 

This will match strings of the form

** *  ***@anything_that_is_not_an_asterisk_nor_a_slash   anything else

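$match[1] contains "anything_that_is_not_an_asterisk_nor_a_slash" (everything after the @ up to the first whitespace character), and $match[2] contains the rest of the line after that single whitespace character, including any further leading spaces.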

KennyTM
This is a good plain-English explanation. The code I got the regexp from is actually trying to extract exactly that kind of string, but it goes wrong because it picks up the CRLF in $match[2].
kalengi
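
A possible reason for the stray line ending, sketched under some assumptions: the sample line below is made up, and it assumes the line still carries its Windows line ending (as `fgets()` would return it) and that PCRE uses its usual default where `.` refuses to match only `\n`. In that case the `\r` is swallowed by `(.+)`. Trimming the line before matching is one way around it, not necessarily what the original code should do.

```php
<?php
// Pattern copied verbatim from the question.
$pattern = "/^(?:[\s\*]*?@([^\*\/]+?)\s(.+))/";

// Hypothetical line, as read from a file with CRLF line endings.
$line = " * @return bool etc...\r\n";

// Untrimmed: '.' matches "\r" (but not "\n"), so group 2 ends with "\r".
preg_match($pattern, $line, $raw);
var_dump($raw[2]);

// One possible workaround: strip the line ending before matching.
preg_match($pattern, rtrim($line, "\r\n"), $clean);
var_dump($clean[2]);
```

The first dump shows a string one byte longer than the second; that extra byte is the carriage return.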
A: 

The @ makes me think the pattern is trying to capture elements of an email address... As a rule of thumb, always document your regexes.

Dyno Fu
I thought so too at first, but it matches much, much more. See Kenny's and my answers.
Bart Kiers
<[email protected] User Name>: I think the meaning really depends on the input/context. What do the lines look like?
Dyno Fu
+2  A: 

Probably tries to capture lines of comment blocks like these (excluding first and last line):

/**
 * @param  $arg1 etc...
 * @return bool etc...
 */
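
If that guess is right, a rough sketch of how the pattern behaves on such a block could look like this. The `$docblock` and `$tags` names are only for illustration, and splitting on `"\n"` stands in for however the real code reads its lines.

```php
<?php
// Pattern copied verbatim from the question.
$pattern = "/^(?:[\s\*]*?@([^\*\/]+?)\s(.+))/";

// The comment block from above, as a nowdoc so that $arg1 is not interpolated.
$docblock = <<<'DOC'
/**
 * @param  $arg1 etc...
 * @return bool etc...
 */
DOC;

$tags = [];
foreach (explode("\n", $docblock) as $line) {
    // The first and last lines contain no '@', so they never match.
    if (preg_match($pattern, $line, $match)) {
        $tags[] = [$match[1], $match[2]];
    }
}
print_r($tags);
```

Only the two middle lines match. Note that `\s` consumes a single whitespace character, so the second space after `@param` stays at the start of group 2.
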
fireeyedboy
This is very much like the source text. What I was wondering is what it's trying to pick out, step by step, so I can see where it goes wrong.
kalengi