I want to extract PHP classes from a file:

class a {
   function test() { }
}

class b extends a {
   function test() { }
}

and the resulting matches must be

class a {
   function test() { }
}

and

class b extends a {
   function test() { }
}
A: 

The following regex works for now:

^(?:(public|protected|private|abstract)\s+)?class\s+([a-z0-9_]+)(?:\s+extends\s+([a-z0-9_]+))?(?:\s+implements\s+([a-z0-9_]+))?.+?{.+?^}

It needs these options:

case-insensitive | ^ and $ match at line breaks | dot matches newlines

This only works if "class" and the final "}" are not indented.
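
As a rough illustration, the pattern and the three options could be tried in PHP like this. The `~` delimiter, the escaped braces, the sample input and the variable names are choices made for this sketch rather than part of the original regex; `i`, `m` and `s` are the PCRE modifiers that correspond to the options listed.

```php
<?php
// Sample input: the two classes from the question.
$source = <<<'PHP'
class a {
   function test() { }
}

class b extends a {
   function test() { }
}
PHP;

// i = case-insensitive, m = ^ and $ match at line breaks, s = dot matches newlines.
$pattern = '~^(?:(public|protected|private|abstract)\s+)?class\s+([a-z0-9_]+)'
         . '(?:\s+extends\s+([a-z0-9_]+))?(?:\s+implements\s+([a-z0-9_]+))?'
         . '.+?\{.+?^\}~ims';

$count = preg_match_all($pattern, $source, $matches);

// $matches[0] holds each whole class, $matches[2] the class names.
foreach ($matches[0] as $i => $classSource) {
    echo "--- {$matches[2][$i]} ---\n", $classSource, "\n";
}
```

Because the match ends at the first `}` that starts a line, a `}` at the start of a line inside a comment or string would cut the match short.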

Wiliam
`.+?` will make it fail if there's a string or comment with a `}` inside of it. At the very least, make it greedy or add something like `\s*$` to the end of your regex.
Bart Kiers
+5  A: 

Regexps are poor at parsing programming languages' grammars. Consider tokenizer functions instead, e.g. http://php.net/manual/en/function.token-get-all.php. See also http://framework.zend.com/apidoc/core/Zend_Reflection/Zend_Reflection_File.html

stereofrog
@Wiliam, an example of `token_get_all($source)` can be found here: http://stackoverflow.com/questions/2217839/regex-removing-methods-from-code
Bart Kiers
@Bart K., it's for a small C# app that separates my classes into individual files. This is a good response, but not the one I was looking for. The last regex I posted here as an answer works for me, but I'm going to mark your response as the accepted one, thanks.
Wiliam
@Wiliam, note that it's not my answer but `stereofrog`'s.
Bart Kiers
A: 

A single regex won't do this. PHP's grammar is more complex than what a regex can describe (insert something about context-free and regular grammars here). It'll drive you crazy to even try, unless you alter your source code to make it easier for the regex to match.

cHao
+1  A: 

Here's what you should use:

http://www.php.net/manual/en/function.token-get-all.php

quantumSoup
+1  A: 

Use token_get_all to get the array of language tokens of the PHP code. Then iterate over it and look for a T_CLASS token, which represents the class keyword (this does not take abstract classes or visibility into account). The next T_STRING token is the name of the class. Then look for the next plain token whose value is {. Increment a block-depth counter for it and for every further plain { token, and decrement it for every plain } token, until you have seen as many closing braces as opening braces (your counter is then 0). At that point you have walked the whole class declaration.
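
The steps above could be sketched roughly as follows. The function name `extract_classes` and the sample input are hypothetical. Two details go beyond the description and are assumptions of this sketch: a T_CLASS that is not followed by a name (`Foo::class`, anonymous classes) is skipped, and the T_CURLY_OPEN and T_DOLLAR_OPEN_CURLY_BRACES tokens are counted as openers because their closing brace arrives as a plain }.

```php
<?php
// Hypothetical helper (not from the answer): maps class name => class source text.
function extract_classes(string $source): array
{
    $tokens  = token_get_all($source);
    $count   = count($tokens);
    $classes = [];

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_CLASS) {
            continue;
        }

        // The class name is the next non-whitespace token; if it is not a
        // T_STRING this is `Foo::class` or an anonymous class, so skip it.
        $k = $i + 1;
        while ($k < $count && is_array($tokens[$k]) && $tokens[$k][0] === T_WHITESPACE) {
            $k++;
        }
        if ($k >= $count || !is_array($tokens[$k]) || $tokens[$k][0] !== T_STRING) {
            continue;
        }
        $name = $tokens[$k][1];

        // Walk forward from the class keyword, collecting the text and
        // tracking the brace depth until it returns to zero.
        $text   = '';
        $depth  = 0;
        $opened = false;
        for ($j = $i; $j < $count; $j++) {
            $tok   = $tokens[$j];
            $text .= is_array($tok) ? $tok[1] : $tok;

            if (is_array($tok)) {
                // "{$x}" and "${x}" inside strings open with special tokens
                // but close with a plain "}", so they are counted as well.
                if ($tok[0] === T_CURLY_OPEN || $tok[0] === T_DOLLAR_OPEN_CURLY_BRACES) {
                    $depth++;
                }
            } elseif ($tok === '{') {
                $depth++;
                $opened = true;
            } elseif ($tok === '}' && --$depth === 0 && $opened) {
                break; // closing brace of the class body
            }
        }

        if ($opened && $depth === 0) {
            $classes[$name] = $text;
            $i = $j; // continue scanning after this class
        }
    }

    return $classes;
}

// Sample input; token_get_all only tokenizes PHP after an opening tag.
$source = <<<'PHP'
<?php
class a {
   function test() { return "}"; }
}

class b extends a {
   function test() { }
}
PHP;

$classes = extract_classes($source);
foreach ($classes as $name => $code) {
    echo "--- $name ---\n$code\n";
}
```

Since strings and comments arrive as array tokens, a } inside them never reaches the depth counter, which is the case the regex approach struggles with. As noted above, the extracted text starts at the class keyword, so a leading abstract or final modifier is not included.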

Gumbo