Hello there,

I have a requirement to parse PHP files in C#. Developers in another country upload PHP files, and once they are uploaded we need to check them and get a list of all the classes, methods, functions, etc.

I thought of using a regex, but I can't work out how to tell whether a function belongs to a class, so I was wondering if there's already something 'out there' that will parse PHP files and spit out their functions (I'm trying to avoid writing a full-blown AST implementation).

Does anyone have any ideas? I looked at Coco/R, but I couldn't find a PHP grammar file for it. I'm using .NET 2.0 and C#.

A:

Why do this in C#? In PHP it is trivial. Use the token_get_all() function: it breaks a PHP file into a stream of lexemes (tokens), and by running those through a simple finite state machine you can definitively determine the list of classes and methods.

Whatever you do, don't try to do this with regular expressions. It will be incredibly tedious and error-prone.
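Since token_get_all() isn't available from C#, here is a rough C# sketch of the same idea, under stated assumptions: a tiny hand-written lexer (a stand-in for token_get_all(), not a port of it) produces tokens, and a finite state machine walks them while tracking brace depth, so it knows which class each function belongs to. It skips comments and quoted strings but ignores heredocs, inline HTML and plenty of other PHP syntax, so treat it as illustrative only. The names PhpOutline and Extract are made up for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: tiny PHP lexer + finite state machine (NOT token_get_all()).
// Lists classes/interfaces/traits, their methods, and free functions.
// Heredocs, inline HTML outside <?php ?> and many PHP details are ignored.
public static class PhpOutline
{
    enum State { Normal, ExpectTypeName, ExpectFunctionName }

    static bool Is(string t, string kw) { return string.Equals(t, kw, StringComparison.OrdinalIgnoreCase); }
    static bool IsIdent(string t) { return t.Length > 0 && (char.IsLetter(t[0]) || t[0] == '_'); }
    static bool IsWordChar(char c) { return char.IsLetterOrDigit(c) || c == '_' || c == '$'; }

    // Emits identifiers/variables and single-character symbols,
    // skipping whitespace, comments and quoted string literals.
    static List<string> Tokenize(string src)
    {
        List<string> tokens = new List<string>();
        int i = 0;
        while (i < src.Length)
        {
            char c = src[i];
            char next = i + 1 < src.Length ? src[i + 1] : '\0';
            if (char.IsWhiteSpace(c)) { i++; }
            else if (c == '#' || (c == '/' && next == '/'))
            {   // line comment: skip to end of line
                while (i < src.Length && src[i] != '\n') i++;
            }
            else if (c == '/' && next == '*')
            {   // block comment: skip past the closing */
                int end = src.IndexOf("*/", i + 2, StringComparison.Ordinal);
                i = end < 0 ? src.Length : end + 2;
            }
            else if (c == '\'' || c == '"')
            {   // string literal: skip it, honouring backslash escapes
                i++;
                while (i < src.Length && src[i] != c) i += src[i] == '\\' ? 2 : 1;
                i++;
            }
            else if (IsWordChar(c))
            {
                int start = i;
                while (i < src.Length && IsWordChar(src[i])) i++;
                tokens.Add(src.Substring(start, i - start));
            }
            else { tokens.Add(c.ToString()); i++; }
        }
        return tokens;
    }

    public static List<string> Extract(string src)
    {
        List<string> result = new List<string>();
        State state = State.Normal;
        string kind = null, pendingType = null, currentType = null, prev = "";
        int depth = 0, typeDepth = -1; // brace depth, and depth where the current type body opened

        foreach (string t in Tokenize(src))
        {
            if (state == State.ExpectFunctionName && t == "&") continue; // function &name()
            if (state != State.Normal && IsIdent(t))
            {
                if (state == State.ExpectTypeName)
                {
                    pendingType = t;
                    result.Add(kind + " " + t);
                }
                else
                {
                    bool isMethod = currentType != null && depth == typeDepth + 1;
                    result.Add(isMethod ? "method " + currentType + "::" + t : "function " + t);
                }
                state = State.Normal;
                prev = t;
                continue;
            }
            state = State.Normal; // e.g. closures "function (" or anonymous "new class {"

            // prev != ":" skips "Foo::class" constant lookups
            if (prev != ":" && (Is(t, "class") || Is(t, "interface") || Is(t, "trait")))
            {
                kind = t.ToLowerInvariant();
                state = State.ExpectTypeName;
            }
            else if (Is(t, "function")) state = State.ExpectFunctionName;
            else if (t == "{")
            {
                if (pendingType != null) { currentType = pendingType; typeDepth = depth; pendingType = null; }
                depth++;
            }
            else if (t == "}")
            {
                depth--;
                if (currentType != null && depth == typeDepth) { currentType = null; typeDepth = -1; }
            }
            prev = t;
        }
        return result;
    }
}
```

For a file declaring class Foo with a method bar, Extract returns entries such as "class Foo" and "method Foo::bar", while functions outside any class come back as "function name". Tracking brace depth is what lets the state machine answer the "which class does this function belong to?" question that a flat regex cannot.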

Edit: There are four basic possibilities for doing this:

  1. Do it in PHP. This will be the fastest (to develop) and simplest option;
  2. Run a command-line PHP script that either does this itself or generates a series of tokens that a C# program can interpret. This is the next easiest;
  3. Use Phalanger, a port of PHP to the .Net framework. This might be more palatable to management since it's still all .Net code; or
  4. Use Quercus, a port of PHP to the Java VM.
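Option 2 could look roughly like this from the C# side. This is a hedged sketch: it assumes a PHP CLI executable you can point at, plus a PHP script of your own (called outline.php here, a hypothetical name) that uses token_get_all() and prints one item per line, such as "class Foo" or "method Foo::bar". That output format and the PhpCli class are illustrative, not an existing API.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: runs "<phpExe> <script> <phpFile>" and returns stdout.
// Assumes the PHP script prints one class/method/function per line.
public static class PhpCli
{
    public static string Run(string phpExe, string script, string phpFile)
    {
        ProcessStartInfo psi = new ProcessStartInfo(phpExe, "\"" + script + "\" \"" + phpFile + "\"");
        psi.UseShellExecute = false;        // required to redirect output
        psi.RedirectStandardOutput = true;
        psi.CreateNoWindow = true;
        using (Process p = Process.Start(psi))
        {
            // Read everything before WaitForExit to avoid a full-pipe deadlock.
            string output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
            if (p.ExitCode != 0)
                throw new InvalidOperationException("php exited with code " + p.ExitCode);
            return output;
        }
    }

    // Splits the script's output into non-empty, trimmed lines
    // (Trim also removes the '\r' of Windows line endings).
    public static List<string> ParseLines(string output)
    {
        List<string> items = new List<string>();
        foreach (string line in output.Split('\n'))
        {
            string trimmed = line.Trim();
            if (trimmed.Length > 0) items.Add(trimmed);
        }
        return items;
    }
}
```

A call would then be something like PhpCli.ParseLines(PhpCli.Run("php", "outline.php", "upload.php")). Because PHP is only invoked as a command-line process, nothing here involves a web server.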

Anything else will involve either writing a PHP parser (a lot of work) or using really flaky regular expressions that will be an unreliable support nightmare.

Worrying about PHP's supposed "security flaws" is misguided for several reasons:

  1. Any framework or technology stack can have security flaws. The fact that your sysadmin allows Java and only grudgingly allows .Net just indicates irrational bias. I say this as a long-time Java developer: Java, .Net and PHP can all have security flaws;
  2. You can run PHP from the command line, where it doesn't serve any HTTP requests, which reduces the risk from security flaws to basically zero;
  3. If you're worried about internal security threats (from someone with access to the box), simply make the PHP CLI executable runnable only by a group whose sole member is the account your program runs as.
cletus
Hi, thanks for the reply. Unfortunately the product is written in C# and just uploads files to a web server, but we have to document the functions being uploaded. We can't run PHP locally either, which sucks :(
Why can't you run PHP locally? You don't need a web server; just feed your script into PHP.exe (or whatever the executable is called) and capture the console output in your C# application.
lubos hasko
Our sysadmin is anal about running PHP. If it's not Java (or .NET, and only because management has the final say), it doesn't run on our production boxes. He's worried about security flaws in PHP.
I've edited the answer to reflect another option: PHP.NET (a port of PHP to the .Net platform). Your sysadmin is a tool, because you don't have to run PHP as, or in, a web server to serve pages. You can use it on the command line like Perl or a shell script, at which point any misplaced concerns about security flaws become a non-issue.
cletus
A: 

You might be able to use ctags for your purpose. I'm not sure how you would integrate it with C# though, since ctags is written in C.
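One way to integrate ctags with C# is to treat it as a separate command-line tool and parse the tags it writes (Exuberant Ctags writes the tag file to standard output when given `-f -`). The sketch below parses the extended tags format, `name<TAB>file<TAB>address;"<TAB>fields...`. Which kinds and fields ctags actually emits for PHP depends on the version, so check them against your own output; CtagsEntry is a hypothetical helper, not part of any library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser for one line of a ctags tags file (extended format).
public sealed class CtagsEntry
{
    public string Name, File, Kind;
    public Dictionary<string, string> Fields = new Dictionary<string, string>();

    // Returns null for pseudo-tags ("!_TAG_...") and malformed lines.
    public static CtagsEntry Parse(string line)
    {
        if (line.StartsWith("!_TAG_", StringComparison.Ordinal)) return null;
        string[] head = line.Split(new char[] { '\t' }, 3); // name, file, address + fields
        if (head.Length < 3) return null;

        CtagsEntry e = new CtagsEntry();
        e.Name = head[0];
        e.File = head[1];
        int marker = head[2].IndexOf(";\"\t", StringComparison.Ordinal);
        if (marker < 0) return e; // no extension fields on this line

        foreach (string f in head[2].Substring(marker + 3).Split('\t'))
        {
            int colon = f.IndexOf(':');
            if (colon < 0) { e.Kind = f; continue; } // bare kind letter, e.g. "c" or "f"
            string key = f.Substring(0, colon), value = f.Substring(colon + 1);
            if (key == "kind") e.Kind = value;
            else e.Fields[key] = value;             // e.g. a scope field such as "class:Foo"
        }
        return e;
    }
}
```

If your ctags version emits a scope field such as class:Foo for methods, Fields["class"] answers the "which class does this function belong to?" question directly; otherwise you are left with a flat list of names and kinds.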

Alternatively, if you know your parsers, you can take a look at the grammar files in the PHP source. In particular zend_ini_parser.y and zend_language_parser.y.

Finally, while not the best solution, you could probably get away with a handful of home-brewed regular expressions. PHP's grammar is fairly strict with regard to classes and functions. You just need to keep track of a little bit of state so you know which class a function belongs to.
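A minimal sketch of the "regular expressions plus a little state" approach, assuming well-formed PHP with no braces inside strings or comments (a simplification that real code will eventually break): one regex finds class/interface/trait declarations, function declarations and braces in source order, and a brace counter decides whether a function is a method of the enclosing class. PhpRegexOutline is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical regex-based outliner. Braces inside strings or comments
// are NOT handled, so the depth counter can drift on real-world code.
public static class PhpRegexOutline
{
    // Matches, in source order: a type declaration, a named function
    // declaration (optionally returning by reference), or a single brace.
    static readonly Regex Pattern = new Regex(
        @"\b(?<kw>class|interface|trait)\s+(?<cls>[A-Za-z_]\w*)" +
        @"|\bfunction\s+&?\s*(?<fn>[A-Za-z_]\w*)" +
        @"|(?<brace>[{}])",
        RegexOptions.IgnoreCase);

    public static List<string> Extract(string src)
    {
        List<string> result = new List<string>();
        string pending = null, current = null; // declared type awaiting its "{", and the open type
        int depth = 0, classDepth = -1;

        foreach (Match m in Pattern.Matches(src))
        {
            if (m.Groups["cls"].Success)
            {
                pending = m.Groups["cls"].Value;
                result.Add(m.Groups["kw"].Value.ToLowerInvariant() + " " + pending);
            }
            else if (m.Groups["fn"].Success)
            {
                string name = m.Groups["fn"].Value;
                bool isMethod = current != null && depth == classDepth + 1;
                result.Add(isMethod ? "method " + current + "::" + name : "function " + name);
            }
            else if (m.Value == "{")
            {
                if (pending != null) { current = pending; classDepth = depth; pending = null; }
                depth++;
            }
            else // "}"
            {
                depth--;
                if (current != null && depth == classDepth) { current = null; classDepth = -1; }
            }
        }
        return result;
    }
}
```

The state here is just the name of the open class and the brace depth at which its body started; that is the "little bit of state" needed, and also the part that goes wrong as soon as a string literal contains an unbalanced brace.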

troelskn
A: 

You could use PHP4Delphi to create a DLL that calls the token_get_all() function, and then call that Delphi-built DLL from C#.

As it happens, I also need to do the same thing you describe, and I'm evaluating the possibility of doing it the way I've suggested.

Alfredo