Why do this in C#? In PHP this is trivial. The token_get_all() function breaks a PHP file into a stream of lexemes (tokens), which you can then feed to a finite state machine to definitively determine the list of classes and methods.
Whatever you do, don't try to do this with regular expressions. It will be incredibly tedious and error-prone.
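As a rough illustration of the state-machine idea, here is a minimal sketch in Python (chosen only for brevity). It assumes the tokens have already been produced by token_get_all() and converted so that each token is either a single-character string such as "{" or a (name, text) pair, where name is what PHP's token_name() returns (e.g. "T_CLASS"). The function name extract_classes and this token shape are illustrative assumptions, and the sketch deliberately ignores namespaces, interfaces, traits, anonymous classes and other edge cases.

```python
from typing import Dict, List, Optional, Tuple, Union

# Assumed token shape: a bare character like "{" or a (token_name, text) pair.
Token = Union[str, Tuple[str, str]]

IGNORED = {"T_WHITESPACE", "T_COMMENT", "T_DOC_COMMENT"}
# "{$var}" and "${var}" inside strings open with these tokens but close with a bare "}".
OPENERS = {"{", "T_CURLY_OPEN", "T_DOLLAR_OPEN_CURLY_BRACES"}


def kind(tok: Token) -> str:
    return tok if isinstance(tok, str) else tok[0]


def extract_classes(tokens: List[Token]) -> Dict[str, List[str]]:
    """Map each named class to its method names using a small state machine."""
    classes: Dict[str, List[str]] = {}
    state = "IDLE"              # IDLE, EXPECT_CLASS_NAME or EXPECT_FUNCTION_NAME
    depth = 0                   # current brace nesting depth
    current: Optional[str] = None       # class whose body we are inside
    class_depth: Optional[int] = None   # depth of that class body's "{"
    pending: Optional[str] = None       # class name seen, body not opened yet

    for tok in tokens:
        k = kind(tok)
        if k in IGNORED:
            continue
        if state == "EXPECT_CLASS_NAME":
            state = "IDLE"
            if k == "T_STRING":
                pending = tok[1]
                classes.setdefault(pending, [])
                continue
            # e.g. `Foo::class` or `new class {`: not a named declaration,
            # so fall through and process this token normally.
        elif state == "EXPECT_FUNCTION_NAME":
            state = "IDLE"
            if k == "T_STRING" and current is not None:
                classes[current].append(tok[1])
                continue
            # Closures (`function (`) and top-level functions fall through.

        if k == "T_CLASS":
            state = "EXPECT_CLASS_NAME"
        elif k == "T_FUNCTION":
            state = "EXPECT_FUNCTION_NAME"
        elif k in OPENERS:
            depth += 1
            if pending is not None:  # first "{" after `class Name ...`
                current, class_depth, pending = pending, depth, None
        elif k == "}":
            if current is not None and depth == class_depth:
                current, class_depth = None, None  # class body closed
            depth -= 1
    return classes
```

Tracking brace depth is what lets the machine tell a method declared in the class body apart from a closure nested inside a method, which is exactly the kind of context a regular expression struggles with.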
Edit: There are four basic possibilities for doing this:
- Do it in PHP. This will be the fastest (to develop) and simplest option;
- Run a command line PHP script to either do this or generate a series of tokens that can be interpreted by a C# program. This is the next easiest;
- Use Phalanger, a port of PHP to the .NET framework. This might be more palatable to management since it's still all .NET code; or
- Use Quercus, a port of PHP to the Java VM.
Anything else will involve either writing a PHP parser (a lot of work) or using really flaky regular expressions that will be an unreliable support nightmare.
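For the second option, the bridge between the two processes can be very small. Below is a hedged sketch, again in Python for brevity (a C# program would do the same thing with System.Diagnostics.Process): it runs the PHP CLI with an inline script that calls token_get_all() and token_name() and prints the tokens as JSON, then decodes that JSON on the other side. It assumes a `php` binary on the PATH and UTF-8 source files (json_encode() fails on invalid UTF-8); the names PHP_DUMP, dump_tokens and parse_token_dump are hypothetical.

```python
import json
import subprocess
from typing import List, Tuple, Union

# Assumed token shape: a bare character like "{" or a (token_name, text) pair.
Token = Union[str, Tuple[str, str]]

# Inline PHP for `php -r`: tokenize the file named in $argv[1] and print the
# tokens as JSON, replacing numeric token ids with their token_name().
PHP_DUMP = (
    "$out = [];"
    "foreach (token_get_all(file_get_contents($argv[1])) as $t) {"
    "    $out[] = is_array($t) ? [token_name($t[0]), $t[1]] : $t;"
    "}"
    "echo json_encode($out);"
)


def parse_token_dump(text: str) -> List[Token]:
    """Turn the JSON printed by PHP_DUMP into str / (name, text) tokens."""
    return [t if isinstance(t, str) else (t[0], t[1]) for t in json.loads(text)]


def dump_tokens(php_file: str, php_binary: str = "php") -> List[Token]:
    """Run the PHP CLI on php_file and return its token stream."""
    result = subprocess.run(
        [php_binary, "-r", PHP_DUMP, php_file],
        capture_output=True, text=True, check=True,
    )
    return parse_token_dump(result.stdout)
```

Keeping PHP responsible only for tokenizing means the interpreting side never has to understand PHP's lexical rules; it just consumes a flat, well-defined list.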
Being concerned about PHP's supposed "security flaws" is problematic for several reasons:
- Any framework or technology stack can have security flaws. The fact that your sysadmin only allows .NET, effectively under protest, over Java just indicates irrational bias. I say this as a long-time Java developer: Java, .NET and PHP can all have security flaws;
- You can run PHP from the command line so that it doesn't serve any HTTP requests, which reduces the risk from security flaws to basically zero;
- If you're worried about internal security threats (from someone with access to the box), simply restrict the PHP CLI executable so that it can only be executed by a group containing just the account your program runs under.