Most of my application is written in PHP (front and back ends). One part runs too slowly and I will need to rewrite it, probably not in PHP. What will give me the following:
1. Most speed
2. Fastest development
3. Easily maintained.

I have in mind to rewrite this piece of code in C++ as a PHP extension, but maybe I am locked on this solution and am missing some simpler/better solutions?

The algorithm is the PorterStemmerAlgorithm, applied to several MB of data each time it is run.

+9  A: 

The answer really depends on what kind of process it is.

If it is a long-running process (at least seconds), then an external program written in C++ would probably be the easiest option. It would not have the complexities of a PHP extension, and its stability would not affect PHP/Apache. You could communicate over pipes, shared memory, or the like...
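As a rough illustration of the pipe approach, here is a minimal sketch of an external filter program: it reads one word per line on its input and writes one result per line on its output. The language is incidental (Python is used only to keep the sketch short), and `toy_stem`, `run_filter` and `main` are hypothetical names. `toy_stem` is a stand-in that strips a few suffixes; it is not the Porter algorithm.

```python
import sys
from typing import TextIO

# Hypothetical stand-in for the real stemmer: it only strips a few common
# suffixes and is NOT the Porter algorithm.
_SUFFIXES = ("ing", "ed", "s")


def toy_stem(word: str) -> str:
    for suffix in _SUFFIXES:
        # Keep at least three characters so short words pass through unchanged.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def run_filter(source: TextIO, sink: TextIO) -> int:
    """Read one word per line from source, write one stem per line to sink.

    Returns the number of words processed.
    """
    count = 0
    for line in source:
        word = line.strip()
        if not word:
            continue  # skip blank lines
        sink.write(toy_stem(word.lower()) + "\n")
        count += 1
    sink.flush()
    return count


def main() -> None:
    # As a child process: the parent writes words to our stdin
    # and reads the stems back from our stdout.
    run_filter(sys.stdin, sys.stdout)
```

To run it as a real child process, call `main()` from the script's entry point. The parent (PHP in this thread) then only needs to write to the child's stdin and read from its stdout.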

If it is a short-running process (measured in ms), then you will most likely need to write a PHP extension. That would allow it to be invoked VERY fast, with almost no per-call overhead.

Another possibility is a custom server which listens on a Unix domain socket and responds quickly whenever PHP asks for information. Then your per-call overhead is basically creating a socket (not bad). The server could be in any language (C, C++, Python, Erlang, etc.), and the client could be a 50-line PHP class that uses the socket_*() functions.
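A minimal sketch of such a server, assuming a simple line-based protocol (one line of space-separated words in, one line of results out) that is not specified anywhere in this thread. `StemHandler`, `start_server`, `stem_via_socket` and `toy_stem` are hypothetical names, and `toy_stem` is a placeholder rather than the Porter algorithm. The client function stands in for the small PHP class.

```python
import os
import socket
import socketserver
import threading


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for the real stemmer (NOT the Porter algorithm)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class StemHandler(socketserver.StreamRequestHandler):
    """Answers each request line of words with one line of stems."""

    def handle(self) -> None:
        # Loop until the client closes its end of the connection.
        for raw in self.rfile:
            words = raw.decode("utf-8").split()
            reply = " ".join(toy_stem(w.lower()) for w in words)
            self.wfile.write((reply + "\n").encode("utf-8"))


def start_server(path: str) -> socketserver.UnixStreamServer:
    """Bind to the socket file at path and serve in a background thread."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file from a previous run
    server = socketserver.ThreadingUnixStreamServer(path, StemHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def stem_via_socket(path: str, text: str) -> str:
    """Client side: send one line of words, read back one line of stems."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall((text + "\n").encode("utf-8"))
        with client.makefile("r", encoding="utf-8") as reader:
            return reader.readline().rstrip("\n")
```

The server stays resident, so whatever startup cost the real stemmer has is paid once. Each request then costs one connect, one write and one read.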


A lot of information needs to be evaluated before making this decision. PHP does not typically show slowdowns until you get into really tight loops or thousands of repeated function calls. In other words, the overhead of the HTTP request and network delays usually makes PHP delays insignificant (unless the above applies).

  • Perhaps there is a better way to write it in PHP?
  • Are you database bound?
  • Is it CPU bound, Network bound, or IO bound?
  • Can the result be cached?
  • Does a library already exist which will do the heavy lifting?


By committing to a custom PHP extension, you add significantly to the base of knowledge required to maintain it (even above C++). But it is a great option when necessary.

Feel free to update your question with more details, and I'm sure Stack Overflow will be happy to help out.

gahooa
A: 

I am not sure what the PorterStemmerAlgorithm is. However, if you could make your process run in parallel and then collect the results together, you could look at parallel processes, which are easily implemented in Java. I am not sure how you would call it from PHP, but it would definitely be maintainable.

You can have a look at this framework. It looks simple to implement:

https://computefarm.dev.java.net/

Regards, Franklin.

Franklin
Thanks, will look into it.
Itay Moav
+3  A: 

Suggestion

The PorterStemmerAlgorithm has a C implementation available at http://tartarus.org/~martin/PorterStemmer/c.txt

It should be an easy matter to tie this C program into your data sources and make it a standalone executable. Then you could simply invoke it from PHP with one of the proc functions, such as proc_open().

Unless you need to invoke this program many times per PHP request, this approach should save you the effort of building and integrating a PHP extension, not to mention that the hard work (in C) is already done.

gahooa
This answers my direct problem, although your first answer is a better generalized one :-)
Itay Moav
A: 

If you absolutely need to rewrite in a different language for speed reasons, then I think gahooa's answer covers the options nicely. However, before you do, are you absolutely sure you've done everything you can to improve the performance of the PHP implementation?

  1. Is caching the output viable in your situation? Could you get away with running the algorithm once and caching the output rather than on every page load?
  2. Have you tried profiling the code to ensure there's no unnecessary work being done (DB queries in an inner loop and the like)? Xdebug can help here.
  3. Are there other stemming algorithms available which might perform better on your dataset?
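One possible reading of the caching point, applied at word level rather than page level (an assumption, since the answer does not say this): natural-language text repeats words heavily, so stemming each distinct word once and reusing the result may cut the work considerably. A minimal sketch, where `slow_stem` is a hypothetical placeholder for the expensive stemmer and not the Porter algorithm:

```python
from functools import lru_cache
from typing import List


def slow_stem(word: str) -> str:
    """Hypothetical stand-in for the expensive stemmer (NOT the Porter algorithm)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


@lru_cache(maxsize=None)
def cached_stem(word: str) -> str:
    # lru_cache stores the result per distinct word, so slow_stem runs
    # only the first time a given word is seen.
    return slow_stem(word)


def stem_text(text: str) -> List[str]:
    """Stem every whitespace-separated word, reusing cached results."""
    return [cached_stem(w) for w in text.lower().split()]
```

`cached_stem.cache_info()` reports hits and misses, which gives a quick way to check how repetitive the real data is before committing to a rewrite.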
Jim OHalloran
The result is always cached. The algorithm is used only on new data every F(X time, Y new data).
Itay Moav