I'm looking forward to hearing some suggestions for choosing the "right" language (and modules?) to realize a one-man web project (LAMP only, complexity somewhere between a guestbook and a fully fledged blog, developed for high-traffic sites with 50,000+ impressions per day) based on these requirements:

  • Output cache (think WP Super Cache for WordPress et al.), bypassing the script language completely when a cached page is present. This one is really important.

  • "Website spider" module to visit websites and parse HTML; JavaScript support would be an advantage

  • Object oriented handling of multimedia files (mp3, jpg/gif/png, flv/wmv) without writing my own wrappers for everything

  • Possibility to encode the raw script files into something (binary?) that cannot easily be tampered with

For MVC compliance I'm already looking into frameworks like Catalyst. My only gripe from what I've read is that it seems to require its own server application. Perhaps PHP equivalents have a better way to launch apps...?

When answering, don't just say "PHP/Perl can do all this and more" but please provide a little recommendation for each of these points.

Thanks in advance

+12  A: 

Output cache: Both languages have excellent caching solutions.

Website spider: Perl has WWW::Mechanize, the best such module I've ever seen.

Object oriented handling of multimedia files: Perl has an advantage there. CPAN has modules for all kinds of multimedia files. PHP seems to be missing support for video formats in particular.

Encoding: both languages have to remain interpretable to be executed. I know both have obfuscation solutions, but there are also deobfuscation tools. I would recommend just dropping that requirement; it mostly makes your own life more difficult.

P.S. Catalyst does not need its own webserver, though it does offer a special server for development purposes. Normally you would deploy it using Apache or another FastCGI supporting webserver.

Leon Timmermans
+2  A: 

I think PHP would easily be able to cope with such a project. As Leon said, its support for multimedia might not be quite as good as Perl's, but for the rest it should definitely be good.

There are plenty of different caching APIs and frameworks for PHP. Personally, I can recommend CakePHP as a very good framework; however, it is quite bulky, and a bespoke solution may be better.

The best website spider in PHP that I have used is Sphider.

I'm not sure about handling video and audio files in PHP, but for images ImageMagick is very good.

There is plenty available in PHP for encoding as well; check the manual for details.

Mark Davidson
+8  A: 

I would recommend against using WWW::Mechanize as the crawler; use Gungho instead.

As for caching, it tends to be app dependent (unless it's just "cache this page for 10 minutes"). You are going to have a much easier time implementing it with an MVC Perl app than with PHP. Take a look at my somewhat-clever Angerwhale::Cache. You'll note that I trade speed for accuracy here -- an outdated page will never be served. You might be willing to be more liberal, and if so, you can cut the app out of many requests. (By, for example, running something every 5 minutes to update static HTML pages.)
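The "update static HTML pages every 5 minutes" idea can be sketched roughly as follows. This is only an illustration using core Perl modules, not the Angerwhale::Cache approach: publish_page and render_front_page are hypothetical names, and the CACHE_DIR environment variable is an assumption. The point is that the page is written to a temporary file and then renamed into place, so the web server never serves a half-written file.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Temp qw(tempfile);

# Hypothetical helper: write $html to $target without ever exposing a
# partially written file. The temp file is created in the target's own
# directory so that rename() stays on one filesystem.
sub publish_page {
    my ($target, $html) = @_;
    my $dir = dirname($target);
    make_path($dir) unless -d $dir;

    my ($fh, $tmp) = tempfile('.page-XXXXXX', DIR => $dir);
    binmode $fh, ':encoding(UTF-8)';
    print {$fh} $html or die "write to $tmp failed: $!";
    close $fh or die "close of $tmp failed: $!";

    # tempfile() creates files readable by the owner only; the web
    # server usually needs read access as well.
    chmod 0644, $tmp or die "chmod of $tmp failed: $!";
    rename $tmp, $target or die "rename to $target failed: $!";
    return $target;
}

# Stand-in for whatever the application uses to render a page.
sub render_front_page {
    my $stamp = scalar localtime;
    return "<html><body><p>Generated at $stamp</p></body></html>\n";
}

# Intended to be run from cron; does nothing unless CACHE_DIR is set.
publish_page("$ENV{CACHE_DIR}/index.html", render_front_page())
    if $ENV{CACHE_DIR};
```

Once the file exists, the web server can deliver it as a plain static file, so the application is not involved in those requests at all. The trade-off is the one described above: a page may be up to one cron interval out of date.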

jrockway
I didn't know Gungho, but it looks interesting indeed. I'll remember it next time I need to spider, thanks!
Leon Timmermans
Gungho seems to provide a very powerful crawler. With Mechanize, I was concerned about scalability and resource usage in a threaded environment. If it turns out to be a Perl project, I will keep this in mind. Thanks!
DMeier
+4  A: 

Catalyst will easily do all the things you require. If you use the M and the V properly and keep the C to a minimum (as encouraged by the Catalyst community), there's no reason for you not to get the scalability you require.

I don't know what you mean by "it seems to require its own server application". Deploying a Catalyst application can be as simple as

CATALYST_ENGINE=HTTP::Prefork script/myapp_server.pl -p 80

but it obviously supports other options as well.

singingfish
Since I want this to be a LAMP app for obvious reasons, my only options are CGI (slow), mod_perl or FastCGI (not running on most virtual hosting packages), from what I understand.
DMeier
FastCGI is available in quite a few places. Dreamhost and asmallorange.com are two examples. I would imagine Textdrive would also be OK.
singingfish
+1  A: 
  • PHP source obfuscation can be done with ionCube, but it needs a corresponding module on the server (which is NOT hosting-friendly).
  • For Perl, I played a few years ago with Perl compilers which created Pcode or linked the sources with an interpreter in a standalone package.
  • The output cache you describe sounds like a reverse proxy.
  • PHP has no movie/audio support whatsoever. You should resort to external tools like FFmpeg or MEncoder to do back-end processing or file statistics.
Wimmer
+4  A: 

With Perl, you can use PAR to package your application. This can simplify deployment and enables the use of PAR::Filter modules to obfuscate it. There is also Apache::PAR to integrate PAR with a mod_perl environment.

Update: Well, as it says on the website, PAR is pure Perl (no C compilation needed), so you can install it on any server to which you can write files. You can load modules from a PAR file in your normal Perl scripts like so:

 use PAR;
 use lib "foo.par";             # the .par part is optional
 use Hello;                     # module from the par file.

Put the bulk of your logic in your modules and use a simple launcher script to load them from the PAR file.

But if you plan on building a big, complex app, do yourself a favor and get a host that will either let you install modules or will install them for you. Also, use mod_perl or FastCGI to speed up your app.

GrokThis has good, cheap plans. There are other good hosts out there as well.

daotoad
I take it PAR compressed packages cannot be run on a regular hosting package then (Perl via CGI)? That's where PHP has an advantage IMO.
DMeier
+2  A: 

Check out the MVC frameworks Symfony and Cake for PHP. Widely used.

However, keep in mind that 50,000 impressions per day sounds like a lot, but it isn't that much: averaged over 24 hours it is well under one request per second (50,000 / 86,400 ≈ 0.58). You probably do not need to worry about caching until you really start generating heavy traffic with concurrent users.

Everything else has pretty much been mentioned. If your server is locked down your source code should never EVER be altered - so unless you are distributing a retail application I do not see why you would need to deploy source obscurity. You will just be adding more steps to implementation/deployment and interpretation for your site/software.

Syntax