views:

809

answers:

10

I am using PHP with the Zend Framework, and database connects alone seem to take longer than the 0.02 seconds Google takes to do a query. The weird thing: today I watched a video that said Google contacts 1000 servers for a single query. Given latency, I would expect one server per query to be more efficient than having multiple servers in different datacenters handling it.

How do I get PHP, MySQL and the Zend Framework to work together and reach equal great speeds?

Is caching the only way? How do you optimize your code so it takes less time to "render"?

+4  A: 

A while ago Google decided to put everything into RAM.

http://googlesystem.blogspot.com/2009/02/machines-search-results-google-query.html

If you never have to query the hard drive, your results will improve significantly. Caching helps because you hit the hard drive less often, but you still do on a cache miss (unless you mean opcode caching in PHP, where the script is only recompiled when the source has been modified).

Coltin
How do I check whether my DB is completely in RAM?
Thomaschaaf
Google is using their own DB engine, I would suspect. I don't think you can force MySQL to put the entire database in RAM.
Darryl Hein
You could always use MySQL Cluster! That keeps everything in RAM with occasional checkpoints to disk. Pretty advanced to set up, though. You could also use all MEMORY tables, but those would go 'poof!' if the server goes down.
jonstjohn
You could set up a caching solution on top of your DB, such as memcached or APC.
Chad Birch
It's important to note that caching is a method for taking advantage of a small amount of fast memory using temporal and spatial locality. Placing everything into a faster medium (RAM as opposed to a hard drive, SRAM as opposed to DRAM) eliminates the need for most caching in this instance.
Coltin
+9  A: 

There are many techniques that Google uses to achieve the throughput it delivers. MapReduce, the Google File System and BigTable are a few of them.

There are a few very good free and open-source alternatives to these, namely Apache Hadoop, Apache HBase and Hypertable. Yahoo! is using and promoting the Hadoop projects quite a lot, so they are quite actively maintained.

Baishampayan Ghose
What good free alternatives are there?
Thomaschaaf
I have edited my answer to add some good alternatives.
Baishampayan Ghose
Thanks. (I have to write something here)
Thomaschaaf
+3  A: 

It really depends on what you are trying to do, but here are some examples:

  • Analyze your queries with EXPLAIN. In your dev environment you can output your queries and their execution times at the bottom of the page; reduce the number of queries and/or optimize the slow ones.

  • Use a caching layer. Zend can be memcache-enabled, which can greatly speed up your application by sending requests to the ultra-fast caching layer instead of the DB.

  • Look at your front-end loading time. Use Yahoo's YSlow add-on for Firebug. Limit HTTP requests, set far-future Expires headers to cache JS, CSS and images, etc.

You can get lightning speed out of your web app (probably not as fast as Google) if you optimize each layer of your application. Your DB connect times are probably not the slowest part of your app.
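As an illustration of the caching-layer point, here is a minimal sketch using Zend_Cache (ZF1) with a memcached backend; the lifetime, cache key, server details and query are assumptions, not part of the original answer:

```php
<?php
// Sketch: fronting a slow query with Zend_Cache (ZF1) over memcached.
// Lifetime, key, server details and the query itself are assumptions.

function makeCache(): Zend_Cache_Core
{
    $frontend = ['lifetime' => 300, 'automatic_serialization' => true];
    $backend  = ['servers' => [['host' => 'localhost', 'port' => 11211]]];
    return Zend_Cache::factory('Core', 'Memcached', $frontend, $backend);
}

function getArticles(Zend_Cache_Core $cache, Zend_Db_Adapter_Abstract $db): array
{
    // Serve from memcached when possible; hit MySQL only on a miss.
    if (($rows = $cache->load('articles')) === false) {
        $rows = $db->fetchAll('SELECT id, title FROM articles');
        $cache->save($rows, 'articles');
    }
    return $rows;
}
```

On a cache hit the DB is never touched at all, which is where most of the win comes from.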

jonstjohn
A: 

If it's for a search engine, the bottleneck is the database, depending on its size.

To speed up full-text search on a large data set, you can use Sphinx. It can be configured on either one or multiple servers. However, you will have to adapt your existing query code, since Sphinx runs as a search daemon (client libraries are available for most languages).
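A hedged sketch of what the adapted query code might look like with the official sphinxapi PHP client (host, port, match mode and index name are assumptions):

```php
<?php
// Sketch: querying a Sphinx searchd from PHP with the official
// sphinxapi client. Host, port and index name are assumptions.

function searchPosts(string $terms): array
{
    $cl = new SphinxClient();
    $cl->SetServer('localhost', 9312);       // default searchd port
    $cl->SetMatchMode(SPH_MATCH_EXTENDED2);  // full-text query syntax
    $result = $cl->Query($terms, 'posts_index');

    // Sphinx returns matching document ids; the rows themselves are
    // then fetched from MySQL in a second step.
    return $result === false ? [] : array_keys($result['matches'] ?? []);
}
```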

Julien Tartarin
A: 

According to the link supplied by @Coltin, Google's response times are in the region of 0.2 seconds, not 0.02 seconds. As long as your application has an efficient design, you should be able to achieve that on a lot of platforms. Although I do not know PHP, it would surprise me if 0.2 seconds were a problem.

krosenvold
+2  A: 

Memcached is a recommended solution for optimizing storage/retrieval in memory on Linux.

spoulson
+6  A: 

I am using PHP with the Zend Framework and database connects alone seem to take longer than the 0.02 seconds Google takes to do a query.

Database connect operations are heavyweight no matter who you are: use a connection pool so that you don't have to initialise resources for every request.
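One sketch of the persistent-connection idea in PHP, using PDO (the DSN and credentials are placeholders, not a prescribed setup):

```php
<?php
// Sketch: one lazily created persistent PDO connection per PHP worker.
// DSN and credentials are placeholders; PDO::ATTR_PERSISTENT lets the
// underlying MySQL connection survive between requests instead of
// being re-established every time.

function getDb(): PDO
{
    static $db = null;
    if ($db === null) {
        $db = new PDO(
            'mysql:host=localhost;dbname=app', // hypothetical DSN
            'app_user',                        // hypothetical credentials
            'secret',
            [PDO::ATTR_PERSISTENT => true]
        );
    }
    return $db;
}
```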

Performance is about architecture not language.

Alabaster Codify
+1 for the truth that "Performance is about architecture not language". I hate these questions :P
Toby Hede
+1  A: 

Google has a massive, highly distributed system that incorporates a lot of proprietary technology (including their own hardware and their own operating, file and database systems).

The question is like asking "How can I make my car be a truck?" and is essentially meaningless.

Toby Hede
Seems like a reasonable response to me - un-downvoted.
Dominic Rodger
A: 
  • APC opcode caching;
  • Zend_Cache with APC or Memcache backend;
  • CDN for the static files;
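Besides opcode caching, APC also exposes a shared-memory user cache; a minimal sketch (the key name, TTL and the "expensive" work are arbitrary examples):

```php
<?php
// Sketch: APC's shared-memory user cache (apc_fetch/apc_store),
// separate from its opcode cache. Key name and TTL are arbitrary.

function cachedConfig(): array
{
    $config = apc_fetch('site_config');
    if ($config === false) {
        $config = parse_ini_file('config.ini'); // the "expensive" work
        apc_store('site_config', $config, 600); // cache for 10 minutes
    }
    return $config;
}
```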
vartec
+2  A: 

PHP scripts are by default interpreted every time the HTTP server calls them, so every request triggers script parsing and compilation by the Zend Engine. You can get rid of this bottleneck with an opcode cache like APC. It keeps the compiled PHP script in memory/on disk and uses it for all subsequent requests. The gains are often significant, especially in PHP apps built with sophisticated frameworks like ZF.
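A typical php.ini fragment for enabling APC might look like this (the values shown are common illustrative choices, not prescriptive):

```ini
; Enable the APC opcode cache
extension=apc.so
apc.enabled=1
apc.shm_size=64M   ; shared memory reserved for compiled scripts
apc.stat=1         ; re-check file mtimes; 0 is faster but needs a restart on deploy
```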

Every request by default opens a new connection to the database, so you should use some kind of database connection pooling or persistent connections (which don't always work, depending on the HTTP server/PHP configuration). I have never tried it, but maybe there's a way to use memcache to keep database connection handles.

You could also use memcache to keep session data, if it's used on every request. Its persistence is not that important, and memcache makes access very fast.
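With the pecl memcache extension installed, pointing PHP's session handler at memcached is a two-line php.ini change (the host and port assume a local instance):

```ini
session.save_handler = memcache
session.save_path = "tcp://localhost:11211"
```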

The actual "problem" is that PHP works a bit differently from other frameworks, because it works in an SSI (server-side includes) way: every request is handled by the HTTP server, and if it requires running a PHP script, the interpreter is initialized and the script is loaded, parsed, compiled and run. This can be compared to getting into a car, starting the engine and driving 10 meters.

The other way is, let's say, the application-server way, in which the web application itself handles requests in its own loop, always sharing database connections and never re-initializing the runtime. That solution gives much lower latency. It can be compared to already sitting in a running car and driving the same 10 meters. ;)

The caching/precompiling and pooling solutions above are the best ways to reduce the init overhead. PHP/MySQL is still an RDBMS-based solution, though, and there's a good reason why BigTable is, well, just a big, sharded, massively distributed hashtable (a bit of an oversimplification, I know); read up on High Scalability.

macbirdie