I am trying to evaluate open-source options to replace my current CMS-based publication application. My current CMS has about 12,000 HTML pages and about 100,000 uploaded files, totaling about 20 GB of data. Drupal, Joomla, and Plone seem interesting, but I am concerned about whether they are ready to take on all this data. Do you know of any large-scale (comparably sized) CMS deployments? Any supporting numbers would greatly help.

Please note that my CMS application is a publishing system, not a collaborative/social-network type of site.

+2  A: 

Drupal, in particular, focuses on performance. It has several types of internal caches and, combined with a PHP opcode cache (such as APC, which I use on my sites), it performs quite well. As of Drupal 6.0, the menu system (which drives the whole page-request structure) was completely rewritten for performance.
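The core idea behind Drupal's page cache is simple: store the fully rendered page keyed by URL and skip the expensive rebuild on repeat requests. Here's a minimal Python sketch of that concept (class and function names are hypothetical illustrations, not Drupal's actual code):

```python
import time

class PageCache:
    """Minimal sketch of a rendered-page cache with expiry.
    Illustrates the concept only; Drupal's real cache is more elaborate."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._store = {}  # url -> (rendered_html, timestamp)

    def get(self, url):
        entry = self._store.get(url)
        if entry is None:
            return None
        html, stamp = entry
        if time.time() - stamp > self.ttl:
            del self._store[url]  # entry expired; force a rebuild
            return None
        return html

    def set(self, url, html):
        self._store[url] = (html, time.time())

def render_page(url, cache, build):
    """Serve from cache when possible; otherwise build and store."""
    html = cache.get(url)
    if html is None:
        html = build(url)  # the expensive render step
        cache.set(url, html)
    return html
```

For an anonymous-traffic-heavy publishing site like yours, nearly every request can be answered from a cache like this, which is why the content volume matters less than you might expect.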

My largest Drupal community has about 800 users, about 1300 content pages, and a couple thousand uploaded files totaling around 3 GB, and experiences sub-200ms page loads. It's about 1/10 the size of your site, but since you don't need community features (which generally require a lot of custom database queries), you should experience comparable performance.

Drupal's home site, drupal.org, has about 430,000 users and about 400,000 pages, and gets similar page load times (although it runs on a cluster of servers).

So I'm pretty confident Drupal should be able to handle your site.

smokris
+5  A: 

fastcompany.com launched with ~750,000 pieces of content on day one. They had performance and scaling problems initially, but those stemmed specifically from the fact that large-scale faceted search of the entire content base turned out to be the most popular feature, and they weren't using a dedicated search-indexing system.

The New York Observer converted to Drupal a while ago, and their scaling problem had nothing to do with the amount of content; it was a straightforward case of "how do we handle Drudge and the Huffington Post both linking to us at the same time during election season?"

The Onion, Lifetime Television, and a number of other pretty large sites use Drupal, and Mother Jones magazine just converted to it. NowPublic.com, the crowdsourced news site, also runs on Drupal, and has since the (much slower) days of Drupal 4.7.

The key scaling issue is not really how many discrete pieces of content you have, but rather the kind of slicing and dicing you'll be doing with your queries. Those are optimized ad hoc, like any other SQL queries. Out of the box, Drupal tends to optimize for small-to-medium sites; larger deployments require prodding at the indexes and paying attention to how you build your Views-based pages (since those are basically just presentation logic wrapped around SQL).
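The index-prodding described above is ordinary SQL tuning: make sure the columns you filter and sort on are covered by an index. A small sqlite3 sketch of the before/after difference (the `node` table and column names here are hypothetical, loosely modeled on a CMS content listing):

```python
import sqlite3

# Hypothetical content table, loosely modeled on a CMS node list.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, created INTEGER)"
)
conn.executemany(
    "INSERT INTO node (type, created) VALUES (?, ?)",
    [("article", i) for i in range(1000)],
)

# A typical listing query: newest ten nodes of one content type.
query = "SELECT nid FROM node WHERE type = ? ORDER BY created DESC LIMIT 10"

# Without a supporting index, the planner falls back to scanning the table.
before = conn.execute("EXPLAIN QUERY PLAN " + query, ("article",)).fetchall()

# Add a composite index matching both the filter and the sort order.
conn.execute("CREATE INDEX node_type_created ON node (type, created)")
after = conn.execute("EXPLAIN QUERY PLAN " + query, ("article",)).fetchall()

# The last column of each plan row is a human-readable description;
# after the CREATE INDEX it should mention node_type_created.
print(before)
print(after)
```

The same reasoning applies to any listing page a CMS generates; the query planner's output tells you whether your indexes actually match your access patterns.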

As an earlier poster noted, if you don't need lots of user-customized content ('stuff my friends have posted,' 'what my buddies are doing,' etc.) the amount of expensive querying drops dramatically.

Eaton
+2  A: 

I've got to put in a plug for Plone. I use it as a document repository containing lots and lots of large scanned images. No problems so far, though it's not yet at the size you are talking about.

  • Plone has an FTP-based interface, which might ease your migration pains.
  • Plone is written on top of an application-server technology called Zope. Because of that, Plone's default back end is the Zope Object Database (ZODB). You can substitute an RDBMS for the ZODB.
  • With ZEO, you can reconfigure the ZODB as a storage server shared by multiple Zope clients, which lets you spread the load across multiple machines.
  • There is also work in progress on a file-based repository system for Plone.
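To give a feel for the ZEO setup mentioned above: the storage server is driven by a small config file, roughly like the sketch below (the port and path are placeholders; check the ZEO documentation for your version before relying on this):

```
<zeo>
  # Address the ZEO storage server listens on; Zope/Plone clients
  # point their zodb_db configuration at this host:port.
  address localhost:8100
</zeo>

<filestorage 1>
  # The shared Data.fs file served to all connected clients.
  path /var/zeo/Data.fs
</filestorage>
```

Each Plone instance then connects as a ZEO client instead of opening Data.fs directly, which is what allows several front-end machines to share one object database.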

There are lots of consulting companies who can give you the stats you are looking for. Here's the only case study that I could easily google.

Glenn