For folks who have deployed HBase on their own clusters, do you feel that it's sufficiently stable for production use? What types of troubles or issues have you run into?

I do see a bunch of companies listed as using HBase in production (http://wiki.apache.org/hadoop/Hbase/PoweredBy), but I'm curious whether a lot of maintenance, patching, and fire drills go into keeping an HBase cluster up and running.

+7  A: 

HBase is about to hit a major milestone with HBase-0.20. There is an alpha out now and a release candidate is due soon. It brings major performance improvements. StumbleUpon reportedly serves its live site off the trunk version of HBase, with no additional caching layer, as do others. So I'd say it's definitely ready for production use.

Ryan Rawson (of StumbleUpon) gave a nice talk on it at the NoSQL conference recently, mostly about how far it has come over the last 6 months. There are slides if you don't want to watch the whole thing. Apart from the performance improvements, the other major addition is that it now integrates with ZooKeeper, so the master isn't a single point of failure anymore.

HBase used to fall over with memory issues when cell sizes were small, because of a limitation of the file format. This has been addressed with a new custom file format, which also brought performance gains.

I've been experimenting with HBase for about a year now. I'm ready to trust 0.20 with a production service; I wasn't quite ready to with older versions. I recommend at least a 4 or 5 node dev cluster when experimenting.

I can't really comment on what it's like to look after a production cluster, because we have only just started running one. One thing that helps is that the mailing list is extremely active and IRC is in constant use, so there's a very strong community to help out at least.

Tim