Hi,

I have this in mind:

The servers are all set up identically.

I place a load balancer in front of the servers and a replacement load balancer in case my primary load balancer goes down.

I use Terracotta to replicate the session information between the servers, so that if a server goes down the user can continue their work on another server, ideally as if nothing happened. What is left to "solve" (I haven't actually tested any of this yet, and I don't know, for example, what I should use as a load balancer) is the database replication that is needed.

If a user interacts with the application and the database changes, that change must be replicated to the database servers on the other server machines. How should I go about doing that? Should I use MySQL, PostgreSQL or something else (ideally something free, as we have a limited budget)? Do the other things above sound sensible?

Clarification: I am clustering first and foremost to get high availability, and I also want to be able to add servers and use them all at the same time to get high scalability.

A: 

Here's an idea: read Theo Schlossnagle's book Scalable Internet Architectures.

What you're proposing is not the best idea.

Load balancers are expensive and not as valuable as they would appear. Use something simpler for distributing the load between your servers (something like Wackamole).

Rather than fool around with DB replication, spend your money on a reliable DB server separate from your front-end web servers. Do regular backups, and in the very unlikely event of DB failure, get back up and running as quickly as possible from ordinary backups.
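
For illustration only, a minimal nightly-backup sketch in Python (the choice of MySQL, the credentials and the backup directory are assumptions, not part of this answer); something like this could be run from cron:

    # backup_db.py -- illustrative sketch; names, paths and credentials are placeholders.
    import datetime
    import subprocess

    BACKUP_DIR = "/var/backups/mysql"  # hypothetical backup location
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    outfile = "%s/all-databases-%s.sql.gz" % (BACKUP_DIR, stamp)

    # Dump every database in one consistent snapshot and compress the result.
    dump = subprocess.Popen(
        ["mysqldump", "--user=backup", "--password=secret",
         "--single-transaction", "--all-databases"],
        stdout=subprocess.PIPE)
    with open(outfile, "wb") as out:
        subprocess.run(["gzip", "-c"], stdin=dump.stdout, stdout=out, check=True)
    dump.stdout.close()
    dump.wait()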

S.Lott
Having your SQL server fail is not as uncommon as you might think. For instance, I've seen my managed Linux server fail TWICE just because of overflowing log files. In this scenario, if you are relying on one single SQL server, all of your web servers go down.
Adrian Grigore
@Adrian Grigore: it's simpler and cheaper to monitor and prevent overflowing log files than to invest large amounts of time in complex (and error-prone) DB replication.
S.Lott
-1: I would like to know WHY my solution isn't a good idea, at least a hint, and then I'll read the book. What I am actually looking for is how I can replicate the database. I would like at least two database servers to feel secure. Load balancing can occur in many ways; I'll certainly check out Wackamole.
DeletedAccount
@S.Lott: Overflowing log files was just an example. My point is that there are LOTS of ways your SQL server can go down and you cannot prevent all of them. If you want reliability, you can't afford to rely on just one SQL server.
Adrian Grigore
@Kent: I'll try to clarify why. It's more expensive than it needs to be: load balancers cost more than simpler alternatives like Wackamole. The cost of a load balancer doesn't match the value it provides.
S.Lott
@Adrian Grigore: if "you cannot prevent all of them", then stop using SQL Server and use a database where you CAN manage all of the software reasons for failure. Use RAID arrays to make the hardware reliable.
S.Lott
"load balancers are more expensive" suggests the term "load balancer" is clearly defined, and I don't think it is. If I interpret you correctly you are basically saying "Never use more than one database server under any circumstance" and even with my limited knowledge, that sounds very wrong to me.
DeletedAccount
@Kent: database replication is complex and expensive -- many, many things can go wrong. Wackamole is inexpensive and very simple; very little can go wrong. A return-on-investment analysis suggests (1) use Wackamole and (2) manage your single DB wisely.
S.Lott
@S.Lott: With all due respect, this comment: "use a database where you CAN manage all of the software reasons for failure" tells me that you really have no clue what high availability means or how to achieve it. I'm sorry.
Chris
@Chris: Oracle does this fine. DB2 has always worked well in HA situations. Single instance of DB2 with RAID storage seems to run without failure more-or-less indefinitely. Guess that was just luck, not managing to achieve High Availability.
S.Lott
@S.Lott: Pretty much luck. What happens if, for some odd reason, your DB2 instance blows up and can't recover? What if the system doesn't boot after a power cycle (yes, those still happen every few years)? With a backup instance, you're fine. In your case, you're hosed.
Chris
@Chris: a duplicate database (plus the machinery to replicate) is not a very useful backup. Real backups and RAID arrays are cheaper and simpler.
S.Lott
@S.Lott: There again, we have the misunderstanding of "High Availability". A replicated database instance is not about backup. It's about little to no downtime during the recovery process when your primary instance DOES fail.
Chris
@Chris: what does high availability have to do with this question? Also, with RAID arrays the primary database rarely goes down. When it does go down, it's not the media; it's the processor. Replacing the processor and reusing the existing RAID gets you back up with LESS complexity.
S.Lott
I'm clustering to achieve high availability (that's why I plan to use Terracotta, for example) and scalability. Why did you think I wanted to use a cluster at all? I still think you are way off; I want redundancy.
DeletedAccount
@Kent: Please update your question with this new fact -- not everyone clusters for HA. Some people cluster because it "seems" necessary. In your case, I think your time is better spent with Schlossnagle's book than trying to get replication to work.
S.Lott
I added a clarification. Out of curiosity, why do they think it's necessary? In my case the solution will be a critical part of other companies' business, so I want as little downtime as possible within a small company's budget. (If I feel we need outside expertise, I'll be sure to use that too.)
DeletedAccount
That is, outside expertise as in hiring a consultant. :-) Of course, I'm getting outside advice here, right now.
DeletedAccount
@Kent: Clustering "seems" to give HA because it "seems" to make sense. As a practical matter, it isn't really necessary. Use RAID storage and keep spare hardware on hand. Do backups. Keep things VERY simple so you can move to the spare hardware quickly in the unlikely event of processor failure.
S.Lott
+1  A: 

For my (Perl-driven) website, I am using MySQL on two servers with database replication. Each MySQL server is slave and master at the same time. I did this for redundancy, not for performance, but the setup has worked fine for the past 3 years; we had almost no downtime at all during this period.

Regarding Kent's question / comment: I am using the standard replication that comes with MySQL.

Regarding the failover mechanism: I am using DNSMadeEasy.com's failover functionality. I have a Perl script that runs every 5 minutes via cron and checks whether replication is still running (and also lots of other things, such as server load, HDD sanity, RAM usage, etc.). During normal operation, the faster of the two servers delivers all web pages. If the script detects that something is wrong with the server (or if the server is just plain down), DNSMadeEasy switches the DNS entries so that the secondary server becomes primary. Once the "real" primary server is back up, MySQL automatically catches up on the missing database changes and DNSMadeEasy automatically switches back.
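
(The actual script is Perl and isn't shown here. Purely as an illustration of the kind of check it performs, here is a rough Python sketch; the credentials, the health-flag file and the mysql-connector-python package are assumptions, and the DNS switching itself is left to DNSMadeEasy's monitor, which is assumed to poll for the flag file.)

    # check_health.py -- illustrative sketch, not the author's script; run every few minutes from cron.
    import os
    import mysql.connector  # assumes the mysql-connector-python package

    HEALTH_FLAG = "/var/www/html/healthy.txt"  # hypothetical file polled by the DNS failover monitor

    def replication_ok():
        conn = mysql.connector.connect(user="monitor", password="secret", host="localhost")
        try:
            cur = conn.cursor(dictionary=True)
            cur.execute("SHOW SLAVE STATUS")
            row = cur.fetchone()
            # Both replication threads must be running for this server to count as healthy.
            return (row is not None
                    and row["Slave_IO_Running"] == "Yes"
                    and row["Slave_SQL_Running"] == "Yes")
        finally:
            conn.close()

    def load_ok(max_load=4.0):
        # 1-minute load average; the threshold is an arbitrary example value.
        return os.getloadavg()[0] < max_load

    if __name__ == "__main__":
        if replication_ok() and load_ok():
            open(HEALTH_FLAG, "w").close()   # flag present -> failover monitor keeps this server primary
        elif os.path.exists(HEALTH_FLAG):
            os.remove(HEALTH_FLAG)           # flag missing -> DNSMadeEasy switches to the other server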

Adrian Grigore
Are you using the built-in replication or a third-party solution? How do you detect that a server has gone down, and how do you decide which server to use from your application?
DeletedAccount
+3  A: 

Hi Kent,

Since you're already using Terracotta, and you believe that a second DB is a good idea (agreed), you might consider expanding Terracotta's role. We have customers who use Terracotta for database replication. Here's a brief example/description:

http://www.terracotta.org/web/display/orgsite/TCCS+Asynchronous+Data+Replication

Sorry for my confusing post. I'm not actually using Terracotta yet; I "have it in mind". :-) I'll take some time to read your link soon, as it sounds very interesting!
DeletedAccount
+2  A: 

You are trying to create multi-master replication, which is a very bad idea, as any change to any database has to be replicated to every other database. This is terribly slow - on one server you can get several hundred transactions per second using a couple of fast disks and RAID1 or RAID10, and much more if you have a good RAID controller with a battery-backed cache. If you add the overhead of communicating with all your servers, you'll get at most tens of transactions per second.

If you want high availability you should go for a warm standby solution, where you have a server that is replicated to but not used - when the main server dies, the standby takes over. You can lose some recent transactions if your main server dies.

You can also go for asynchronous replication with one master and multiple slaves. Every change to the database has to be performed on the one master server, but you can have several read-only slave servers. Data on these slave servers can be several transactions behind the master, so here too you can lose some recent transactions if the master dies.

PostgreSQL has both types of replication - warm standby using log shipping, and one master with multiple slaves using Slony.
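
As a rough sketch of how an application typically uses the master-plus-read-only-slaves layout (the hostnames, the psycopg2 driver and the example table are assumptions, not something PostgreSQL or Slony prescribes): writes go to the master, reads can be spread over the slaves, bearing in mind that a slave may lag a few transactions behind.

    # read_write_split.py -- minimal sketch of master/slave connection routing.
    import random
    import psycopg2  # assumes the psycopg2 driver

    # Hypothetical connection strings; the host names are placeholders.
    MASTER_DSN = "host=db-master dbname=app user=app password=secret"
    SLAVE_DSNS = [
        "host=db-slave1 dbname=app user=app password=secret",
        "host=db-slave2 dbname=app user=app password=secret",
    ]

    def write_connection():
        # Every INSERT/UPDATE/DELETE must go to the single master.
        return psycopg2.connect(MASTER_DSN)

    def read_connection():
        # SELECTs can be spread over the read-only slaves; their data may lag slightly.
        return psycopg2.connect(random.choice(SLAVE_DSNS))

    # Example use (the "events" table is made up for the illustration):
    with write_connection() as conn, conn.cursor() as cur:
        cur.execute("INSERT INTO events (payload) VALUES (%s)", ("hello",))

    with read_connection() as conn, conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM events")
        print(cur.fetchone()[0])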

Only if you have a very small number of writes can you go for synchronous replication. This can also be set up for PostgreSQL using PgPool-II or Sequoia.

Please read the High Availability, Load Balancing, and Replication chapter in the PostgreSQL documentation for more.

Tometzky
+1  A: 

AFAIK, MySQL does a better job of being scalable. See the documentation: http://dev.mysql.com/doc/mysql-ha-scalability/en/ha-overview.html

And there is a blog where you can take a look at real-life examples: http://highscalability.com/tags/mysql

vartec