views:

1246

answers:

12

In a web app that supports more than 5000 users, PostgreSQL is becoming the bottleneck.

It takes more than 1 minute to add a new user (even after optimizations, and on Win 2k3).

So, as a design issue, which other DBs might be better?

+1  A: 

First, I would make sure the optimizations are, indeed, useful. For example, if you have many indexes, sometimes adding or modifying a record can take a long time. I know there are several big projects running on PostgreSQL, so take a look at this issue.

Pablo Marambio
+2  A: 

PostgreSQL scales better than most; if you are going to stay with a relational DB, Oracle would be it. ODBMSs scale better, but they have their own issues, in that setting one up is closer to programming.
Yahoo uses PostgreSQL; that should tell you something about its scalability.

WolfmanDragon
What ODBMS solutions do you like?
DanielHonig
Yahoo also uses a ton of MySQL, and their Postgres (that I know of) is for data warehousing, which is a different beast (emphasis on processing power and data scale vs concurrency)
SquareCog
I only know ODBMS from a theoretical standpoint. Objectivity/DB has an excellent reputation; the largest database in America (Stanford U.) uses it.
WolfmanDragon
Yahoo's main db is PostgreSQL.
WolfmanDragon
News to me... got a cite, or do you know because you work there? I wasn't aware they even have a "main" DB.
SquareCog
Excuse me, multitasking; I should have said largest. No, I don't work there.
WolfmanDragon
I think you are right on that -- but if it's the one I am thinking of, that was in the news recently, it's the DW I referred to above. It doesn't have to support high concurrency, just extremely large (peta-size) distributed datasets. It's also highly customized.
SquareCog
I read another article yesterday; I will go look it up. It said it was "live" data.
WolfmanDragon
The one I am talking about is this: http://perspectives.mvdirona.com/2008/05/23/PetascaleSQLDBAtYahoo.aspx .
SquareCog
Same one. They rewrote the internals extensively, and it's an "offline" data warehouse. It's huge and very impressive, but it's optimized for a different kind of load than a web app. The point being, there are different kinds of scalability. But we agree: Postgres should be fine for a mere 5K users.
SquareCog
Just got to glance at the mvdirona article; looks good. Break time over, back to work I go. Thanks.
WolfmanDragon
+1  A: 

I'd suggest looking here for information on PostgreSQL's performance: http://enfranchisedmind.com/blog/2006/11/04/postgres-for-the-win

What version of PG are you running? As the releases have progressed, performance has improved greatly.

warren
We are running 8.3.3; yep, we have seen performance improvements in newer releases.
+4  A: 

I think your best choice is still PostgreSQL. Spend the time to make sure you have properly tuned your application. Once you're confident you have reached the limits of what can be done with tuning, start caching everything you can. After that, start thinking about moving to an asynchronous master-slave setup... Also, are you running OLAP-type functionality on the same database you're doing OLTP on?

DanielHonig
+38  A: 

Most likely it's not PostgreSQL, it's your design. Changing shoes will not make you a better dancer.

Do you know what is causing the slowness? Is it contention, time to update indexes, seek times? Are all 5000 users trying to write to the user table at the exact same time as you are trying to insert the 5001st user? That, I can believe, could cause a problem. You might have to go with something tuned to handling extreme concurrency, like Oracle.
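A rough way to check this yourself is to watch for blocked sessions while the inserts are running; a minimal sketch, using the 8.3-era pg_stat_activity column names (newer releases rename them):

    SELECT procpid, waiting, current_query
    FROM pg_stat_activity
    WHERE waiting;   -- sessions currently blocked waiting on a lock

If that regularly returns rows while you are adding users, contention is a real suspect.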

MySQL (I am told) can be optimized to do faster reads than PostgreSQL, but both are pretty ridiculously fast in terms of # transactions/sec they support, and it doesn't sound like that's your problem.


P.S. We were having a little discussion in the comments to a different answer -- do note that some of the biggest, storage-wise, databases in the world are implemented using Postgres (though they tend to tweak the internals of the engine). Postgres scales for data size extremely well, for concurrency better than most, and is very flexible in terms of what you can do with it.

I wish there were a better answer for you; 30 years after the technology was invented, we should be able to let users have less detailed knowledge of the system and still have it run smoothly. But alas, extensive thinking and tweaking is required for all products I am aware of. I wonder if the creators of StackOverflow could share how they handled DB concurrency and scalability? They are using SQL Server, I know that much.


P.P.S. As chance would have it, I slammed head-first into a concurrency problem in Oracle yesterday. I am not totally sure I have it right, not being a DBA, but what the guys explained was something like this: we had a large number of processes connecting to the DB and examining the system dictionary, which apparently forces a short lock on it, despite the fact that it's just a read. Parsing queries does the same thing, so we had (on a multi-terabyte system with thousands of objects) a lot of forced wait time because processes were locking each other out of the system. Our system dictionary was also excessively big, because it contains a separate copy of all the information for each partition, of which there can be thousands per table. This is not really related to PostgreSQL, but the takeaway is: in addition to checking your design, make sure your queries use bind variables and get reused, and that pressure on shared resources is minimal.
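To make the bind-variable point concrete, in PostgreSQL the idea looks roughly like this (the table and column names are made up purely for illustration):

    -- Prepare once, execute many times with different bind values,
    -- so the statement is parsed and planned only once per session.
    PREPARE add_user (text, text) AS
        INSERT INTO users (name, email) VALUES ($1, $2);
    EXECUTE add_user('alice', 'alice@example.com');
    EXECUTE add_user('bob', 'bob@example.com');
    DEALLOCATE add_user;

Most client libraries do the equivalent for you when you use parameterized queries instead of concatenating values into the SQL string.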

SquareCog
I guess concurrency is not an issue; it could be that the tables are not fully normalized and the queries are not fully tuned. I will also look at tuning the DB. Thanks for your post.
@vpsingh88: Note that table normalization != better performance (at least not automatically). There are plenty of cases where a side effect of normalization is a better design that leads to better performance, but too much normalization or poor normalization can hurt performance, too.
Michael Haren
A: 

If you do want to switch away from PostgreSQL, Sybase SQL Anywhere is number 5 in terms of price/performance on the TPC-C benchmark list. It's also the lowest price option (by far) on the top 10 list, and is the only non-Microsoft and non-Oracle entry.

It can easily scale to thousands of users and terabytes of data.

Full disclosure: I work on the SQL Anywhere development team.

Graeme Perrow
Graeme, I totally appreciate your expertise, but you might want to disclose your bias... Question: are SQL Anywhere and Sybase IQ related?
SquareCog
Good point. I added to my answer. Yes, Sybase IQ is basically the SQL Anywhere query execution engine with a different store (i.e. the way they store data is completely different from SQL Anywhere).
Graeme Perrow
+3  A: 

Let me introduce you to the simplest, most practical way to scale almost any database server if the database design is truly optimal: just double your RAM for an instantaneous boost in performance. It's like magic.

Seun Osewa
+6  A: 

Please change the OS under which you run Postgres - the Windows port, though immensely useful for expanding the user base, is still not on a par with the (much older and more mature) Un*x ports (and especially the Linux one).

Milen A. Radev
A: 

We need more details: What version are you using? What is the memory usage of the server? Are you vacuuming the database? Your performance problems might be unrelated to PostgreSQL.

Guy C
A: 

If you have many more reads than writes, you may want to try MySQL, assuming the problem is with Postgres - but your problem sounds like a write problem.

Still, you may want to look into your database design, and possibly consider sharding. For a really large database you may still have to look at the above 2 issues regardless.

You may also want to look at non-RDBMS or document-oriented database servers, like Mensia and CouchDB, depending on the task at hand. No single tool will manage all tasks, so choose wisely.

Just out of curiosity, do you have any stored procedures that may be causing this delay?

partoa
+1  A: 

As highlighted above, the problem is probably not with the particular database you are using (i.e. PostgreSQL) but with one of the following:

  • Schema design: maybe you need to add, remove, or refine your indexes
  • Hardware: maybe you are asking too much of your server - you said 5k users, but then again very few of them are probably querying the DB at the same time
  • Queries: perhaps poorly written, resulting in lots of inefficiency

A pragmatic way to find out what is happening is to analyse the PostgreSQL log files and find out which queries are:

  • Most frequently executed
  • Longest running
  • etc. etc.

A quick review will tell you where to focus your efforts, and you will most likely resolve your issues fairly quickly. There is no silver bullet; you have to do some homework, but this will be small compared to changing your DB vendor.

Good news... there are lots of utilities to analyse your log files that are easy to use and produce easy-to-interpret results; here are two:

pgFouine - a PostgreSQL log analyzer (PHP)

PQA (ruby)
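For example, a minimal postgresql.conf sketch for capturing slow queries to feed into such a tool (the threshold and the log_line_prefix format a given analyzer expects will vary, so treat these values as illustrative):

    # log any statement that runs longer than 250 ms, with a timestamp and backend PID
    log_min_duration_statement = 250
    log_line_prefix = '%t [%p]: '
    log_destination = 'stderr'
    logging_collector = on

Reload the server after editing, let it run under normal load for a while, and then analyse the output.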

belvoir
A: 

Hi, I had the same issue previously at my current company. When I first joined them, their queries were huge and very slow: it took 10 minutes to run them. I was able to optimize them down to a few milliseconds, or 1 to 2 seconds. I learned many things during that time, and I will share a few highlights here.

  1. Check your query first. Doing an inner join of all the tables that you need will always take some time. One thing that I would suggest is to always start off with the table that lets you cut your data down to just the rows you need.

    e.g. SELECT * FROM (SELECT * FROM person WHERE person ilike '%abc') AS person;

If you look at the example above, this will cut your results down to something that you know you need, and you can refine them further with an inner join. This is one of the best ways to speed up your query, but there is more than one way to skin a cat. I cannot explain all of them here because there are just too many, but from the example above, you just need to modify it to suit your needs.

  2. It depends on your Postgres version. Older Postgres does not internally optimize the query as well; one example is that on Postgres 8.2 and below, IN statements are slower than on 8.4.

  3. EXPLAIN ANALYZE is your friend. If your query is running slow, do an EXPLAIN ANALYZE to determine which part of it is causing the slowness (see the sketch after this list).

  4. Vacuum your database. This ensures that the statistics on your database almost match the actual data; a big difference between the statistics and the actual data will result in your query running slow.

  5. If all of these do not help you, try modifying your postgresql.conf. Increase the shared memory and experiment with the configuration to better suit your needs.
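A tiny sketch of points 3 and 4 together (the table and column names are only examples):

    EXPLAIN ANALYZE
    SELECT p.*
    FROM person AS p
    JOIN orders AS o ON o.person_id = p.id
    WHERE p.name ILIKE 'abc%';
    -- Compare the planner's estimated row counts with the actual ones in the output;
    -- a large mismatch usually means stale statistics, in which case run:
    --   VACUUM ANALYZE person;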

Hope this helps, but of course, these are just for postgres optimization.

BTW, 5000 users is not much. My DBs contain anywhere from about 200k to a million users.

mezzie