views: 107
answers: 4

I have a site where people can add their favorite TV series. One feature lets users check off the episodes they have seen.

Each episode that is checked off creates one record in a DB table (with user_id, show_id and episode_id).
This table is now over 600,000 rows and is growing very fast!

I have indexes set up, but I feel like the performance when querying this table is getting worse and worse.

My thoughts for a new solution:

So instead of:

user_id | show_id | episode_id
      1 |     123 |       7675
      1 |     123 |       7676
      1 |     123 |       7677
      1 |     456 |       5678
      1 |     456 |       5679
      1 |     456 |       5680

I could do this:

user_id | show_id | episode_ids
      1 |     123 | 7675,7676,7677
      1 |     456 | 5678,5679,5680

Then I would have to split the string into an array and use array.include?(some-id).
This should take some load off the database, but Ruby would have to handle much heavier string/array work.
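
For reference, a rough SQL-side sketch of what a "has user 1 seen episode 7676?" check would look like against each layout (the table names are just made up for illustration; in my proposed version the splitting would actually happen in Ruby, not in SQL):

-- Current normalized table: with an index on (user_id, episode_id)
-- this is a single index lookup.
SELECT 1
FROM seen_episodes
WHERE user_id = 1 AND episode_id = 7676;

-- Proposed comma-separated column: no index on episode ids can be used;
-- FIND_IN_SET has to scan the string for every candidate row.
SELECT 1
FROM seen_shows
WHERE user_id = 1 AND FIND_IN_SET('7676', episode_ids);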

Am I on the right track? Or can anybody think of a better solution?

+13  A: 

No, no, no: that is absolutely NOT the way to structure such a database. Comma-separated lists stuffed into varchar columns are a classic anti-pattern and about the least desirable design you could choose.

It sounds to me like your assessment of the performance problem is based on guesswork. So instead:

  • Determine if there really is a problem.
  • Find the cause of it using appropriate instrumentation.
  • Test possible solutions in a non-production environment.

600k rows is NOTHING for a table with three ints. Really: three ints is about a dozen bytes of data per row, so even with row and index overhead the whole table is on the order of tens of megabytes and fits into RAM on even the tiniest of servers. Querying a table served entirely from RAM should be so fast that you don't need to worry about it.

If you get past step 1 (there really is a problem), ask further questions containing your entire relevant schema, exact queries, explain plans and timing data.
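
As a minimal sketch of the kind of timing data to gather (the table name is a guess based on the question; SET profiling / SHOW PROFILES need MySQL 5.0.37 or later):

-- How big is the table, really?
SHOW TABLE STATUS LIKE 'seen_episodes';

-- Wall-clock timing for the suspect query
SET profiling = 1;
SELECT episode_id FROM seen_episodes WHERE user_id = 1 AND show_id = 123;
SHOW PROFILES;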

MarkR
Thank you :) I guess I'll have to investigate further.
Frexuz
+1  A: 

Here's how I'd structure the tables:

USERS
userid INTEGER PRIMARY KEY 
username text/varchar/whatever

SHOWS
showid INTEGER PK
showname   varchar or nvarchar or text  [depending on what database I was using]
etc etc


EPISODES
episodeid INTEGER PK
showid    INTEGER  FK references SHOWS   [index this field]
ordinal   DECIMAL   [indicates which episode -- DECIMAL makes it easier to insert an overlooked episode later]
episodename text/varchar/nvarchar whatever   
etc etc

SEENIT
id  INTEGER AUTOINCREMENT  PK
userid  INTEGER    foreign key ref USERS
episodeid  INTEGER foreign key ref EPISODES

You could place an alternate unique composite index on (userid, episodeid) or use separate indexes, one on userid, one on episodeid. I'd probably go with the latter.
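
As a sketch, the two options in MySQL syntax (assuming the SEENIT table above; the index names are arbitrary):

-- Option 1: one unique composite index (also enforces "an episode is marked seen only once per user")
ALTER TABLE SEENIT ADD UNIQUE KEY uq_seenit_user_episode (userid, episodeid);

-- Option 2: two separate single-column indexes
ALTER TABLE SEENIT
  ADD KEY ix_seenit_userid (userid),
  ADD KEY ix_seenit_episodeid (episodeid);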

Tim
+1  A: 

Whether or not you denormalize your data is a matter of debate. Denormalization can have its merits in specific circumstances, but from a relational point of view it probably shouldn't be your first choice. The preferred first steps are to analyze the problem and to apply solutions that don't change the structure of the data, but instead deal with the database system and its environment. Therefore:

  • Is the source of your problem really the database? Or is it some other layer (network, web server, Rails, etc.)?
  • What is acceptable in terms of query response times? Find concrete numbers that the database should meet under all circumstances.
  • Which queries are getting slower? Maybe you have slow, inefficient queries that can be refactored. Build a query plan and see what the optimizer is doing (see the logging sketch after this list).
  • Are you using your indexes correctly?
  • Tune your MySQL instance. You can achieve a lot with configuration tuning.
  • See whether you can do something on the hardware side (more memory, faster disks, etc.).
  • Create views for the most heavily used queries, if there are any.
  • If all of the above is done, you can still shard. This adds complexity on top of your application, but it will let you scale the system a good deal further without too much effort.
  • Eventually you may conclude that you need a "truly scalable" distributed key/value store (NoSQL). But at 600,000 rows there is a long way to go before you reach that point.
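
As a sketch of the "which queries are getting slower" step above, MySQL's slow query log can be switched on at runtime (variable names as in the MySQL 5.1 manual; the threshold is just an example):

SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 1;                -- log anything slower than 1 second
SET GLOBAL log_queries_not_using_indexes = 1;  -- also log queries that use no index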

That being said - if you find that your proposed solution really is the best way to improve performance, go ahead and denormalize. The point is that you should be aware of all the options and choose between them with concrete, performance-related goals in mind.

bunting
A: 

I would stick with the normalized data. It sounds more like a query optimization problem. Keep in mind that MySQL (assuming that is what you are using) uses only one index per query, so you might get better performance by setting up a composite index. Also make use of the EXPLAIN statement, for example in MySQL Query Browser. More info here: http://dev.mysql.com/doc/refman/5.1/en/explain.html
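
For example (table, column and index names are illustrative only):

-- A composite index matching the common "episodes of show X seen by user Y" lookup
ALTER TABLE seen_episodes ADD KEY ix_user_show (user_id, show_id);

-- Then check that the optimizer actually uses it
EXPLAIN SELECT episode_id FROM seen_episodes WHERE user_id = 1 AND show_id = 123;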

Volker Pacher
As of MySQL 5.0, multiple indexes can be used per query, with the final result found using an index merge. See: http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html
Martin
my bad, totally forgot about that. Does it actually perform as well as a declared composite index?
Volker Pacher