tags:
views: 195
answers: 7

I'm working on a website that essentially uses a database with a table of organizations, one row for each organization. Each organization can have an unlimited number of attached keywords. The keywords are represented in a separate table from the organizations, where each row is simply the primary key, the keyword, and the primary key of the organization it is attached to. Eventually this table could have many thousands of entries. Will this make pulling records from this table, as well as listing the unique keywords in the table, too time consuming?
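A minimal sketch of the schema being described, with hypothetical table and column names (organizations, keywords, organization_id), might look like this:

-- Hypothetical schema for the setup described in the question.
CREATE TABLE organizations (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

CREATE TABLE keywords (
    id              INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    keyword         VARCHAR(100) NOT NULL,
    organization_id INT UNSIGNED NOT NULL,
    FOREIGN KEY (organization_id) REFERENCES organizations (id)
);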

+1  A: 

Many thousands of entries is not very many at all. Make sure to index on keywords if you need to retrieve specific ones.
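A minimal sketch of such an index, assuming the hypothetical keywords table and column names from the question:

-- Hypothetical index on the keyword column.
CREATE INDEX idx_keywords_keyword ON keywords (keyword);

-- Both of these queries benefit from that index:
SELECT * FROM keywords WHERE keyword = 'education';
SELECT DISTINCT keyword FROM keywords;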

danben
A: 

If you have to ask about the limits, you're doing it wrong. That said, you're describing a many-to-many relationship, which is perfectly okay.

Just keep the indexes up to date and it shouldn't be a problem. I have a table in my current database that has 94k rows, and it's plenty fast enough.

A few recommendations to make it faster: if you know how long the keyword can be, and it's not too large, consider using CHAR instead of VARCHAR. You'll use more space, but a fixed-length row is noticeably faster to work with than a variable-length one. When executing queries, get only what you need, i.e. use LIMIT/TOP and WHERE clauses, and don't use '*' in the SELECT list unless you really need all the columns.
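A rough sketch of both points, assuming the hypothetical keywords table from the question and a 30-character cap on keywords:

-- Fixed-length column, if the maximum keyword length is known.
ALTER TABLE keywords MODIFY keyword CHAR(30) NOT NULL;

-- Fetch only the columns and rows that are actually needed.
SELECT keyword
FROM keywords
WHERE organization_id = 42
LIMIT 50;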

Malfist
A: 

Try: benchmark

Andreas Bonini
A: 

If your SELECT query takes more than a minute, that's way too many records. That's really the only figure I use to gauge a good versus a bad query. Look into the mystical WHERE clause.

Woot4Moo
...........wtf?
Jason
I swear in my head that made sense. I was basing it on the question in the title. So I figured a time metric was an acceptable measure of how many records is too many.
Woot4Moo
+2  A: 

Having a couple of hundred thousand rows is perfectly fine, as long as:

  • they are indexed properly,
  • and your queries are written properly (using the right indexes, for instance).

I'm working on an application that does lots of queries on several tables with a couple of hundred thousand records in each, with joins and non-trivial WHERE clauses, and that application is working fine -- well, since we've optimized the queries and indexes ^^
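One way to check that a query really uses the right index is MySQL's EXPLAIN; a small sketch, assuming the hypothetical organizations/keywords tables from the question:

-- The 'key' column of the output shows which index, if any, is used.
EXPLAIN
SELECT o.name, k.keyword
FROM organizations AS o
JOIN keywords AS k ON k.organization_id = o.id
WHERE k.keyword = 'education';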


A couple of million rows, in those conditions, is OK too, I'd say -- it depends on what kind of queries (and how many of them) you'll do ^^


In every case, there's only one way to know for sure:

  • You have to know what kind of queries you'll be doing,
  • You also have to have a large dataset to test with (a sketch for generating one follows this list),
  • And you have to benchmark: run the queries on your dataset, many times, with concurrency, as if in "real conditions" -- that will help answer the questions "will it handle the load? do I have to optimize? what are the bottlenecks?"
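A quick, hedged way to build such a test dataset, assuming the hypothetical keywords table from the question, is to copy the table's own rows back into it; each pass doubles the row count:

-- Run several times to inflate the table for load testing.
INSERT INTO keywords (keyword, organization_id)
SELECT keyword, organization_id
FROM keywords;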
Pascal MARTIN
I maintain a 90GB reporting database (for web server logs) which has several tables with hundreds of millions of rows; the biggest is 318m. I can get results from a standard SELECT query with a join here and there (under moderate load) in 10-50ms.
Seth
A: 

The best way to see whether it is more resource consuming than you would like is to run a benchmark. For this, you can use MySQL's BENCHMARK() function on your expression; note that it only accepts a scalar expression, so a subquery has to return a single value, e.g.:

SELECT BENCHMARK(1000000, (SELECT keyword FROM keywords WHERE id = 1));
kio
A: 

As already stated, MySQL itself should not be the problem -- I've heard of tables with literally millions of rows causing no problems. But be aware that PHP loops around MySQL requests can be tricky: they can slow things down and even end your script with a 'timeout' error!
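To avoid such loops, a single join can usually fetch everything in one round trip; a sketch, assuming the hypothetical tables from the question:

-- One query instead of one query per organization inside a PHP loop.
SELECT o.id, o.name, k.keyword
FROM organizations AS o
LEFT JOIN keywords AS k ON k.organization_id = o.id
ORDER BY o.name, k.keyword;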

tillinberlin