ansaurus

Question

COUNT and GROUP BY on text fields seems slow

Answer 1

+3 A:

If your fields are only ever going to have 2 values, you're much better off making them booleans. You should also make everything NOT NULL unless there's a real reason you'll need it to be NULL.

Also take a look at the ENUM type for a better way to use a finite number of human-readable values for a column.
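As a rough sketch of what those three suggestions could look like together in MySQL: only `species` and `region` come from the question, while the other columns and the ENUM value lists are made-up placeholders.

```sql
-- Hypothetical schema: the id/is_wild columns and the ENUM values are
-- placeholders for illustration, not taken from the original table.
CREATE TABLE mytablename (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    species  ENUM('cat', 'dog', 'horse') NOT NULL,           -- finite, human-readable set
    region   ENUM('north', 'south', 'east', 'west') NOT NULL,
    is_wild  BOOLEAN NOT NULL DEFAULT FALSE,                  -- a genuinely two-valued field
    PRIMARY KEY (id)
);
```

In MySQL, BOOLEAN is a synonym for TINYINT(1), and an ENUM is stored internally as a small integer, so both are far more compact than free-form text.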

As for slowness, the first thing to try is to create indices on your columns. For the particular query you're showing here, an index on species, region should make a huge difference:

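create index idx_species_region on mytablename (species, region);

should do it. (MySQL requires a name for the index; `idx_species_region` is just a placeholder.)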
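To check whether the index actually gets used, you can prefix the query with `EXPLAIN`. The query below is only an assumed shape for a COUNT/GROUP BY over these two columns, since the original query is not reproduced here.

```sql
-- Assumed query shape; adjust the column list to match the real query.
EXPLAIN
SELECT species, region, COUNT(*) AS n
FROM mytablename
GROUP BY species, region;
```

In the output, the `key` column names the index MySQL chose (NULL if none), and an `Extra` value of `Using temporary; Using filesort` usually means the grouping is being done without help from an index.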

Vineet 2010-07-22 02:56:51

Are you sure the index will make a huge difference with such low-cardinality data?

Daniel Vassallo 2010-07-22 02:59:22

No, I'm not sure of it, but I think it's a good guess. I started writing something about using `EXPLAIN`, but it started to turn into a can of worms, and I guessed the end result would probably be that we should try creating an index anyway.

Vineet 2010-07-22 03:11:56

I tried the index, but it made no difference. I also tried VARCHAR, as OMG Ponies suggested, which was much faster. After that I tried ENUMs, with no noticeable speedup over VARCHAR.

Rich 2010-07-22 04:42:49

Also, I just checked out EXPLAIN; very cool!

Rich 2010-07-22 04:47:09

+1 for "make it `NOT NULL`", -1 for "make it boolean". Something I read in a SQL newsgroup years ago: "I usually use a 1 character flag. In my experience, when I initially think I have a binary status, it is really a multi-varied flag. In other words, as soon as I code to tell if the door is open or closed, someone else wants to know if it's locked."

onedaywhen 2010-07-22 08:05:59

Answer 2

+5 A:

Why are all your string-based columns defined as TEXT? If you read this performance comparison, you'll see that TEXT was ~3x slower than a VARCHAR column with identical indexing: http://forums.mysql.com/read.php?24,105964,105964
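A minimal sketch of the conversion, assuming the table is `mytablename`, the columns are `species` and `region`, and neither currently contains NULLs. The length of 100 is a guess; pick the smallest length that fits your data.

```sql
-- Assumption: no existing value is longer than 100 characters and
-- no row has NULL in either column.
ALTER TABLE mytablename
    MODIFY species VARCHAR(100) NOT NULL,
    MODIFY region  VARCHAR(100) NOT NULL;
```

A side benefit: MySQL can only index a TEXT column with an explicit prefix length, whereas a VARCHAR column can be indexed in full as long as it fits within the storage engine's key length limit.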

OMG Ponies 2010-07-22 03:12:59

Good catch. Didn't notice they were `text`.

Daniel Vassallo 2010-07-22 03:15:50

I did TEXT because a colleague of mine said there wouldn't be any difference between that and VARCHAR. :) Using a VARCHAR took my runtime from 33 seconds to 2.5.

Rich 2010-07-22 04:43:45

@Rich: Wow - wasn't expecting such a dramatic difference. You might get it lower still if you changed the species and region columns to be foreign keys to tables holding their respective values. An INT is always 4 bytes, while a VARCHAR(4) can take up to 5 (one length byte plus the data, in a single-byte character set), so you can imagine how many bytes a VARCHAR(100) can take.

OMG Ponies 2010-07-22 15:34:24
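The foreign key idea could be sketched roughly as follows. Every table and column name here (`species_lookup`, `region_lookup`, `observations`, `species_id`, `region_id`) is hypothetical, and InnoDB is assumed so that the foreign keys are enforced.

```sql
-- Hypothetical lookup tables holding each distinct value once.
CREATE TABLE species_lookup (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_species_name (name)
) ENGINE=InnoDB;

CREATE TABLE region_lookup (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_region_name (name)
) ENGINE=InnoDB;

-- The main table stores 4-byte integer keys instead of strings.
CREATE TABLE observations (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    species_id INT UNSIGNED NOT NULL,
    region_id  INT UNSIGNED NOT NULL,
    PRIMARY KEY (id),
    KEY idx_species_region (species_id, region_id),
    FOREIGN KEY (species_id) REFERENCES species_lookup (id),
    FOREIGN KEY (region_id)  REFERENCES region_lookup (id)
) ENGINE=InnoDB;

-- Group on the integer keys first, then join to fetch the readable names.
SELECT s.name AS species, r.name AS region, c.n
FROM (
    SELECT species_id, region_id, COUNT(*) AS n
    FROM observations
    GROUP BY species_id, region_id
) AS c
JOIN species_lookup AS s ON s.id = c.species_id
JOIN region_lookup  AS r ON r.id = c.region_id;
```

Grouping inside the subquery keeps the expensive part of the work on the compact integer columns; the joins then only touch one row per (species, region) combination.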
