I have a denormalized table, product, with about 6 million rows (~2GB), used mainly for lookups. Fields include price, color, unitprice, weight, ...

I have BTREE indexes on color etc. Query conditions are dynamically generated from the Web, such as

select count(*) from product where color=1 and price > 5 and price <100 and weight > 30 ... etc

and

select * from product where color=2 and price > 35 and unitprice <110 order by weight limit 25;
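
For illustration, conditions like these could be assembled from web form input roughly as follows. This is only a minimal sketch: the function name `build_product_query`, the `(column, operator, value)` filter format, and the whitelists are assumptions made for the example, not the actual application code. Values are emitted as `%s` placeholders, the style used by common Python MySQL drivers, instead of being concatenated into the SQL string.

```python
# Hypothetical sketch: build a parameterized query from user-selected filters.
# Column names come from the question; everything else is illustrative.

ALLOWED_COLUMNS = {"color", "price", "unitprice", "weight"}
ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">="}


def build_product_query(filters, order_by=None, limit=None, count_only=False):
    """Return (sql, params) for a list of (column, operator, value) filters."""
    conditions = []
    params = []
    for column, operator, value in filters:
        # Whitelist identifiers and operators; only values become placeholders.
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
        conditions.append(f"{column} {operator} %s")
        params.append(value)

    select_list = "count(*)" if count_only else "*"
    sql = f"select {select_list} from product"
    if conditions:
        sql += " where " + " and ".join(conditions)
    if order_by is not None:
        if order_by not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {order_by}")
        sql += f" order by {order_by}"
    if limit is not None:
        sql += f" limit {int(limit)}"
    return sql, params


if __name__ == "__main__":
    sql, params = build_product_query(
        [("color", "=", 2), ("price", ">", 35), ("unitprice", "<", 110)],
        order_by="weight",
        limit=25,
    )
    print(sql)
    print(params)
```

The returned pair would typically be passed to a driver's `cursor.execute(sql, params)`; the same builder can be pointed at either the InnoDB or the NDB copy of the table when comparing timings.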

I used to use InnoDB, tried MEMORY tables, and then switched to NDB, hoping that more concurrent queries could be served faster. I have 2 tables with the same schema, indexes, and data. One is InnoDB while the other is NDB. But the results are very disappointing: for the queries mentioned above, InnoDB is about 50 times faster than NDB, roughly 0.8 seconds vs. 40 seconds. For this test I was running only a single select query repeatedly. Both the InnoDB and the NDB queries are using the same index on color.

I am using mysql-5.1.47 ndb-7.1.5 on a dual Xeon 5506 (8 cores total) with 32GB of memory, running CentOS 5. I set up 2 NDB data nodes, one MGM node and one MySQL node on the same box. For each node I allocated about 9GB of memory, and I also tried MaxNoOfExecutionThreads=8, LockPagesInMainMemory, LockExecuteThreadToCPU and many other config parameters, but no luck. While NDB is running the query, my peak CPU load was only about 200%, i.e., only 2 out of 8 cores were busy. Most of the time it was about 100%. I was using ndbmtd, and verified in the data node log that the LQH threads were indeed spawned. I also tried EXPLAIN and profiling; they only showed that "Sending data" was consuming most of the time. I also went through some MySQL Cluster tuning documents available online, but they were not very helpful in my case.

Can anybody shed some light on this? Is there a better way to tune an NDB database? I'd appreciate it!

A: 

You need to pick the right storage engine for your application.

MyISAM -- read frequently / write infrequently. Ideal for data lookups in big tables. Does reasonably well with complex indexes and is quite good for batch reloads.

MEMORY -- good for fast access to relatively small and simple tables.

InnoDB -- good for transaction processing. Also good for a mixed read / write workload.

NDB -- relatively less mature. Good for fault tolerance.

The MySQL server is not inherently multiprocessor software, so adding cores isn't necessarily going to jack up performance. A good host for MySQL is a decent two-core system with plenty of RAM and the fastest disk IO channels and disks you can afford. Do NOT put your MySQL data files on a networked or shared file system, unless you don't care about query performance.

If you're running on Linux, issue these two commands (on the machine running the MySQL server) to see whether you're burning all your CPU or all your disk IO:

sar -u 1 10
sar -d 1 10
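
If sar is not installed, a rough version of the same CPU-versus-IO-wait check can be sketched from the aggregate `cpu` line in `/proc/stat`. This is a minimal sketch that assumes Linux; the function names are made up for the example, and it is not a replacement for sar's per-device disk statistics.

```python
# Sketch: estimate CPU busy and iowait percentages from two /proc/stat samples.
# Assumes Linux; function names are illustrative.
import os
import time

PROC_STAT = "/proc/stat"


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line as a list of ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Return (busy_pct, iowait_pct) between two counter samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Field order: user nice system idle iowait irq softirq steal ...
    # Only the first eight fields are summed; guest counters are left out.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0, 0.0
    idle = deltas[3]
    iowait = deltas[4] if len(deltas) > 4 else 0
    busy = total - idle - iowait
    return 100.0 * busy / total, 100.0 * iowait / total


def sample(path=PROC_STAT):
    """Read the current aggregate CPU counters."""
    with open(path) as f:
        return parse_cpu_line(f.read())


if __name__ == "__main__":
    if os.path.exists(PROC_STAT):
        first = sample()
        time.sleep(1)  # run the slow query during this window
        second = sample()
        busy, iowait = cpu_percentages(first, second)
        print(f"busy {busy:.1f}%  iowait {iowait:.1f}%")
    else:
        print("no /proc/stat here; this sketch assumes Linux")
```

A high busy percentage suggests the query is CPU-bound, a high iowait percentage suggests it is waiting on disk, and low values for both point somewhere else.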

Your application sounds like a candidate for MyISAM. It sounds like you have plenty of hardware. In that case you can build a master server and an automatically replicated slave server. But you may be fine with just one server, which will be easier to maintain.

Ollie Jones
Thanks for the information. I used sar as well as vmstat, top, iostat etc. to monitor the load. With NDB, CPU usage is below 20% most of the time, and there's not much iowait during a 40-second single select. With InnoDB, by contrast, I was able to send numerous requests and get a consistent 90% - 95% CPU load for an extended period of time. Maybe I should roll back to InnoDB for now ...
QWJ QWJ
Hmmm. Maybe your cluster is network saturated.
Ollie Jones
All the data nodes, the MGM node and the SQL node are on the same box. How can I verify the network load? Thanks!
QWJ QWJ
And I suppose all the data crunching for these queries should be limited to the data nodes, so there shouldn't be much network transmission here.
QWJ QWJ
!!! If there's a point to clustering, it's using multiple machines to run the database, to increase performance. If you run a lot of cluster nodes on one box, they have to communicate and synchronize with every data update. Try looking at /sbin/ifconfig lo (the loopback interface is named lo on Linux and lo0 on BSD-style systems) to get counts of localhost IP traffic. Better yet, use a simpler database server setup, like MyISAM or InnoDB.
Ollie Jones
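
One way to turn those loopback counts into a before/after comparison is to sample `/proc/net/dev` around a slow query. This is a minimal sketch that assumes Linux, where the loopback interface is named `lo`; the function names are made up for the example. It only counts traffic that actually crosses the loopback interface and says nothing about other ways the nodes may exchange data.

```python
# Sketch: measure loopback traffic by sampling /proc/net/dev twice (Linux only).
# Function names are illustrative.
import os
import time

PROC_NET_DEV = "/proc/net/dev"


def interface_counters(dev_text, interface="lo"):
    """Return receive/transmit byte and packet counters for one interface."""
    for line in dev_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == interface:
            fields = rest.split()
            # Columns 0-7 are receive counters, columns 8-15 are transmit counters.
            return {
                "rx_bytes": int(fields[0]),
                "rx_packets": int(fields[1]),
                "tx_bytes": int(fields[8]),
                "tx_packets": int(fields[9]),
            }
    raise ValueError(f"interface not found: {interface}")


def counter_deltas(before, after):
    """Return the per-counter difference between two samples."""
    return {key: after[key] - before[key] for key in before}


def read_counters(interface="lo"):
    """Read the current counters for one interface."""
    with open(PROC_NET_DEV) as f:
        return interface_counters(f.read(), interface)


if __name__ == "__main__":
    if os.path.exists(PROC_NET_DEV):
        try:
            first = read_counters()
            time.sleep(1)  # run the slow query during this window; lengthen as needed
            second = read_counters()
            print(counter_deltas(first, second))
        except ValueError as exc:
            print(exc)
    else:
        print("no /proc/net/dev here; this sketch assumes Linux")
```

Comparing the deltas for the NDB query against an idle window of the same length gives a rough idea of how much data moves between the nodes per query.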
Yes, this is just an evaluation of NDB before actually deploying a multi-server production setup on it. I thought my box had enough resources to run 2 data nodes, and that at the very least NDB's performance should be as good as InnoDB's. Also, isn't lo0 supposed to be faster than ethernet? If lo0 is a bottleneck here, why would a 1Gb switch/ethernet perform better in a multi-server environment?
QWJ QWJ