tags:
views: 90
answers: 6

I am using MySQL (and PHP).

It's easiest if I give an example:
I make a query, say SELECT * FROM a_table WHERE type='visible', then return the count. Later I query again, but WHERE type='hidden', and so on. I may query again to get the full contents. All the queries are unique, but they hit the same table. I also query several other tables with unique queries (plus the execution time of mysql_real_escape_string() sprinkled through the queries).

I'm not sure how many queries I perform altogether, but I am worried that as I keep adding queries, execution times will start to get large.

My question is: does a large number of queries have a noticeable impact on performance? If so, is there a better way to get at that data? Would performing one initial query and storing all the rows in an array (a large one could be thousands of rows), then manipulating the array, be faster (and/or feasible)? At how many queries does this gray area become black and white?
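For what it's worth, the "one query, then manipulate in PHP" approach I'm imagining would look roughly like this (just a sketch using the old mysql_* functions I'm on; assumes an open connection, and the table/column names are from my example above):

```php
<?php
// One query instead of one COUNT(*) query per type.
$result = mysql_query("SELECT `type` FROM `a_table`");

// Tally the rows per type in a PHP array.
$counts = array();
while ($row = mysql_fetch_assoc($result)) {
    $type = $row['type'];
    $counts[$type] = isset($counts[$type]) ? $counts[$type] + 1 : 1;
}

// $counts['visible'], $counts['hidden'], etc. are now available
// without any further round trips to the database.
?>
```

The obvious trade-off is that this pulls every row over the wire just to count them, which is where my "is this even feasible" question comes from.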


Results from 'optimizing my queries'.

These are execution times in seconds:

  • No optimization: .358 s
  • Return COUNT(*): .321 s
  • Reducing my column selection: .266 s

The biggest win, though, was not opening a connection for every query. I was idiotically opening and closing a DB connection on each poll. By changing to one continuous connection, I got execution time down to .085 s!
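In other words, the fix amounted to something like this (a sketch with the mysql_* API; host, credentials, and database name are placeholders):

```php
<?php
// Before: every query helper did its own mysql_connect()/mysql_close().
// After: open one connection up front and reuse it everywhere.
$link = mysql_connect('localhost', 'user', 'pass');
mysql_select_db('my_database', $link);

$visible = mysql_result(mysql_query(
    "SELECT COUNT(*) FROM `a_table` WHERE `type`='visible'", $link), 0);
$hidden  = mysql_result(mysql_query(
    "SELECT COUNT(*) FROM `a_table` WHERE `type`='hidden'", $link), 0);

// ... every later query reuses $link ...

mysql_close($link); // once, at the end of the script
?>
```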

Lesson learned. Thanks for all the input!

+4  A: 

If you did this all in one query, PHP would have to do the rest of the work. So the question is where the bottleneck is. Is the DB server machine slower than the webserver machine? What about the network connection between them? The best way to optimize for your conditions is, I'd guess, to experiment.

joni
Generally speaking, which one is slower? Does it take more time to parse through queries or to seek through arrays?
Jason
+2  A: 

The fewer the queries, the better. Of those, only get the columns you actually need.
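For example (the column names here are made up):

```sql
-- wasteful: pulls every column, including any BLOBs
SELECT * FROM a_table WHERE type = 'visible';

-- better: only the columns the page actually uses
SELECT id, title FROM a_table WHERE type = 'visible';

-- best, if all you need is the count -- let the server do it
SELECT COUNT(*) FROM a_table WHERE type = 'visible';
```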

OMG Ponies
Do you happen to know the difference between the SQL server regurgitating one column as opposed to all of them? Seems like it would be all the same to it: it either has to find the column (in the row) you're asking for, or spit out all the columns.
Jason
@Jason: Each data type has a byte value associated with it -- the more columns, the more bytes over the wire. Worse, binary data (BLOB, CLOB) will *really* inflate that...
OMG Ponies
@OMG - true true, I didn't consider it actually transmitting that data (duh!), just seek-and-find. I only have one BLOB in my tables, but most of my columns are 100 bytes.
Jason
@Jason: Yea, it's less of a concern if the database is on the same box as the rest of the application, which isn't common in medium to large scale applications.
OMG Ponies
@OMG - at large scale, are they communicating over TCP/IP? I don't know much about the IT side of it.
Jason
@Jason: Yes, pretty much everything goes over TCP/IP. Some databases support other protocols or have their own custom ones, so it depends on the setup.
OMG Ponies
@OMG - Now the question is: if I do a query for multiple rows, does it send one row per packet (assuming a row is less than 2k), or does it fit as many rows as it can into a packet? I know this is well beyond the original optimization question; just curious.
Jason
@Jason there could be a difference between selecting one column vs. all columns, depending on whether the queried table(s) have a covering index. A covering index includes additional columns beyond the indexed columns. I'm not a MySQL user, but I googled "MySQL covering index" and got some results. I looked at the first couple and they looked useful, but I'll let you google and pick what looks good, since I'm no MySQL expert. Also, the size of each row might make a difference -- and its potential future size too, since someone might come along and add some columns with large datatypes.
Adam Porad
@Jason: Depends on the lowest Maximum Transmission Unit (MTU) setting on the interfaces between the database and the destination, but there's still overhead (and additional overhead if piping through a VPN).
OMG Ponies
@Adam Porad: An index, covering or otherwise, can help get the data out of the database faster, but it has nothing to do with speed over the wire. Having an index doesn't ensure it will be used -- MySQL generally uses only one index per table per statement, which you can check via the EXPLAIN plan.
OMG Ponies
@OMG Ponies - in @Jason's comment he asked about "regurgitating one column as opposed to all of them". The thread started discussing the network performance of selecting all columns without covering other possible performance considerations, which is why I mentioned the index.
Adam Porad
A: 

Definitely, large queries and multiple queries are a performance hit, since each DB query involves a connect, a query, and a disconnect.

For a performance improvement, make sure you have the right flags in place in my.cnf; there are variables for the query cache that will keep you from hitting the database when you run the same query again and again.

--Cheers

Koteswara sarma
db usually only connects once per script
knittl
@knittl - haha, actually I was called out in an earlier answer because I really was opening and closing a connection per query. I was trying to make it easier for developers later on: I was afraid a developer would open a connection with the same name. My excuse is that I've had no formal teacher. I am a hobbyist and am teaching/taught myself!
Jason
+3  A: 

Don't pre-optimize. Obviously the fewer queries, the less overhead in running them, but this overhead is usually very small. The number of queries is not anywhere near as big a factor as properly indexing the tables and optimizing the actual queries themselves.

Generally, each query has to send some data over the network to the listening port of the database process, which parses, compiles, and runs the query. Usually, query plan caching will ensure that a minimal amount of time is spent doing this. As long as your network isn't bottlenecking anything, the RPC overhead is quite low, on the order of a few milliseconds or less.

Really, it will be dependent on the queries themselves and the situation you're in.

womp
+2  A: 

If you want to count all your distinct types at once, use the following query:

  SELECT `type`, COUNT(*) AS `count`
    FROM `a_table`
GROUP BY `type`
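Reading the result back in PHP might look like this (a sketch using the mysql_* functions the question implies; assumes an open connection, and the example values in the comment are purely illustrative):

```php
<?php
// One round trip returns every type's count at once.
$result = mysql_query(
    "SELECT `type`, COUNT(*) AS `count` FROM `a_table` GROUP BY `type`");

$counts = array();
while ($row = mysql_fetch_assoc($result)) {
    $counts[$row['type']] = (int) $row['count'];
}

// $counts is now e.g. array('visible' => ..., 'hidden' => ...)
?>
```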
knittl
+2  A: 

I agree with the other users. The fewer queries, the better.

On the other hand, consider this. Nowadays, the average CPU speed is more than 1 GHz, which is about 1,000,000,000 clock cycles per second, and in each clock cycle it can get multiple things done. For example, an ALU (Arithmetic Logic Unit) can perform more than one addition per clock cycle.

Taking this into account, having more than one query is not such a big deal. It would take a couple more milliseconds (at most) to process an extra one, two, or three queries. The user will probably not even notice.

A warning, though: if you have something like 100 extra queries, that probably will cause some noticeable delays. So the idea is to do things in moderation.

miki725
100 probably isn't even enough. I recently consolidated 2000 queries into one query (on SQL Server 2008) and the RPC overhead savings was less than 500 ms. The savings in reducing the extraneous logical reads and getting all the data in bulk was very large though.
womp
I'd be much more impressed if you knew how many clock cycles an average query takes! ;) In the grand scheme of things, a millisecond is pretty slow, considering a clock cycle is on the order of nanoseconds (given your aforementioned 1 GHz). In my early days of engineering, when it came to microprocessor programming, hundreds of milliseconds were a significant speed hit that I could not afford unless it was important (with a clock around 8 MHz). Proportionality says the same (to me).
Jason