I have a classifieds website with about 7 MySQL tables where all the data is stored. The main table is called "classifieds".

In the classifieds table, there is a column called classified_id. This is not the PK, or any kind of key. It is just a number that I use to JOIN table records together.

Ex:

 classifieds table:           fordon table:
       id => 33                   id => 12
classified_id => 10             classified_id => 10
  ad_id => 'bmw_m3_92923'           

The two rows above are linked by the classified_id column.

Now to the question: I use this query to fetch all records where the column ad_id matches any of the values in an array, here called $ad_arr:

SELECT mt.*, fordon.*, boende.*, elektronik.*, business.*, hem_inredning.*, hobby.*
    FROM classified mt
    LEFT JOIN fordon ON fordon.classified_id = mt.classified_id
    LEFT JOIN boende ON boende.classified_id = mt.classified_id
    LEFT JOIN elektronik ON elektronik.classified_id = mt.classified_id
    LEFT JOIN business ON business.classified_id = mt.classified_id
    LEFT JOIN hem_inredning ON hem_inredning.classified_id = mt.classified_id
    LEFT JOIN hobby ON hobby.classified_id = mt.classified_id 
    WHERE mt.ad_id IN ('$ad_arr')";

Is this good or would this actually fetch unnecessary information?

See this question I posted a couple of days ago. In the comments, HLGEM says this approach is wrong. What do you think?

http://stackoverflow.com/questions/2782275/another-rookie-question-how-to-implement-count-here

Thanks

A: 

This is a matter of opinion. Are you having performance or scaling issues? If not, then being specific about which columns to return is probably a matter of premature optimization. Duplication of integer join columns isn't going to break the bandwidth bank any time soon.

marr75
Okay... No I don't have performance issues. The website is not uploaded yet... Thanks for the answer
Camran
-1 Not remotely a matter of opinion or taste. By only selecting the columns you need you maximise the chance that a query can be satisfied from an index alone and reduce the overall query processing workload.
Martin Smith
I hope the site's a success, Camran. Once you start having problems serving up requests, then you should look at typing out every column name. A web framework (Cake, Django, Rails, etc.) could make this easier, too. Django, for example, lets you pick just the columns you need with the "defer" and "only" methods, then generates the SQL for you.
marr75
I think we'll have to agree to disagree on that, Martin. Optimized performance becomes a problem to look at once your app hits a bottleneck and is actively being used. Also, the first performance optimization I would make if I found this query to be a bottleneck would be to cache the results or the resulting pages, which would outperform any index or column optimization pretty easily.
marr75
So fire fighting when performance suddenly becomes a problem is better than a fairly easy way of doing it right in the first place?
Martin Smith
That depends on the requirements and constraints. My take on this particular problem was that a solo developer is about to deploy a web app in a domain that already has a lot of alternatives. It's important to understand that he could have written this query to perform better, but there's a big chance it won't get used much anyway, and an even bigger chance that this query will never be a performance bottleneck that can't be better solved by caching. If he has a choice between launching today and launching tomorrow based on optimizing these queries, I'm an advocate for today.
marr75
When you get good at developing and go to the next level with your SQL, you will discover the covering index, where you include data (additional columns) in the index that is not part of the index key. The database can then use these columns and return the results without going back to the table after the index lookup. If you always select * you cannot use a covering index, because you are asking for all the columns. When you are tuning your slow queries (which are fast now because the app is not live and has no data), you will not be able to use this technique.
KM
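A rough sketch of the covering-index idea KM describes, written for MySQL. MySQL has no separate clause for "included" columns, so the extra columns simply go into the index key. The headline and price columns, the index name, and the table definition are assumptions made up for illustration; only the table name, classified_id and ad_id come from the question.

```sql
-- Hypothetical minimal schema so the example runs on its own.
CREATE TABLE classified (
    id            INT AUTO_INCREMENT PRIMARY KEY,
    classified_id INT NOT NULL,
    ad_id         VARCHAR(64) NOT NULL,
    headline      VARCHAR(100),   -- hypothetical column
    price         INT             -- hypothetical column
);

-- The index holds the search column plus the two columns the query returns.
CREATE INDEX idx_classified_ad_cover ON classified (ad_id, headline, price);

-- Every selected column is in the index, so MySQL can answer from the index
-- alone; EXPLAIN typically reports "Using index" in the Extra column.
EXPLAIN
SELECT ad_id, headline, price
FROM classified
WHERE ad_id IN ('bmw_m3_92923');

-- SELECT * also asks for classified_id, which is not in the index,
-- so the table rows have to be read as well.
EXPLAIN
SELECT *
FROM classified
WHERE ad_id IN ('bmw_m3_92923');
```

Whether the optimizer actually picks the index depends on the data and the MySQL version, so the EXPLAIN output is worth checking against a realistic data set.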
I don't think it's fair to make any assumptions about whether I'm "good" at developing or not based on my answer. My answer says, yes, this query can be optimized, but I feel he should profile first with a real load before optimizing anything.
marr75
Profiling with a real load will be made more difficult as he is not returning the columns he needs. The Database Tuning Advisor and missing index DMVs won't have anything to work with.
Martin Smith
@marr75 - I understand your point, but this is generally considered SQL 101, not advanced optimization.
Matt
This question about `select tableA.*, tableB.*, ...` is right up there with questions like `why can't I just make all my columns varchar(5000)`. Some people will call it a matter of opinion; others will call it ridiculous. Try to get either of those past a code review. I'd laugh at anyone trying to put that code in production.
KM
+4  A: 

You are surely returning unnecessary results, to answer your question.

It is a bad habit to get into.

Kenneth
So how should it be written then? SELECT table_name.column_name? What if there are a lot of columns? It would be a loooong query then, right?
Camran
@Camran, you pick: one long SELECT statement, or an even longer result set multiplied by the number of rows.
KM
You can use abbreviated table names for aliases, which would shorten the query a bit. Naturally, the query will be longer by naming the columns, but as programmers, we shan't be lazy.
Marcus Adams
And I don't know about MySQL, but in SQL Server I can drag and drop the columns into my query, so listing them doesn't take much time at all.
HLGEM
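A minimal sketch of what the comments above suggest: short table aliases plus an explicit column list. Only the table names, classified_id and ad_id come from the question; headline, price, make and model_year are hypothetical columns, and the CREATE TABLE statements are there only so the example runs on its own in MySQL.

```sql
-- Hypothetical minimal schema; substitute the real column definitions.
CREATE TABLE classified (
    id            INT AUTO_INCREMENT PRIMARY KEY,
    classified_id INT NOT NULL,
    ad_id         VARCHAR(64) NOT NULL,
    headline      VARCHAR(100),   -- hypothetical column
    price         INT             -- hypothetical column
);

CREATE TABLE fordon (
    id            INT AUTO_INCREMENT PRIMARY KEY,
    classified_id INT NOT NULL,
    make          VARCHAR(64),    -- hypothetical column
    model_year    SMALLINT        -- hypothetical column
);

-- Aliases c and f keep the column list short; only the columns the page
-- displays are returned, and the join column is not sent back twice.
SELECT c.ad_id,
       c.headline,
       c.price,
       f.make,
       f.model_year
FROM classified c
LEFT JOIN fordon f ON f.classified_id = c.classified_id
WHERE c.ad_id IN ('bmw_m3_92923');
```

The other category tables from the question would be joined and listed the same way, each with its own short alias.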
+6  A: 

Strongly disagree with marr75. First, because if you use this poor technique in most of your queries, you are adding unnecessary load to virtually every query. Database queries need to be written as well optimized as possible, as it is exceedingly painful to go back later and rewrite every query in your database because you used a known poor technique. Refactoring in databases is hard, and performance must be considered in design. It is not premature optimization to use techniques that are known to improve performance from the start; it is good design.

Next, you have the maintenance issue. If you are depending on these columns being in a particular order and someone changes the structure of the database, you are out of luck. Also, if someone adds a column that you don't want to show the user (which is common), you are out of luck. This is a very bad technique, and select * should almost never be used in a production system. If someone adds a column, it will be returned in the query, but you need to know what was added and why in order to make the interface do what it needs to do anyway, so you have no maintenance savings by using this poor technique.

HLGEM
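One way to picture the maintenance point in this answer, as a sketch under assumed names: headline, price and poster_email are hypothetical columns, and the table definition exists only so the example runs on its own in MySQL.

```sql
-- Hypothetical minimal schema.
CREATE TABLE classified (
    id            INT AUTO_INCREMENT PRIMARY KEY,
    classified_id INT NOT NULL,
    ad_id         VARCHAR(64) NOT NULL,
    headline      VARCHAR(100),   -- hypothetical column
    price         INT             -- hypothetical column
);

-- Later, someone adds an internal column that users should never see.
ALTER TABLE classified ADD COLUMN poster_email VARCHAR(255);  -- hypothetical column

-- This query now returns poster_email as well, without any code change.
SELECT *
FROM classified
WHERE ad_id = 'bmw_m3_92923';

-- This query returns the same three columns before and after the ALTER TABLE.
SELECT ad_id, headline, price
FROM classified
WHERE ad_id = 'bmw_m3_92923';
```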
+1. Waste a fraction of a second, oh well. Waste a fraction of a second a million times every day, oh *%$@! Don't be lazy, only return the columns you need!
KM
+2  A: 

Ad Hoc Queries

These are queries that you write to run one time, or on rare occasions.

How large a result set must you return before running a SELECT * takes longer than typing out the column names?

How likely are you to forget a column, add it, and have to run it again?

Your time is more expensive than CPU time. If you're running it once, let the database do the work. SELECT * is fine for ad hoc queries if it will save you time.

There are exceptions, such as BLOB fields on large data sets, but you get the point.

Production Queries

These are queries that are stored in your application or database. These queries are run often.

How many times do you have to run a query to make up for the time it would take to name your columns? It adds up fast.

Name your columns in production queries to allow your application to scale better and perform at maximum efficiency. There are other minor advantages, but they're not as exciting.

Summary

  • Ad Hoc Queries: SELECT * generally okay.
  • Production Queries: SELECT * always bad.
  • It's okay to be a little lazy, but be smart about it.
Marcus Adams