views:

573

answers:

5

Hello stackoverflowers,

I have a straightforward question.

I am doing a web application that uses MySQL, and I am currently designing it. I just have a small question about performance.

I would like to know what is more efficient:

Scenario #1:

Table: Restaurant
    -Name
    -City
    -Province
    -Country
    -Continent

sql =~ select * from restaurant where id = something.

or

Scenario #2:

Table: Restaurant   
    -Name
    -City
Table: City
    -Name
    -Province
Table: Province
    -Name
    -Country
Table: Country
    -Name
    -Continent
Table: Continent
   -Name

sql =~ [insert multiple sql queries that will output the name and the city,
        with the corresponding province, country, and continent]

Logically, I think Scenario #1 is better (less query), but some people swore to me otherwise.

Thank you.

A: 

The second option is a normalised structure, which means your data is less redundant, less chance for making errors, etc. I always vote for normalising data unless you're going to run into performance problems.

Incidentally, SELECT * FROM [Table] isn't good practice anyway. You'll want to put in the column names.

Randolph Potter
If making errors and etc is not an issue, and if this is purely performance wise, would scenario #1 be better? As for Select *, thank you, while I was waiting for responses (quite fast, actually!), I just read that select * is very bad.
Aktee
+2  A: 

True, but the question is what option performs better. In that case, there is no doubt about it: option # 1 will perform better due to the query not having to JOIN with any other tables. Randolph does have a good point, whenever possible you should normalize your database structure.

ErikSchierboom
Thank you for your quick response. I will read articles about normalize database structure as I do not understand the concept very much as of now.
Aktee
+1. Option 1 will definitely be faster. (But that doesn't mean it's good). It would sure help you, Aktee, to know the pros/cons of normalization (I've learned this the painful way :D)
mives
Thank guys, I will use the normalized database. From what I read, I don't think the performance gain is worth the frustration.
Aktee
A: 

If you use the first scenario you get the problem of increased space use (for all the duplicate province, country, continent) and if you need to change the name of a city/country you need to change it in all rows where it's used.

For convenience I would use the second scenario. I don't think there will be big performance differences between the two scenarios (in the first scenario you only touch one table, but read back more data from disk, in the second scenario you read less data from the disk, but from multiple tables). It really depends on what kind of data you have there.

Edit: To explain my point above: if you keep all data in a large table then you need to actually read all the rows from the disk, even if much of the data read is the same (namely the city, province, country, continent). Even if the SQL caches data as it can it won't help here since it can't know that data from other rows is the same.

If you normalize the database and read from the restaurant table you will get ID's for the cities. Now if you have the same ID on multiple rows the SQL server will cache the data read for the city and won't hit the disk again, so it will be an increase in speed. This will be offset by the need to access a new table, but with correct indexing on the city ID that shouldn't be too much.

That's why I'm saying that with large databases the performance difference is not easy to assess and you'll be better off having a properly normalized DB.

And yes, if you use a normalized DB (second scenario) you can change the city name in one place since there will be a single row for a city. The same will work for the others (province, country, continent).

rslite
If I link them by ID, I can just change the name, no? Can you explain or give me a link about "Read back more data from disk" and "read less from disk but from multiple tables". It is quite confusing for me to understand. The kind of data is text. Although I have to admit it is a big database.
Aktee
+2  A: 

If you are not experienced with database design, I'd suggest to always go with the normalized version. It's the right thing to do in most cases. You might want to denormalize your database in some cases, but then you should know exactly why are you doing that.

Note that in the second case it's not multiple queries. It's just one query, where all the tables are joined together. For example:

SELECT *
FROM restaurant
    JOIN city ON city.id=restaurant.city
    JOIN province ON province.id=city.province
    ...

Yes, it takes longer to write, but it's better than having inconsistent data in the database (maintaining a denormalized database is way harder). You can also use an ORM to do this kind of stuff for you.

Lukáš Lalinský
Thank you. I think I will go with scenario #2 then (If I understand the concept of normalized database design).
Aktee
A: 

Thank you guys for your input. "Normalized Database Design" was the key here. I googled it, speed-read it, and although it has a little bit less performance, the pros are really worth it.

Thanks again. (That was really fast btw!) http://en.wikipedia.org/wiki/Database%5Fnormalization

Wikipedia states that denormalized has a better performance, but I think I am just getting cocky and thinking I can handle a big denormalized database.

I'll stick with the less risky scenario. If shits hits the fan, I'll change hardware =).

Thanks again guys.

Aktee