tags:

views:

188

answers:

7

I have a classifieds website, with alot of categories.

When the user posts an 'ad' on the website, there are some basic info (name, city, price, headline, text etc...) that has to be filled in. And also, depending on what 'category' the user choses to insert the ad into, some more fields will have to be filled in, for example: if the category is 'cars' then 'mileage' 'year' comes up as well.

Now my search engine will query the table for whatever the user choses to search for...

My question is, how would you lay this database up to be most efficient and fast?

I have read about normalizing but dont really understand much of it... So whatever you explain, please explain it as basic as possible.

Thanks

+1  A: 

I'd start here to learn about normalization:

http://en.wikipedia.org/wiki/Database_normalization

or here

http://databases.about.com/od/specificproducts/a/normalization.htm

or for a very nice article for someone totally unfamiliar with the concepts, here:

http://www.phlonx.com/resources/nf3/

but to answer your question specifically, we would need to know what your current plan is for a design layout, and from there we could show you how to normalize it, and ask better questions to get to better answers.

David Stratton
A: 

A good book for starting learning this is Database Design for Mere Mortals by Michael James Hernandez.

TofuBeer
+1  A: 

Database normalization doesn't generally improve speed, it helps removing redundancy and improving consistency.

If speed is the goal, you'll actually need to de-normalize things. Complex joins are usually the performance bottlenecks in large databases, and reducing the number of joins by denormalizing the tables will improve performance.

jcelgin
A: 

There are many ways to go about it, here are a few:

  1. A table that has field name, value, type, min, max, etc... So each row is like miles, year, make, model, rooms, stories. Then you map categories to fields.

  2. A single table that has common info and then a column for meta data stored as xml, json, or some other serialized format. Using this technique, you'd probably need to use something like Lucene to index your meta data for searching.

Michael Valenty
A: 

Below are my points 1) Use indexes properly to spped up the select queries. 2) Aggregate navigators/query redirectors: This is the technology that automatically directs a query to aggregated data if such data are available and appropriate for the query 3) Partitioning: partitioning comes in many ways shapes and forms. At the very least it is dividing one table into several tables usually based on the time the table data represent 4) Parallelizing query execution – Sachin Chourasiya

Sachin Chourasiya
A: 

i don't think normalization is the problem here, you'll probably end up with a user- and a category-table and the table(s) to hold your ads (yes i know there will be probably some more tables for the app, but let us not complicate things).

The ads table is the interesting part: you have two and a half choice.

  • 1) one table to hold them all: one table that has all fields needed for all categories
    • pro: easy to setup
    • pro: easy to maintain (only one table to backup, to alter etc)
    • pro: very easy SQL which simplifies front end development
    • con: doesn't scale very well
    • con: wastes some space, which will slow down the database in the long term (depends on database and table structure though)
  • 2) one table per category
    • pro: scales better than (1)
    • con: very complicated SQL
    • con: nightmare to maintain: instead of one table you have to alter possibly 100s

so you see, option (2) is not really a option, even if it scales better. If you plan for a big system roughly the same effect can be achieved with clustering the table by category.

I owe you a half option: if you're not bound to mysql, postgresql could offer a interesting alternative: table inheritance. in pg you can define a table ads that holds your base structure and a derived table cars that contains all the fields of ads plus some (car specific) additional fields. And a table for electronics, and one for photographic equipment, and so on, you name it. You could even go further and create tables vans and convertibles that dont inherit from ads but from cars, creating a category tree that represents the object hierarchy of your frontend. So whats the difference to (2) you might ask? Maintenance is simplified, a structure change in the ads table propagates down to all the derived tables (while a change in the cars table would only modify cars, convertibles and vans as one would expect). Same goes for the sql: if you select * from ads where title='foo', the query will return records from ads and all the derived tables, the whole tree, if you select from cars only that subtree is searched .. you get the idea. There's more to it, your search engine could pull the structure/metadata of your categ tables and create the search interfaces from it, so your search interface is always in sync with the data structure, etc.

Haven't thought this to the end and i'm still not sure if i would build a system that way, but it has something. The system must be very well designed and will have lots of explicit boundaries but that can be a good thing(tm).


One final word about mysql and postgres. pg is a database that IMHO in most aspects is today superior to mysql, is just not that famous. And no, i'm not just a postgres fanboy, i'm a senior mysql user, i started using mysql with a version 2.something, i introduced it more than 10 years ago in the company i still work for (and made it the default database), and today i don't allow any new development i can decide to be based on mysql. The reason is simple: the default "storage engine" in mysql is myisam, which is fast and lean and offers lots of features .. and you will loose data in the long run if you use it. IMHO you can only use it for volatile data, and there are better alternatives for running a cache. if you have to rely on your data, myisam is a NOGO. I tested Innodb, the default for a transactional "storage engine", several times over the years, and i never found the performance satisfying, so i went for alternatives.

postgres is not perfect either, but it vastly improved in the last years in terms of usability, administration and speed (feature wise it was leading most of the time anyway, and still is).

ok, enough advocacy for today, i'll go to bed now :-)

pfote
A: 

I would put the basic info of every ad in the same table, and have a separate table for each category with the extended information. I think this gives the cleanest design. You will be able to use fitting database data types for each field, it will be easy to sort and filter, etc. This would give you, for example, the following tables:

ads

  • id
  • name
  • city
  • price
  • headline

cars

  • id
  • ad_id
  • mileage
  • year

The only downside is that your query will depend on the category.

Another option that is bound to come up is to save the fields and their values as key-value pairs in a 'properties' table. I strongly recommend not going that 'build-database-within-database' route. Sooner or later you will run into problems. The main pain point is that you will have to give all your values the same data type, which will often turn out to be VARCHAR. This means filtering and sorting of non-text values (i.e. numeric, date/time, etc) will become extremely cumbersome.

John