I currently have a database with two tables called Articles and Tags. To allow articles to be in multiple categories, I have a many-to-many relationship. Is such a design a mistake in terms of performance, or should I remove the direct relationship between these two tables and add a third table as a bridge (articlesTags)?
There's no problem with having a many-to-many relationship if that's what the data require, but you'll want a 3rd table to represent it.
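A minimal sketch of what that third table can look like, assuming your key columns are named ArticleID and TagID:

CREATE TABLE ArticlesTags (
  ArticleID INT NOT NULL REFERENCES Articles(ArticleID),
  TagID     INT NOT NULL REFERENCES Tags(TagID),
  PRIMARY KEY (ArticleID, TagID)  -- one row per article/tag pair
);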
From a pure speed standpoint, you may benefit from a bit of denormalization. You keep your Articles table. In addition, you add a table ArticleTags. This table will have ArticleID and the tag text (not IDs). This cuts a join for you. Separately, you may want to keep a table of the allowed tags.
So your ArticleTags table would look like this:
ArticleID  Tag
1          ASP.NET
1          SQL
1          HTML
2          ASP.NET
2          C#
etc...
And when you want the tags, you just SELECT Tag FROM ArticleTags WHERE ArticleID = @articleid.
It's not normalized, but it may be faster. It really depends on the volume we're talking about and the usage expectations.
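A rough sketch of that denormalized table, with assumed names and types:

CREATE TABLE ArticleTags (
  ArticleID INT         NOT NULL,  -- points back to Articles
  Tag       VARCHAR(50) NOT NULL,  -- the tag text itself, no TagID lookup
  PRIMARY KEY (ArticleID, Tag)
);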
There's nothing inherently wrong with having a many-to-many relationship; you'll just need to create a junction table (which is what it sounds like you're referring to with articlesTags) to facilitate that relationship.
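For example, fetching an article's tags becomes a single join through the junction table (table and column names are assumed here):

SELECT t.Name
FROM ArticlesTags atg
JOIN Tags t ON t.TagID = atg.TagID
WHERE atg.ArticleID = @articleid;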
A many-to-many relationship exists only in the relational model; it's an abstraction of the mind. When you implement it, there will be an articles_to_tags table where you'll have:
fk_article(INTEGER) fk_tag(INTEGER)
cf. http://en.wikipedia.org/wiki/Many-to-many_(data_model)
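For example, tagging an article is just one row in that table (the IDs here are made up):

INSERT INTO articles_to_tags (fk_article, fk_tag) VALUES (1, 3);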
There is no problem with using many-to-many relationships; they are often required.
And yes, it is not possible to create a many-to-many relation without using a third table.
You're seeing the difference between a conceptual database design (the N:N relationship) and its physical realisation. No matter how you model your N:N relationship, you'll need the aforementioned junction table to make it work.
As a general statement, there's nothing wrong with modeling a relationship as closely to the real world as you can. Clarity is king.
When it comes to any performance question in any system the answer usually boils down to "it depends".
If your performance problem is related to WRITES, then a highly NORMALISED structure is best and you'll want that junction table. You'll end up writing a lot less data, and that can speed things up substantially (though you may burn that advantage by having to do lookups before you create the inserts). Reading from the individual normalised tables can be very fast too.
If your problem is related to analytical READS, then a DENORMALISED structure is best. Joins can be very performance-intensive if the tables are large and the indices spread out. You'll sacrifice a lot of space to gain a lot of time.
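To make the trade-off concrete, here is roughly what each side looks like under the two designs (table and column names are taken from the earlier answers):

-- Normalised: writes are small integer rows, reads need a join
INSERT INTO ArticlesTags (ArticleID, TagID) VALUES (@articleid, @tagid);
SELECT t.Name FROM ArticlesTags atg JOIN Tags t ON t.TagID = atg.TagID WHERE atg.ArticleID = @articleid;

-- Denormalised: the tag text is repeated on every write, but reads hit a single table
INSERT INTO ArticleTags (ArticleID, Tag) VALUES (@articleid, 'ASP.NET');
SELECT Tag FROM ArticleTags WHERE ArticleID = @articleid;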
In general, you want to look at the specifics of your situation and weigh the pros and cons of each approach before deciding on a solution. Personally, I've always found it better to focus on Clarity in the initial stages and refactor for performance if I discover a problem later.