Table1
------------
ID
IdColumn1
Idcolumn2

Table2
------------
ID
IdColumn
IdPair

Both of them contain the same data.

Table1 has both columns populated; Table2 has those columns stored on two rows.

So, if Table1 contains n rows, Table2 will have 2 * n rows.

Which query is faster?

select * from Table1 
where IdColumn1 = x or IdColumn2 = x

or

select * from Table2 where IdColumn = x

I have already chosen the Table2 scheme, and I have over 400,000 rows so far and over 1,000 unique visitors per day. Over 2,000 rows are added to this database every day. My website keeps growing very fast.

Don't ask me why there are so many rows, they play games in online competitions and those rows are matches between players.

A: 

It is difficult to say. I think both should have similar performance, or maybe the second should be better since idColumn is a primary key. Check the query execution plan and make sure you have proper indexes.

Jenea
A: 

The only reason for one table to be faster than the other is the indexes you create on them. There is no performance advantage to the second table unless you fail to create the correct indexes on the first table (or vice versa).

For example, it might seem the second table is faster because you made an index on idcolumn1 on table 1 and idcolumn on table 2. If instead you had made an index on idcolumn1 and another index on idcolumn2 in table 1 then you would see very similar performance.
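To make the indexing point concrete, here is a minimal sketch of the two separate single-column indexes on Table1. The INT column types and the index names are assumptions for illustration; the question does not state them.

```sql
-- Assumed column types; the question only gives the column names.
CREATE TABLE Table1 (
    ID        INT NOT NULL PRIMARY KEY,
    IdColumn1 INT NOT NULL,
    IdColumn2 INT NOT NULL
);

-- Two separate indexes, one per column, so that each side of
-- "IdColumn1 = x OR IdColumn2 = x" can be resolved with its own seek.
CREATE NONCLUSTERED INDEX IX_Table1_IdColumn1 ON Table1 (IdColumn1);
CREATE NONCLUSTERED INDEX IX_Table1_IdColumn2 ON Table1 (IdColumn2);
```

A single composite index on (IdColumn1, IdColumn2) is not a substitute here, because a seek on IdColumn2 alone cannot use its leading column.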

Since table 2 is a duplication of data it is inadvisable to maintain this table. Every update requires changing two rows.

However, I have seen data designs for this type of data that look like this:

match table
-----------
matchid
additional match information

participants table
------------------
participantid
matchid

In this schema you have one row in the match table for each match (and any additional data) and you have a table that looks like your table 2. It relates participants to matches.

Then you just need to do a select on participants and link it to the match data.
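As a rough sketch of that schema and lookup, assuming INT keys, a hypothetical PlayedAt column standing in for the "additional match information", and the table names Matches and Participants:

```sql
-- One row per match; PlayedAt is a placeholder for any extra match data.
CREATE TABLE Matches (
    MatchId  INT NOT NULL PRIMARY KEY,
    PlayedAt DATETIME NULL
);

-- One row per (participant, match) pair.
-- The primary key leads with ParticipantId, so lookups by player are seeks.
CREATE TABLE Participants (
    ParticipantId INT NOT NULL,
    MatchId       INT NOT NULL REFERENCES Matches (MatchId),
    CONSTRAINT PK_Participants PRIMARY KEY (ParticipantId, MatchId)
);

DECLARE @x INT = 42;  -- hypothetical player id

-- All matches a given player took part in.
SELECT m.*
FROM Participants AS p
JOIN Matches AS m ON m.MatchId = p.MatchId
WHERE p.ParticipantId = @x;
```

Adding a third player to a match then means adding one more Participants row rather than a new column.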

I believe this would be best practice for your situation.

Hogan
Those tables are just an example. I only have Table2 in my database, and I think you're right that Table2 is a better storage scheme since it requires only 1 index to be created to gain the desired performance.
pixel3cs
I thought about it for a bit @pixel3cs and came up with a good design... see my edits above.
Hogan
@Hogan: Please clarify: `instead you had made an index on idcolumn1 and idcolumn2 in table 1 then you would see very similar performance` It is unclear if you mean to suggest a multi-column index on Column1 _and_ column2 of table 1, or if you mean to have two indexes, one on each column of table 1. The combined index would certainly NOT help with the OR query indicated by the OP.
mjv
@mjv: of course the combined index would not help, I mean two indexes. I've edited my English to be clearer.
Hogan
+1  A: 

I'd choose Table2.

With the Table1 schema you need two indexes at least, one on IdColumn1 and one on IdColumn2 and you can query it efficiently using:

select * from Table1 where IdColumn1 = x
union all 
select * from Table1 where IdColumn2 = x;

But at least one of the indexes is non-clustered, and you'll have a lot of logic juggling to identify all items related to a player, since they can be in either IdColumn1 or IdColumn2. And just think of the mess a 3-way game will bring in the future (3 players, add IdColumn3...).

Table2 is better, as it has a clear purpose: it stores all the games a player has participated in, clustered by the player Id. It can be queried more simply, it can be structured more simply, and it can be extended to more players per game later on.
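A minimal sketch of Table2 clustered by player, assuming INT columns and hypothetical constraint and index names (the question does not say how the real table is keyed):

```sql
-- ID stays the primary key, but is declared NONCLUSTERED so the
-- clustered index can be placed on the player id instead.
CREATE TABLE Table2 (
    ID       INT NOT NULL,
    IdColumn INT NOT NULL,   -- player id
    IdPair   INT NOT NULL,
    CONSTRAINT PK_Table2 PRIMARY KEY NONCLUSTERED (ID)
);

-- Rows for the same player are stored together, so
-- "WHERE IdColumn = x" becomes a single range seek.
CREATE CLUSTERED INDEX IX_Table2_IdColumn ON Table2 (IdColumn);
```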

Not sure what IdPair is, though. Your data model is a typical many-to-many relation: just replace 'Player' with 'Student' and 'Game' with 'Course' and you'll see that you have exactly the canonical Data Modeling 101 structure of Students-Courses (in your case it so happens that a game (=course) has exactly 2 players (=students), but that's a detail). You're still talking about a typical 3-table relationship (1 for games, 1 for players, 1 for player-to-game participation).

Remus Rusanu
IdPair identifies the opponent's row. If I have a match with someone else, then I go to the IdPair row to see who it was against.
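One way to read that description is as a self-join. This sketch assumes that IdPair holds the ID of the opponent's row in the same table, which is an interpretation of the comment rather than something the thread confirms:

```sql
DECLARE @x INT = 42;  -- hypothetical player id

-- For each of the player's rows, follow IdPair to the opponent's row
-- and read the opponent's player id from it.
SELECT mine.ID         AS MyRowId,
       theirs.IdColumn AS OpponentId
FROM Table2 AS mine
JOIN Table2 AS theirs ON theirs.ID = mine.IdPair
WHERE mine.IdColumn = @x;
```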
pixel3cs
+1  A: 

Table 2 implements the Entity-Attribute-Value (EAV) model, which is often selected because of some advantages it offers over the traditional table model (and the relational model at large). One of the known advantages of EAV is that OR searches based on several column values are both more efficient and easier to code than in the traditional model.

Also, several features offered by newer SQL Server versions help with the EAV model.

That said, on the whole, the EAV model is more attractive for the flexibility it brings to the logical schema, and for other related advantages, than for its performance, in particular when applied to databases with more than a million entities (i.e. possibly several dozen million EAV entries, if each entity has many attributes).
Indeed, as evidence of this, several EAV implementations introduce a mix of both models, whereby the single-valued attributes that are common to most entities are stored in the "header file" rather than in the EAV list.

Of course, the final word on which of the two models is more efficient [in the restrictive context of the OR-ed column value problem] depends on the actual implementation, the indexes, and the statistical profile of the data. For smaller EAV tables (like this one with c. 500,000 entries), the EAV model probably offers an edge in the general case.

See this related SO article: "database: EAV pros, cons and alternatives", and in general scan the few SO articles with the eav tag.

mjv
Frankly, I don't see any reference to an EAV model in the original post.
Remus Rusanu
@Remus, you are right, the only reference is the tag, which I added myself. However, table 2, with its ID, IdColumn, and [oddly named] IdPair, appears to be a thinly disguised EAV implementation (if only an accidental one, if the OP didn't explicitly know about this model).
mjv
You still make a nice post on the whole EAV topic :)
Remus Rusanu
+2  A: 

I'd also go with Table2.

Just to highlight the difference between the approaches, here are the three execution plans generated for the options, assuming Table1 has nonclustered indexes on IdColumn1 and IdColumn2, and Table2 has a nonclustered index on IdColumn. ID is CLUSTERED. There are 100,000 records in Table1 and 200,000 in Table2.

1) Table1 approach with the OR condition on the 2 id columns:
[image: execution plan]

2) Table1 approach with 2 statements combined with a UNION ALL:
[image: execution plan]

3) Table2 approach:
[image: execution plan]

Table2's plan is obviously a lot simpler.
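To compare the three options on your own data, something like the following could be run in SQL Server Management Studio with "Include Actual Execution Plan" enabled. It is only a sketch: it assumes Table1 and Table2 already exist with the indexes described above, and the value 42 is a hypothetical player id.

```sql
-- Report logical reads per statement in the Messages tab.
SET STATISTICS IO ON;

DECLARE @x INT = 42;  -- hypothetical player id

-- 1) Table1 with OR across both id columns
SELECT * FROM Table1 WHERE IdColumn1 = @x OR IdColumn2 = @x;

-- 2) Table1 with two seeks combined by UNION ALL
--    (returns a row twice if both columns equal @x)
SELECT * FROM Table1 WHERE IdColumn1 = @x
UNION ALL
SELECT * FROM Table1 WHERE IdColumn2 = @x;

-- 3) Table2 with a single predicate
SELECT * FROM Table2 WHERE IdColumn = @x;

SET STATISTICS IO OFF;
```

The logical read counts give a simple numeric comparison alongside the shape of each plan.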

AdaTheDev
The first plan of Table1 is actually very interesting, see how the OR is turned into a MergeJoin and a StreamAggregate? Very smart from the optimizer!
Remus Rusanu