I need to store reviews from different sources in a table. Fields:

  • 'productId' char(14)
  • 'user' varchar(128)
  • 'Source' varchar(128)
  • 'content' text

Use cases:

  1. Find all reviews for a product
  2. Insert or update a review

I have trouble with case 2, because I need to find out whether the review already exists (a review with the same productId, user and Source).

Question: Is it a good idea to create a primary key or unique index on productId + user + Source?

+3  A: 

This is a case where natural keys become bad: varchar(128) is just too big for a PK in my book. It forces you to have a very wide PK or index in the Reviews table. I'd do it this way:

Products
ProductID     int autoincrement PK
ProductNumber char(14)
ProductName...
...

Users
UserID        int autoincrement PK
UserName      varchar(128)
...

ProductSources
ProductSourcID  int autoincrement PK
ProductSource   varchar(128)
...

Reviews
ReviewID      int autoincrement PK
ProductID     int FK
UserID        int FK
ProductSourcID  int FK
ReviewContent text
....

If you really only want one review per product+user+source, then you could make a unique index on ProductID+UserID+ProductSourcID.
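As a rough sketch of the layout above, here is the same schema built in an in-memory SQLite database from Python. SQLite is used only so the example is self-contained; in MySQL the key columns would be `INT AUTO_INCREMENT` and the syntax differs slightly. The helper names `get_or_create` and `save_review`, the index name, and the `UNIQUE` constraints on the lookup tables are my own assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (
    ProductID     INTEGER PRIMARY KEY AUTOINCREMENT,
    ProductNumber CHAR(14) NOT NULL UNIQUE      -- UNIQUE is an assumption
);
CREATE TABLE Users (
    UserID   INTEGER PRIMARY KEY AUTOINCREMENT,
    UserName VARCHAR(128) NOT NULL UNIQUE       -- UNIQUE is an assumption
);
CREATE TABLE ProductSources (
    ProductSourcID INTEGER PRIMARY KEY AUTOINCREMENT,
    ProductSource  VARCHAR(128) NOT NULL UNIQUE -- UNIQUE is an assumption
);
CREATE TABLE Reviews (
    ReviewID       INTEGER PRIMARY KEY AUTOINCREMENT,
    ProductID      INTEGER NOT NULL REFERENCES Products(ProductID),
    UserID         INTEGER NOT NULL REFERENCES Users(UserID),
    ProductSourcID INTEGER NOT NULL REFERENCES ProductSources(ProductSourcID),
    ReviewContent  TEXT
);
-- One review per product + user + source.
CREATE UNIQUE INDEX UX_Reviews_Product_User_Source
    ON Reviews (ProductID, UserID, ProductSourcID);
""")

# Fixed whitelist of lookup tables: table -> (id column, value column).
LOOKUPS = {
    "Products": ("ProductID", "ProductNumber"),
    "Users": ("UserID", "UserName"),
    "ProductSources": ("ProductSourcID", "ProductSource"),
}

def get_or_create(table, value):
    """Return the surrogate id for value, inserting a lookup row if needed."""
    id_col, val_col = LOOKUPS[table]
    row = conn.execute(
        f"SELECT {id_col} FROM {table} WHERE {val_col} = ?", (value,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(f"INSERT INTO {table} ({val_col}) VALUES (?)", (value,))
    return cur.lastrowid

def save_review(product_number, user_name, source, content):
    """Insert the review, or update it if the same triple already exists."""
    ids = (
        get_or_create("Products", product_number),
        get_or_create("Users", user_name),
        get_or_create("ProductSources", source),
    )
    cur = conn.execute(
        "UPDATE Reviews SET ReviewContent = ? "
        "WHERE ProductID = ? AND UserID = ? AND ProductSourcID = ?",
        (content, *ids),
    )
    if cur.rowcount == 0:  # nothing to update, so this is a new review
        conn.execute(
            "INSERT INTO Reviews (ProductID, UserID, ProductSourcID, ReviewContent) "
            "VALUES (?, ?, ?, ?)",
            (*ids, content),
        )

save_review("ABC-0000000001", "alice", "example-shop", "Good")
save_review("ABC-0000000001", "alice", "example-shop", "Actually great")  # update
save_review("ABC-0000000001", "bob", "example-shop", "Meh")               # insert
```

The update-then-insert pattern is just one portable way to cover the "insert or update" use case; MySQL also offers `INSERT ... ON DUPLICATE KEY UPDATE`, which relies on the same unique index.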

You could consider making the PK: ProductID+UserID+ProductSourcID. However, if you need to FK to Reviews in another table, then you need to drag around ProductID+UserID+ProductSourcID. I prefer to FK to ReviewID.

In any case, the int+int+int key (ProductID+UserID+ProductSourcID) is way better than the char(14)+varchar(128)+varchar(128) version, both in terms of disk storage and cache memory usage. It is also much easier for the database to use and store the fixed-width int+int+int index values than the char(14)+varchar(128)+varchar(128) ones.

Also, by using the auto increment PKs, users can change their UserName (marriage/divorce) without breaking all the FKs. It also forces all of your ProductSource values to be standardized, rather than free text that is impossible to join to.
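To illustrate the rename point, here is a minimal sketch using Python's built-in `sqlite3` (not MySQL, purely to keep it runnable on its own). The Reviews table is cut down to the columns needed, and the sample names and review text are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE Users (
    UserID   INTEGER PRIMARY KEY AUTOINCREMENT,
    UserName VARCHAR(128) NOT NULL
);
-- Simplified Reviews table: only the columns this example needs.
CREATE TABLE Reviews (
    ReviewID      INTEGER PRIMARY KEY AUTOINCREMENT,
    UserID        INTEGER NOT NULL REFERENCES Users(UserID),
    ReviewContent TEXT
);
""")

user_id = conn.execute(
    "INSERT INTO Users (UserName) VALUES (?)", ("Jane Smith",)
).lastrowid
conn.execute(
    "INSERT INTO Reviews (UserID, ReviewContent) VALUES (?, ?)",
    (user_id, "Great product"),
)

# The rename touches exactly one row in Users; Reviews is not modified,
# because it references the surrogate UserID rather than the name.
conn.execute("UPDATE Users SET UserName = ? WHERE UserID = ?", ("Jane Jones", user_id))

row = conn.execute("""
    SELECT u.UserName, r.ReviewContent
    FROM Reviews r
    JOIN Users u ON u.UserID = r.UserID
""").fetchone()
print(row)
```

With the name itself as part of the key, the same rename would have to be applied to every referencing row as well.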

EDIT based on OP's comment:

This will dramatically complicate insertions and I don't need any additional info for users or source. What about using a hash of these fields as the primary key?

I'm not sure how the IDs complicate insertions. However, if you are unable/unwilling to change the PKs of the other tables, then a hash is the best way to go, but I would not make it the PK. Never make a hash a PK: there can be collisions, which would prevent insertion of legitimate data. Use an auto-generated INT as the PK and add a hash column. Create a new column in Reviews called "ReviewHash" and add an index to it; you could include the productid, user, and source columns in that index as "covered columns" if you expect many collisions (multiple different rows that have the same hash value). Also, write the WHERE like this:

FROM Reviews
 ....
WHERE 
    YourHashFunction(CONCAT(given_productid,'||',given_user,'||',given_source))=Reviews.ReviewHash
    AND Reviews.productid=given_productid 
    AND Reviews.user=given_user 
    AND Reviews.source=given_source

This allows an index to be used on the Reviews.ReviewHash column, and by also checking the productid, user and source, it eliminates any invalid rows if there was a hash collision.
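Here is a small sketch of that lookup, again with Python's built-in `sqlite3` so it runs standalone. `toy_hash` is a deliberately weak, hypothetical stand-in for YourHashFunction (it just sums bytes), chosen so that a collision is easy to produce; the hash is computed in the application rather than in SQL, and the index name and sample values are made up.

```python
import sqlite3

def toy_hash(product_id, user, source):
    """Deliberately weak stand-in for YourHashFunction: sum of bytes mod 65536.

    Reordering characters does not change the sum, so collisions are easy
    to trigger. A real system would use a much stronger hash.
    """
    key = "||".join((product_id, user, source))
    return sum(key.encode("utf-8")) % 65536

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reviews (
    ReviewID   INTEGER PRIMARY KEY AUTOINCREMENT,
    productid  CHAR(14) NOT NULL,
    user       VARCHAR(128) NOT NULL,
    source     VARCHAR(128) NOT NULL,
    content    TEXT,
    ReviewHash INTEGER NOT NULL
);
CREATE INDEX IX_Reviews_ReviewHash ON Reviews (ReviewHash);
""")

def insert_review(product_id, user, source, content):
    conn.execute(
        "INSERT INTO Reviews (productid, user, source, content, ReviewHash) "
        "VALUES (?, ?, ?, ?, ?)",
        (product_id, user, source, content, toy_hash(product_id, user, source)),
    )

def find_review(product_id, user, source):
    # The hash predicate can use the index; the other three predicates
    # discard rows that merely share the same hash value.
    return conn.execute(
        "SELECT content FROM Reviews "
        "WHERE ReviewHash = ? AND productid = ? AND user = ? AND source = ?",
        (toy_hash(product_id, user, source), product_id, user, source),
    ).fetchone()

# "ab" and "ba" contain the same bytes, so toy_hash gives them the same value.
insert_review("P0000000000001", "ab", "shop", "review by ab")
insert_review("P0000000000001", "ba", "shop", "review by ba")

same_hash = conn.execute(
    "SELECT COUNT(*) FROM Reviews WHERE ReviewHash = ?",
    (toy_hash("P0000000000001", "ab", "shop"),),
).fetchone()[0]
found = find_review("P0000000000001", "ab", "shop")
print(same_hash, found)
```

Both rows share one hash value, yet the lookup still returns only the review that matches all three fields.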

If you write your query like this:

WHERE 
    YourHashFunction(CONCAT(given_productid,'||',given_user,'||',given_source))
        =YourHashFunction(CONCAT(Reviews.productid,'||',Reviews.user,'||',Reviews.source))

then an index can't be used, and the query must apply YourHashFunction to every row in the table. Also, if you leave off the checks for productid, user, and source, you will get results where the hashes work out the same but the actual values differ.
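One way to see the difference is to ask the database for its query plan. The sketch below does this in SQLite via Python's `sqlite3` (MySQL has its own `EXPLAIN`, with different output, so treat this only as an analogy). `your_hash` is a hypothetical stand-in for YourHashFunction built on `zlib.crc32`, and the table and index names are assumptions.

```python
import sqlite3
import zlib

def your_hash(text):
    """Hypothetical stand-in for YourHashFunction (CRC32 of the joined key)."""
    return zlib.crc32(text.encode("utf-8"))

conn = sqlite3.connect(":memory:")
# Register the Python function so it can be called from SQL.
conn.create_function("YourHashFunction", 1, your_hash)
conn.executescript("""
CREATE TABLE Reviews (
    ReviewID   INTEGER PRIMARY KEY AUTOINCREMENT,
    productid  CHAR(14) NOT NULL,
    user       VARCHAR(128) NOT NULL,
    source     VARCHAR(128) NOT NULL,
    content    TEXT,
    ReviewHash INTEGER NOT NULL
);
CREATE INDEX IX_Reviews_ReviewHash ON Reviews (ReviewHash);
""")

def plan(where, params):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT content FROM Reviews WHERE " + where, params
    ).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the detail text

given = ("P0000000000001", "alice", "shop")
given_hash = your_hash("||".join(given))

# Hash of the given values compared against the stored, indexed column.
indexed_plan = plan(
    "ReviewHash = ? AND productid = ? AND user = ? AND source = ?",
    (given_hash, *given),
)

# Hash recomputed from the row's own columns: the index cannot help here.
scan_plan = plan(
    "YourHashFunction(productid || '||' || user || '||' || source) = ?",
    (given_hash,),
)

print(indexed_plan)
print(scan_plan)
```

The exact wording of the plans varies between SQLite versions, but the first should mention the index on ReviewHash while the second should report a scan of the whole table.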

KM
+1: My thoughts exactly - foreign key relationships.
OMG Ponies
This will dramatically complicate insertions and I don't need any additional info for users or source. What about using a hash of these fields as the primary key?
Orsol
Does MySQL have index compression? You could make the key as long as you liked, in that case. And I'm not convinced it's worth creating three tables to avoid a fat index, even if it doesn't.
Brian Hooper
@Brian Hooper said `And I'm not convinced it's worth creating three tables to avoid a fat index`. I can't imagine a system that doesn't have a `user` table. If there isn't one, then how can you control the system? What good is the rule about 1 review per product+user+source? I'll just change my name every time I use the system. The same goes for products: if there isn't some table controlling products, then how do you keep people from entering reviews on the same product but with slightly different names? The same goes for sources: when I use this system, can I enter any source I want (beer can label?)
KM
KM, the reasons you have given make it into a good idea.
Brian Hooper
@KM, as I mentioned before, I collect reviews from other sources (systems), so user is just the author name, not a user of my system. "can I enter any source I want" - yes, but there will not be many values, so I can create a separate table for this.
Orsol