views:

380

answers:

3

Hi all,

I am a fairly new MySQL developer and am starting on a project that I could do with a bit of initial advice on...

I am creating a database that will primarily be holding a certain number of items (between 1-5k) and around 40 boolean variables associated with each one. Users will then be inputting their choice of these 40 values and it is the job of the system to determine the 'best' matched items. This may be items that match all 40 variables or, if none exist, the ones that match 39 etc.

So, a couple of queries if anyone has the time!

  1. From my experience of MySQL there is no significant speed advantage in splitting up data into separate tables for a database of this size. The overheads for more tables are simply too large to make any viable difference to the overall performance. Therefore, I would be proposing to simply create one large table with 40 columns and up to 5000 rows to store all of the information (table locking is not an issue as all queries will be SELECT). Does this match with others' thinking and experience?
  2. What would be the most efficient way of returning the 'best' match? Is this even possible through database structure and SQL commands alone or am I going to have to simply return the entire array to PHP and run a form of heuristic function on that there to determine the best matches?

Thanks for your time & help!

A: 

I'd be using two tables. One for the items and one for the boolean flags that match an item. Only make an entry in the 'flags' table for matches for an item. Then to get the number of matches for an item, would simply be a count of the records in the 'flags' table that match the itemId from the 'items' table.

PaulBM
if you go this route, you can also count "negative matches" with an outer join and an `IS NULL` in the `WHERE` clauses; however, I still think it's a somewhat roundabout approach compared to the BIT_COUNT one I suggested.
Alex Martelli
+3  A: 

A single table is surely right. You can store up to 64 boolean variables into a single BIGINT column, as a "mask" with one bool per bit, and compute the match extremely fast as BIT_COUNT(~(the_column ^ user_preferences)) which will count how many bits are equal between the column and the mask giving the user's preferences (should PHP give you problems manipulating 64-bit integers, you can use two columnns of 32 bits each, summing the two bit counts will still be very fast).

Alex Martelli
A: 

I don't think that is the best method to store this kind of information. It may look good visually but if all your storing is boolean values then i would create two tables and one link table with entries for each matching true value.

There is no overhead here as mysql prefers to search rows instead of columns. The count() function will come in handy then.

I'm pretty sure if it fails to find any match you will have to revert to PHP to run the search to find a match for 39 and so on. A recursive function would be a good way to do this.

e.g.

Table xOption id, name

table yOption id, name

table xOption_yOption xOption_id, yOption_id

The other good thing about this is you can easily add more X or Y options later to your grid and you could store more details about the Options too.

don't forget to use indexs too.

Derek Organ