Good day,

In SQL Server 2005, I have a table with numerous columns, including a few boolean (bit) columns. For example, table 'Person' has a column ID and columns HasItem1, HasItem2, HasItem3, HasItem4. This table is fairly large, so I would like to create indexes to get faster search results.

I know that it is not a good idea to create an index on a single bit column, so I thought about creating a composite index over all of the bit columns. However, any of these bit columns may or may not appear in a given query. Since the order of the columns in an index is important, and I don't know which ones will be used in a query, how should I handle this?

BTW, there is already a clustered index that I can't remove.
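For reference, here is a minimal sketch of the schema I am describing (column types are assumed):

```sql
CREATE TABLE Person (
    ID INT NOT NULL,        -- covered by the existing clustered index
    HasItem1 BIT NOT NULL,
    HasItem2 BIT NOT NULL,
    HasItem3 BIT NOT NULL,
    HasItem4 BIT NOT NULL
    -- ... plus numerous other columns ...
);
```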

+2  A: 

I would suggest that this is probably not a good idea. Indexing fields with very low cardinality will generally not make queries faster, and you have the overhead of maintaining the index as well.

If you generally search on one of your bit fields together with another field, then a composite index on the two would probably benefit you.

If you were to create a composite index on the bit fields, it would help, but only when the leading columns of the index appear in the query. If the first column of the composite index is not referenced, the index will probably not be used at all.

If, for example, bita was used in 90% of your queries, bitd in 70%, and bitb and bitc each in 20%, then a composite index on (bita, bitd, bitb, bitc) would probably yield some benefit, but for at least 10% of your queries, and possibly as many as 40%, the index would most likely not be used.

The best advice is probably to try it with the same data volumes and data cardinality and see what the Execution plan says.
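As a sketch of that ordering rule (bita..bitd are the hypothetical columns from the example above, and the table name MyTable is a placeholder):

```sql
CREATE INDEX IX_bits ON MyTable (bita, bitd, bitb, bitc);

-- Can seek on the index: the leading column (bita) is referenced.
SELECT * FROM MyTable WHERE bita = 1 AND bitd = 0;

-- Probably cannot use the index effectively: bita is absent,
-- so at best the index is scanned, and more likely it is ignored.
SELECT * FROM MyTable WHERE bitb = 1 AND bitc = 1;
```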

Steve Weet
+2  A: 

I don't know a lot of SQL Server specifics, but in general, indexing a column with non-unique data is not very effective. In some RDBMSs, the optimizer will ignore indexes that are less than a certain percentage unique anyway, so the index might as well not exist.

Using a composite (multi-column) index can help, but only in cases where the filter constraints match the order the index was built in. If your index includes 'field1, field2' and you are searching on 'field2, field1' or some other combination, the index may not be used. You could add an index for each of the particular search cases you want to optimize; that is really all I can think of. And if your data is not very unique even after considering all of the bit fields, the index may be ignored anyway.

For example, with 3 bit fields you are only segmenting your data into 8 distinct groups. If you have a reasonable number of rows in the table, segmenting it into 8 groups isn't going to be very selective.

dnewcome
A: 

I don't know about 2005 but in SQL Server 2000 (From Books Online): "Columns of type bit cannot have indexes on them."

HLGEM
A: 

Steve Weet,

In my situation, there is no way to tell which bits will or will not be used in the queries, and how often.

Cedric Aube
If the distribution of the bit fields' use is truly random, then your index will probably only be considered for one in every 8 queries, so I expect this index is really not worth creating. If performance is an issue, you may need to refactor your table design and store your bit fields as rows.
Steve Weet
A: 

How about using checksum?

Add an int field named mysum to your table and execute this:

UPDATE checksumtest SET mysum = CHECKSUM(hasitem1,hasitem2,hasitem3,hasitem4)

Now you have a value that represents the combination of bits.

Do the same checksum calc in your search query and match on mysum.

This may speed things up.
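If you want the sum maintained automatically, you could instead make it a persisted computed column and index that (a sketch only, assuming the Person table from the question):

```sql
ALTER TABLE Person
    ADD mysum AS CHECKSUM(HasItem1, HasItem2, HasItem3, HasItem4) PERSISTED;

CREATE INDEX IX_Person_mysum ON Person (mysum);

-- Search by computing the same checksum over the wanted bit pattern.
-- CHECKSUM is sensitive to argument types, hence the CASTs to bit.
SELECT ID
FROM Person
WHERE mysum = CHECKSUM(CAST(1 AS bit), CAST(0 AS bit),
                       CAST(1 AS bit), CAST(0 AS bit));
```

Since CHECKSUM values can collide, it is safest to keep the original bit predicates in the WHERE clause as a double check.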

Sam
Performing a checksum on the bit fields has the same limitation that indexing them has: checksumming the 4 bit fields produces only 16 distinct values, so the selectivity will be low.
dnewcome
sure, but you are now only searching one field, right?
Sam
A: 

You should revisit the design of your database. Instead of having a table with fields HasItem1 through HasItemN, you should create a bridge entity, and a master Items table if you don't have one. The bridge entity (table), person_items, would have (at minimum) two fields: person_id and item_id.

Designing the database this way doesn't lock you in to a database that only handles N number of items based on column definitions. You can add as many items as you want to a master Items table, and associate as many of them as you need with as many people as you need.
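A minimal sketch of that design (names other than person_items are assumptions):

```sql
CREATE TABLE Items (
    item_id INT NOT NULL PRIMARY KEY,
    name    VARCHAR(50) NOT NULL
);

CREATE TABLE person_items (
    person_id INT NOT NULL,            -- FK to Person
    item_id   INT NOT NULL,            -- FK to Items
    PRIMARY KEY (person_id, item_id)   -- a row exists only when the person has the item
);
```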

HardCode
A: 

HardCode,

Thank you for your input. Actually, we already have other tables with person_id and item_id for each bit field. However, since these tables are really large (millions of rows), I thought that creating another table with person_id and all the bit fields would be faster than searching the person_id/item_id tables for each field present in the query. Am I right or off track?

Cedric Aube
A: 

Odds are it will be easier for SQL Server to query the large table with person_id, item_id and BitValue than to search a single table with Item1, Item2, ... Item*N*.
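As a sketch (assuming the person_items bridge table suggested earlier in the thread), finding everyone who has both item 1 and item 3 becomes a seek on the (person_id, item_id) key:

```sql
SELECT pi.person_id
FROM person_items pi
WHERE pi.item_id IN (1, 3)
GROUP BY pi.person_id
HAVING COUNT(*) = 2;   -- must match both items; (person_id, item_id) is unique
```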

mrdenny