views:

300

answers:

3

Firstly, I am not certain this is the best way to handle this AT ALL ... totally open to alternative solutions.

Secondly, I feel like I'm missing the obvious ... but I'm still missing it so not to proud to ask!

UPDATE: .NET 3.5 environment w/ SQL 2005, so things like dynamic linq possible, although I always tend to think of dynamic (build on fly) queries of any kind as kind of clunky. PITA to maintain.

UPDATE 2: In response to northpole, pseudo code / written word logic / sql / linq / C# all acceptable (!) ... more of a notional "what is a good approach" than I need code kind of question.

Given a table of "shoes" that looks like this:

  ShoeID   PropertyName    PropertyValue
  1        COLOR           RED   
  2        COLOR           RED   
  2        SIZE            11
  3        COLOR           RED   
  3        SIZE            11   
  3        MANUFACTURER    GUCCI

I need a way(s) to query for shoes such that

COLOR=RED returns

  1
  2
  3

COLOR=RED and SIZE=11 returns

  2
  3

COLOR=RED and SIZE=11 and MANUFACTURER=GUCCI returns

  3

At design time, I do not know how many different properties that there might be, nor do I know how many query parameters there might be ...

Hopefully this makes sense ... if not, please comment accordingly and I'll try again.

A: 

Hi brad,

Dynamic sql will help you. Something like:

ALTER PROCEDURE [dbo].[sp_GetFilteredRecords]
(
...
@ConditionList nvarchar(MAX) = null
)
AS
BEGIN
  DECLARE @sql        nvarchar(MAX), 
          @paramlist  nvarchar(MAX)

  SELECT @sql = 'select ... '
  ...
  SELECT @sql = @sql + ' where 1=1 '
  ...
  SELECT @sql = @sql + ' ' + @ConditionList + ' '


  SELECT @paramlist = ...

  EXEC sp_executesql @sql, @paramlist

Sure it may slight differ due to tools.

Boris.

Boris
concatenating together dynamic SQL is usually a bad idea - it's error prone, prone to SQL injection, and cumbersome to use. Since it's very dynamic, it will also usually fail to make use of any indices and thus be slow. I would recommend against using it whenever possible
marc_s
so what you can suggest in this situation? another solution?
Boris
A: 

If I understand you correctly, you want to be able to create a query that could have criteria based on any number of fields, which are unknown upfront. That's tough....

While most RDBMS do offer some kind of "dynamic SQL" capability, those approachs tend to be either cumbersome to use, or slow, or both.

Of course, you could also concatenate together a SQL statement in your client code, be it C#, Java, PHP or whatever - but that again will tend to be cumbersome, prone to SQL injection attacks, and in general rather clumsy.

Also, if you have queries that are different from one request to another, the RDBMS won't be able to cache any of your query plans, and getting indexing right to get a decent query performance will be challenging at best, impossible at worst.

So while I totally understand the requirement (and have to fight it off myself almost daily), it's really something that doesn't work well. I would rather try to identify the most frequently used searches and their criteria and make sure those work fine and are fast. Limit the user's flexibility in order to achieve a decent query performance. A classic trade-off - you can be totally flexible but then you have horrible performance, or you can make your frequent queries run fast - but loose some flexibility.

One area where you could possibly gain something is when searching free text in text fields - RDBMS like SQL Server support the concept of a fulltext search, which gives you some flexibility and good performance. Check into that if you have lots of text fields.

Marc

marc_s
Hi Marc - Hear you loud and clear, but not doing is not an option. Only good news, if there is any, is that sql injection attack is not a concern! private, no UI kinda thing.
brad
Well, Brad - you've been warned! :-)
marc_s
Ha, yes sir .. I have! Working on some alternatives as well. Actually didn't completely understand requirements and it may, in fact, no longer be a "database" thing. May wind up loading relatively small (10-100) number of EAV rows into a dictionary and then wind up manipulating intersections of dictionaries. Yuck.
brad
+1  A: 

So whether or not this is a best approach depends on many things. For example, do you need to support different classes of entities (e.g. shoes vs dresses) that may have different (incompatible) attributes? Or what's the estimated number of entities you'll have (something that will work reasonably well for 10K will not work for 100M)? Or how often will you have to handle such queries and how well do they need to perform?

Two most common schools of thought on this are EAV model, which is more or less what you have and column-based approach where your entity's properties (color, size, etc...) are each mapped to a separate column. Each has its advantages and disadvantages, the biggest of them being flexibility / performance of the former and the necessity to dynamically alter table structure for the latter.

If you do go with your existing model, I would recommend to move your property names into a separate table and change 'shoes' table to have FK to that table. You can then create an index on (property_id, shoe_id) and generate your queries as follows:

SELECT shoe_id FROM shoes S_1 [, shoes S_2, ..., shoes S_X]
 WHERE S_1.property_id = 3 /* FK for 'color' */
  /* the following 3 lines will be repeated for each 'property' you need to query on */
   AND S_X.property_id = 4 /* FK for 'size' */
   AND S_X.shoe_id = S_1.shoe_id
   AND S_X.property_value = 'RED'

which should perform reasonably well provided you have a more or less uniform distribution of attributes and not a huge number of shoes.

ChssPly76