I'm having trouble wrapping my head around this one. I have a table with the following structure; it contains about 5 million rows.

Id bigint primary identity, auto increment
SKU int
Keyword nvarchar(200)
KeywordType nvarchar(1)

The table is broken down into all possible keywords, in multiple languages for a given SKU. Thus for example, a Lord of the Rings product may have 100 records due to the different acceptable keywords but all the same SKU. Ignore KeywordType for now.

Issue #1: How can I write a SQL query to return records based on an input such as "Lord Rings" ?

Issue #2: The KeywordType field is a weird one. It's used to filter records by format, e.g. CD, DVD, etc. A given result set of SKUs is to be filtered further using the rows whose KeywordType is "X". For example, a user searches for "Lord Rings" with a DVD filter: I need the results from issue #1, restricted to those SKUs that also have a Keyword of "DVD" AND a KeywordType of "X".

Finally, I'm looking for an ANDed solution. Thanks. Hope someone can help...

Here is some sample data for a particular SKU for Lord of the Rings The Two Towers

650446 12288 DVD F 
650452 12288 LORD T 
650453 12288 LTD X 
650454 12288 MOVIE A 
650455 12288 OF T 
650457 12288 RINGS T 
650460 12288 THE T 
650461 12288 TOURS X 
650462 12288 TOWERS T 
650463 12288 TWO T

If the user inputs "Lord Rings" then I would expect to get the above SKU returned in the search results.

A: 

If KeywordType has a limited set of values, as in you know all the possible keyword types, it would be better as an ENUM than an NVARCHAR (where the database offers an ENUM type; SQL Server does not, so a CHECK constraint would play that role there). If you don't know them all, I still recommend making it an NCHAR(1) instead. The VARCHAR types take extra space to store the length, and thus a VARCHAR(1) is actually bigger than a CHAR(1).

As for the search, try something like:

SELECT id, SKU FROM SKUKeywords
WHERE Keyword IN ( 'lord', 'rings' )

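Make sure all the keywords are stored in the same case in the database, and convert any input to that case too. For the second issue, KeywordType holds a single character, so the format (DVD) has to be matched in Keyword; restrict the SKUs to those that also have such a row:

SELECT id, SKU FROM SKUKeywords
WHERE Keyword IN ( 'lord', 'rings' )
  AND SKU IN ( SELECT SKU FROM SKUKeywords
               WHERE Keyword = 'DVD' AND KeywordType = 'X' )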
Oz
A: 

Make sure you have an index on Keyword and KeywordType!

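Option 1: Dynamically build the query with a loop. You didn't specify a language, so this is pseudocode. The terms are ORed because each row holds a single keyword; the GROUP BY does the combining.

foreach $var ( 'search','terms','here' ) {
  # bind $var as a parameter rather than pasting it into the SQL
  $query .= "Keyword = ? OR ";
}
chop last 4 characters.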
SELECT SKU,COUNT(Id) AS score FROM blah 
WHERE ( $query ) AND KeywordType = ? 
GROUP BY SKU ORDER BY score DESC

Option 2: Use IN. (I've usually found this slower)

SELECT SKU,COUNT(Id) AS score FROM blah 
WHERE Keyword IN ('search','terms','here' ) AND KeywordType = ? 
GROUP BY SKU ORDER BY score DESC

By 'AND'ing I'm assuming you mean grouping your matches by SKU.

The GROUP BY will give you that. This will give you the matching records, with the ones that matched the most keywords first.
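
If "ANDed" means that a SKU must match every search term, rather than merely rank higher, one common approach is to compare the number of distinct matched keywords with the number of terms. A minimal sketch, using Python's sqlite3 as a stand-in for the real database; the table name SKUKeywords, the helper skus_matching_all, and the second SKU (99999) are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SKUKeywords (Id INTEGER PRIMARY KEY, SKU INTEGER, "
    "Keyword TEXT, KeywordType TEXT)"
)
conn.executemany(
    "INSERT INTO SKUKeywords (Id, SKU, Keyword, KeywordType) VALUES (?, ?, ?, ?)",
    [
        (650452, 12288, "LORD", "T"),
        (650457, 12288, "RINGS", "T"),
        (650462, 12288, "TOWERS", "T"),
        (1, 99999, "LORD", "T"),  # hypothetical SKU that matches only one term
    ],
)


def skus_matching_all(conn, terms):
    """Return the SKUs that have a row for every one of the given keywords."""
    terms = sorted({t.upper() for t in terms})
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)
    sql = (
        "SELECT SKU FROM SKUKeywords "
        f"WHERE Keyword IN ({placeholders}) "
        "GROUP BY SKU "
        "HAVING COUNT(DISTINCT Keyword) = ? "  # every term must be present
        "ORDER BY SKU"
    )
    return [row[0] for row in conn.execute(sql, (*terms, len(terms)))]


print(skus_matching_all(conn, ["Lord", "Rings"]))  # [12288]
```

Dropping the HAVING clause gives back the ranked "best match first" behaviour of the queries above.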

Exact keyword matches only. If you want inexact matches, you are back to using LIKE, and a LIKE pattern with a leading wildcard cannot use an index, so on 5 million rows it is not really an option.

You also need to normalize the database keywords to all upper or all lower case, and convert all user input keywords to the same case.

Obviously search terms need to be sanitized, but that's language/database specific.
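
The normalization and sanitizing steps might look like the following in application code. This is only a sketch: Python and sqlite3 stand in for whatever language and database are actually in use, the table name blah is taken from the query above, and the helpers normalize_terms and ranked_skus are illustrative names. Bound parameters keep user input out of the SQL text, which covers most of the sanitizing concern.

```python
import sqlite3


def normalize_terms(user_input):
    """Upper-case the raw input, split on whitespace, and drop duplicates."""
    seen = []
    for term in user_input.upper().split():
        if term not in seen:
            seen.append(term)
    return seen


def ranked_skus(conn, user_input, keyword_type):
    """Return (SKU, score) pairs, best match first, using bound parameters."""
    terms = normalize_terms(user_input)
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)
    sql = (
        "SELECT SKU, COUNT(Id) AS score FROM blah "
        f"WHERE Keyword IN ({placeholders}) AND KeywordType = ? "
        "GROUP BY SKU ORDER BY score DESC, SKU"
    )
    return conn.execute(sql, (*terms, keyword_type)).fetchall()


# Tiny in-memory table so the sketch can be run as-is.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blah (Id INTEGER PRIMARY KEY, SKU INTEGER, "
    "Keyword TEXT, KeywordType TEXT)"
)
conn.executemany(
    "INSERT INTO blah VALUES (?, ?, ?, ?)",
    [(650452, 12288, "LORD", "T"), (650457, 12288, "RINGS", "T")],
)
print(ranked_skus(conn, "lord  Rings", "T"))  # [(12288, 2)]
```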

Daren Schwenke
Let's leave out KeywordType for now. Here is some sample data for a particular SKU for Lord of the Rings The Two Towers: 650446 12288 DVD F; 650452 12288 LORD T; 650453 12288 LTD X; 650454 12288 MOVIE A; 650455 12288 OF T; 650457 12288 RINGS T; 650460 12288 THE T; 650461 12288 TOURS X; 650462 12288 TOWERS T; 650463 12288 TWO T. If the user inputs "Lord Rings" then I would expect to get the above SKU returned in the search results. Simply using LIKE isn't helping, as it's missing the crucial ANDing.
Strath Clyde
Drat, sorry for the formatting
Strath Clyde
A: 

The question is a bit confusing but I think you need:

1) A way to parse the user input (e.g. "Lord Rings") into individual keywords (e.g. ('Lord', 'Rings')). This would preferably be done at the application level, but it can be done in SQL / PSQL / TSQL, i.e. most any flavor of SQL.

2) A SQL query like this (derived from Daren Schwenke's solution)

SELECT SKU, COUNT(*) AS Ranking
FROM tblKeywords                    -- or whatever the name
WHERE Keyword IN ('Lord', 'Rings')  -- here the keywords
AND KeywordType = 'T'   -- Optionally be specific on type
AND SKU IN (            -- filter to only take items that are DVDs
     SELECT SKU
     FROM tblKeywords
     WHERE KeywordType = 'X' AND Keyword = 'DVD'
     )
GROUP BY SKU
ORDER BY COUNT(*) DESC
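
To try the two steps end to end, here is a rough sketch that loads the sample rows from the question into an in-memory SQLite database (a stand-in, not the OP's actual server) and runs the query with bound parameters. The function name search and its fmt_type parameter are made up for this example. Note that in the sample data the DVD row carries KeywordType 'F', whereas the question text speaks of 'X', so the type is passed in rather than hard-coded.

```python
import sqlite3

# Sample rows from the question: (Id, SKU, Keyword, KeywordType).
ROWS = [
    (650446, 12288, "DVD", "F"),
    (650452, 12288, "LORD", "T"),
    (650453, 12288, "LTD", "X"),
    (650454, 12288, "MOVIE", "A"),
    (650455, 12288, "OF", "T"),
    (650457, 12288, "RINGS", "T"),
    (650460, 12288, "THE", "T"),
    (650461, 12288, "TOURS", "X"),
    (650462, 12288, "TOWERS", "T"),
    (650463, 12288, "TWO", "T"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblKeywords (Id INTEGER PRIMARY KEY, SKU INTEGER, "
    "Keyword TEXT, KeywordType TEXT)"
)
conn.executemany("INSERT INTO tblKeywords VALUES (?, ?, ?, ?)", ROWS)


def search(conn, terms, fmt, fmt_type):
    """Rank SKUs by matched title keywords, keeping only SKUs tagged with fmt."""
    terms = [t.upper() for t in terms]
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)
    sql = (
        "SELECT SKU, COUNT(*) AS Ranking FROM tblKeywords "
        f"WHERE Keyword IN ({placeholders}) AND KeywordType = 'T' "
        "AND SKU IN (SELECT SKU FROM tblKeywords "
        "            WHERE KeywordType = ? AND Keyword = ?) "
        "GROUP BY SKU ORDER BY COUNT(*) DESC"
    )
    return conn.execute(sql, (*terms, fmt_type, fmt.upper())).fetchall()


print(search(conn, ["Lord", "Rings"], "DVD", "F"))  # [(12288, 2)]
print(search(conn, ["Lord", "Rings"], "CD", "F"))   # []
```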

Note: the effectiveness of this structure for performing what is essentially a form of full-text search leaves much to be desired. The situation can be helped by introducing the right indexes. At a glance, we may need at a minimum:
- (Keyword, SKU)
- (KeywordType, Keyword, SKU)
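
For illustration, the two composite indexes suggested above could be created as follows. The index names are invented, and SQLite (via Python's sqlite3) is used only so the snippet runs on its own; the basic CREATE INDEX form is common to most SQL dialects, but whether the optimizer actually picks these indexes should be checked against the execution plan on the real server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblKeywords (Id INTEGER PRIMARY KEY, SKU INTEGER, "
    "Keyword TEXT, KeywordType TEXT)"
)

# Index names are illustrative; the column order follows the note above.
conn.execute("CREATE INDEX IX_Keyword_SKU ON tblKeywords (Keyword, SKU)")
conn.execute(
    "CREATE INDEX IX_Type_Keyword_SKU ON tblKeywords (KeywordType, Keyword, SKU)"
)

# List the indexes that now exist on the table.
index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'tblKeywords'"
    )
)
print(index_names)
```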

A few other things could help, for example excluding "noise words" such as "OF", "THE", "A", "TO" from the index (and, of course, from the search criteria supplied by the end users).
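
On the application side, dropping noise words from the search input could be as simple as the sketch below. The word list contains only the four examples named above, and the helper name strip_noise is hypothetical; a real list would be longer and would differ per language.

```python
# Noise words named in the paragraph above; a real list would be longer
# and language-specific.
NOISE_WORDS = {"OF", "THE", "A", "TO"}


def strip_noise(user_input):
    """Upper-case the input, split on whitespace, and drop noise words."""
    return [t for t in user_input.upper().split() if t not in NOISE_WORDS]


print(strip_noise("The Lord of the Rings"))  # ['LORD', 'RINGS']
```

The same list would have to be applied when the keyword table is populated, otherwise the index and the search terms drift apart.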

But on the whole, it may be a good idea to assess the wisdom of proceeding with this structure; it may make sense with the specific application at hand, the OP is the only one to know this...

mjv
Thanks, this looks like it will work. I'll be translating this to LinqToSql but that won't be an issue.
Strath Clyde
ORDER BY COUNT(*) is more portable than my version.
Daren Schwenke