tags:

views:

84

answers:

3

Hi all

I would like to perform an sql search and I would like to get best results. I tried some things but they didn't work well. I have got two columns named subject and content

For example we will search "search this keywords" text on subject and content area. First I'm searching "search this keywords" then searching "search" and "this" and "keywords"

I would like to retrieve subject's results on top and I would like to retrieve best results liking "search this keywords".My query sometimes works well sometimes not.

How should I write this query

Thanks..

A: 

I think you're saying that you want to perform multiple SQL queries against your database and then combine the results and set a "weighting" to a subject match over a content match.

select messageid, textstring, max(weight) from (
-- exact subject match
select messageid,  substr(subject,1,100) textstring, 100 weight
from mytable
    where subject='search this keywords'
union
-- partial subject match
select messageid,  substr(subject,1,100), 90 weight
from mytable
    where subject like '%search this keywords%'
union
select messageid,  substr(subject,1,100), 80 weight
from mytable
    where subject like '%search%'
union
select messageid,  substr(subject,1,100), 80 weight
from mytable
    where subject like '%this%'
union
select messageid,  substr(subject,1,100), 80 weight
from mytable
    where subject like '%keywords%'    
union
-- partial content match
select messageid,  substr(content,1,100), 70 weight
from mytable
    where content like '%search this keywords%'
union
select messageid,  substr(content,1,100), 60 weight
from mytable
    where content like '%search%'
union
select messageid,  substr(content,1,100), 60 weight
from mytable
    where content like '%this%'
union
select messageid, substr(content,1,100), 60 weight
from mytable
    where content like '%keywords%'    
)
group by 
    messageid, textstring,
blissapp
This seems solve my problem without substr(). But this method dublicates same rows.How can I remove dublicates.I have got primary key column like this { ID INT PRIMARY KEY IDENTITY }
Strange - the UNION operator should return a unique set (UNION ALL allows duplicates).Tempted to say use DISTINCT around the whole lot - i.e. SELECT DISTINCT * FROM ([all of the above]) ORDER BY WEIGHT DESC
blissapp
Probably it dublicates because of the weight column.I Tried distinct but I'm getting this error message"The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified."
Yes, you've got it right. To fix that error, just remove the order by that's already on the query above. However the correct fix is not DISTICT/ORDER BY rather it's to use a GROUP BY something like: SELECT messageid, text, max(weight) from ([query]) - i've edited it above to shown this change
blissapp
A: 

Try this

select * from (
Select sch, rank,
case when sch like '%search this keywords%' then 0
when sch like '%search%' then 1
when sch like '%this%' then 2
when sch like '%keywords%' then 3 end ord
from
(
select subject as sch, 1 as rank from mytable
union all
select content, 2 as rank from mytable
) as x
) as y
where ord is not null
order by rank, ord
msi77
A: 

Implementing a simple 'Full Text Search' like tables would be a way.

CREATE TABLE YourTable (id int, subject varchar(256), content varchar(8000))

CREATE TABLE Keywords (key_id int, keyw varchar(50), relevanceModifier float)
CREATE TABLE SubjectsKeywords (key_fk int, yourTable_fk int, quantity int)
CREATE TABLE ContentKeywords (key_fk int, yourTable_fk int, quantity int)

When you insert in YourTable, fire a trigger to:

Split subject and content columns by spaces, commas, etc into words.

Optionally, avoid "stop words" like "the", "they", "to", etc. This is called stemming if i'm not mistaken.

Each word should be inserted in tables SubjectsKeywords, ContentKeywords and Keywords.

Optionally, set relevanceModifier. A very simple criteria would be to use the string length.

Optionally, count each ocurrence and track it quantity fields.

Then your query would be like this:

select max(t.relevance), yourtable.id, MAX([subject]), MAX(content)
from
(
    /* exact match and 'contains' match */
    select 100 as relevance, id
    from YourTable 
    where [subject] like '%search this keywords%'  
    UNION 
    /* keyword match */
    select 70 as relevance, yt.id
    from YourTable as yt
    join SubjectsKeywords on id = yourTable_fk
    join Keywords as k on k.id = key_fk
    where keyw in ('search', 'this', 'keywords')
    UNION
    select 40 as relevance, id
    from YourTable 
    where [subject] like '%search this keywords%'
    UNION 
    select 10 as relevance, yt.id
    from YourTable as yt
    join ContentKeywords on yt.id = yourTable_fk
    join Keywords as k on k.id = key_fk
    where keyw in ('search', 'this', 'keywords')
) as T
join yourtable on t.id = yourtable.id
group by t.id
order by max(relevance) desc
, yourtable.id ASC /*So that the result will always be in the same order*/

Notes:

Trigger is a way to do it if you have little control of you application or if it is a maintenance nightmare.

Later it you could improve it by adding a soundex, so that, you can search even mispelled keywords.

RelevanceModifier, Quantity field can be use to calculate more relevant results.

As it may be fast enough, it may be usefull as an autocomplete feature for your application, in which case you'd like to limit the results to let say 256 at most.



I hope this gives you and idea, and so you decide what suits you best.