Can somebody suggest how to match user input (a few words) to the appropriate tags in the system (each tag also consists of one or more words)?

Here is a sample to demonstrate the problem. I have tags assigned to objects. For example (tags are separated by commas here, but in real life they are stored in a related table):

Object                  Tags
Earth                   World, reality
World of warcraft 3     World Of warcraft, virtual reality
quake                   game, virtual

I would like to get the following:

User enters 'World': result is 'Earth'

User enters 'World of warcraft': result is 'World of warcraft 3'

Those were simple, exact searches. But:

User enters 'game world': the search should use both tags and return 'Earth' and 'quake'

User enters 'virtual reality': returns all 3 records

User enters 'reality virtual': returns 'Earth' and 'quake'

I am using T-SQL for the search; full-text search is enabled and is also used to find keywords in the main text. C# is the middle tier, but I would prefer a solution at the T-SQL level.

UPDATE 1: The first thing I am going to do is disallow spaces in tags, as on Stack Overflow. Any other ideas are appreciated.

A: 

It is NOT a good idea to put such complex business logic into your SQL code! Put it in the middle tier and if you're worried about performance use some caching mechanism.

Manu
Good point. However, I am afraid that it will be too slow to match all combinations of keywords to tags at the C# level.
Sergey Osypchuk
The opposite is true. Unless you have a MASSIVE amount of keywords (>100MB) you will not require full-text indexing. And you could always implement C# indexing with nested hashtables/dictionaries.
Manu
Also, unlike your database (which is unique), you can actually have multiple servers running the C# code... there is natural parallelism here!
Matthieu M.
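
The dictionary-based indexing suggested in the comments above could look roughly like the sketch below. It is written in Python as a stand-in for the C# middle tier, and it uses a single flat dictionary rather than nested ones. The names normalize, build_index and search are hypothetical, and the matching rule (a tag matches when its words appear as a contiguous phrase in the input, ignoring case) is an assumption inferred from the examples in the question, not something the thread specifies.

```python
from collections import defaultdict


def normalize(text):
    """Lower-case and collapse whitespace so 'World  Of' equals 'world of'."""
    return " ".join(text.lower().split())


def build_index(objects):
    """Map each normalized tag phrase to the set of objects carrying it."""
    index = defaultdict(set)
    for name, tags in objects.items():
        for tag in tags:
            index[normalize(tag)].add(name)
    return index


def search(index, user_input):
    """Return objects having a tag that is a contiguous phrase of the input."""
    words = normalize(user_input).split()
    found = set()
    # Try every contiguous run of words: n words give n * (n + 1) / 2 lookups.
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            found |= index.get(" ".join(words[start:end]), set())
    return found


# Sample data from the question, held in memory (assumed representation).
OBJECTS = {
    "Earth": ["World", "reality"],
    "World of warcraft 3": ["World Of warcraft", "virtual reality"],
    "quake": ["game", "virtual"],
}
INDEX = build_index(OBJECTS)

print(sorted(search(INDEX, "reality virtual")))
```

With this rule, 'game world', 'virtual reality' and 'reality virtual' give the results listed in the question. 'World of warcraft' also returns 'Earth', because 'World' alone is one of its tags; getting only 'World of warcraft 3' there would need an extra rule, such as preferring an exact match of the whole input. A search box input of a few words produces only a handful of dictionary lookups, so the "all combinations" cost stays small.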
+1  A: 

You need a split function to split the search string into separate words, and then try to match these against the stored tags.

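CREATE FUNCTION [dbo].[SplitString]
(
     @String VARCHAR(8000) ,
     @Delimiter VARCHAR(10)
)
RETURNS @RetTable TABLE(
     String varchar(1000)
)
AS 
BEGIN
    DECLARE @i INT ,
      @j INT ,
      @DelimLen INT ,
      @Token VARCHAR(1000)
    -- DATALENGTH, not LEN: LEN ignores trailing spaces, so LEN(' ') is 0
    SELECT  @i = 1 ,
      @DelimLen = DATALENGTH(@Delimiter)
    WHILE @i <= LEN(@String)
    BEGIN
     -- position of the next delimiter, or end of string if none is left
     SELECT @j = CHARINDEX(@Delimiter, @String, @i)
     IF @j = 0
     BEGIN
      SELECT @j = LEN(@String) + 1
     END
     SELECT @Token = LTRIM(RTRIM(SUBSTRING(@String, @i, @j - @i)))
     -- skip empty tokens produced by repeated delimiters
     IF @Token <> ''
     BEGIN
      INSERT @RetTable SELECT @Token
     END
     SELECT @i = @j + @DelimLen
    END
    RETURN
END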


DECLARE @String VARCHAR(8000) ,
     @Delimiter VARCHAR(10)
DECLARE @RetTable TABLE(
     String varchar(1000)
)

SELECT  @String = 'world of ',
     @Delimiter = ' '

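--begin: inline copy of the split function body, fills @RetTable with the words to match
    DECLARE @i INT ,
      @j INT ,
      @DelimLen INT ,
      @Token VARCHAR(1000)
    -- DATALENGTH, not LEN: LEN ignores trailing spaces, so LEN(' ') is 0
    SELECT  @i = 1 ,
      @DelimLen = DATALENGTH(@Delimiter)
    WHILE @i <= LEN(@String)
    BEGIN
     SELECT @j = CHARINDEX(@Delimiter, @String, @i)
     IF @j = 0
     BEGIN
      SELECT @j = LEN(@String) + 1
     END
     SELECT @Token = LTRIM(RTRIM(SUBSTRING(@String, @i, @j - @i)))
     -- skip empty words produced by repeated delimiters
     IF @Token <> ''
     BEGIN
      INSERT @RetTable SELECT @Token
     END
     SELECT @i = @j + @DelimLen
    END

SELECT * FROM @RetTable
--end: inline copy of the split function body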

DECLARE @Table TABLE(
     Objects VARCHAR(MAX),
     Tags VARCHAR(MAX)
)

INSERT INTO @Table (Objects,Tags) SELECT 'Earth', 'World,reality'
INSERT INTO @Table (Objects,Tags) SELECT 'World of warcraft 3', 'World Of warcraft,virtual reality'
INSERT INTO @Table (Objects,Tags) SELECT 'quake', 'game,virtual'


-- substring match: a word matches if it occurs anywhere in the tag list,
-- including rows that carry only a single tag (no comma)
SELECT  DISTINCT 
     t.* 
FROM    @Table t
JOIN    @RetTable r
     ON t.Tags LIKE '%' + r.String + '%'

something like that.

astander
+2  A: 

You might want to investigate using the Velocity caching engine, as it has quite rich support for tagging (GetObjectsByTag, GetObjectsByAllTags, GetObjectsByAnyTag) and all the hard work has been done for you! All you have to do is load your objects into the cache with appropriate tags.

PhilPursglove
Thanks for the link, I wasn't aware of it. I will definitely look into it. However, this is only part of my search problem, and there is little chance that I can use a third-party component.
Sergey Osypchuk
Depends how you interpret 3rd party. If you're already using C# then it's just another component from Microsoft.
PhilPursglove