views:

143

answers:

4

Is there a way to parse a Google search string to a table variable in T-SQL?

By Google search string I mean, including the plus sign (require), minus sign (exclude), and exact phrase (double quotes) operators.

For example, the following search string:

one -two +three "four five" -"six seven" +"eight nine" "ten eleven twelve"
would be parsed into a table variable that I could use to generate a T-SQL WHERE clause:
OPERATOR    STRING
            one
-           two
+           three
            four five
-           six seven
+           eight nine
            ten eleven twelve

Thanks!

+1  A: 

You basically need to deduce a grammar from that format and write a quick parser for it. There should be plenty of examples on the web. I definitely remember that B. Stroustrup's C++ book contains an example of a simple calculator; you can have a look at it.

Developer Art
A: 

If you are using .NET, the Irony SQL FTS example may get you started. You could adapt the splitter to create your data in the database.

http://irony.codeplex.com/

u07ch
+2  A: 

Well, parsers are written the same way in any language: using a state machine. Nothing like writing a short parser in the morning to get your synapses lubricated:

declare @s varchar(max);
declare @t table (operator char(1) null, token varchar(max));

set @s = 'one -two +three "four five" -"six seven" +"eight nine" "ten eleven twelve"';

declare @state varchar(100);
declare @operator char(1);
declare @token varchar(max);
declare @c char(1);
declare @i int;

set @state = 'start';
set @i = 0; -- substring positions are 1-based; the loop increments @i before reading

while (1=1)
begin
  set @i = @i + 1;
  if (@i > len(@s))
    break;
  set @c = substring(@s, @i, 1);
  if (@state = 'start')
  begin
    if @c in ('-', '+')
    begin
      set @operator = @c;
      set @token = '';
      set @state = 'operator';
      continue;
    end
    else if @c = '"'
    begin
      set @operator = null;
      set @token = '';
      set @state = 'quote';
      continue;
    end
    else if @c like '[a-zA-Z0-9]' -- letter or digit; also works under binary collations, where 'a' sorts after 'Z'
    begin
      set @operator = null;
      set @token = @c;
      set @state = 'token';
      continue;
    end
    else
      continue; -- ignore noise
  end
  else if @state = 'token'
  begin
    if @c like '[a-zA-Z0-9]' -- letter or digit; also works under binary collations, where 'a' sorts after 'Z'
    begin
      set @token = @token + @c;
      continue;
    end
    else
    begin
      insert into @t (operator, token)
        values (@operator, @token); 
      set @state = 'start';
      continue;
    end
  end
  else if @state = 'quote'
  begin
    if (@c != '"')
    begin
      set @token = @token+@c;
      continue;
    end
    else
    begin
      insert into @t (operator, token)
        values (@operator, @token); 
      set @state = 'start';
      continue;
    end
  end
  else if @state = 'operator'
  begin
    if @c = '"'
    begin
      set @token = '';
      set @state = 'quote';
      continue;
    end
    else if @c like '[a-zA-Z0-9]' -- letter or digit; also works under binary collations, where 'a' sorts after 'Z'
    begin
      set @token = @c;
      set @state = 'token';
      continue;
    end
    else 
    begin
      -- consider raising error here, invalid char after operator +/-
      set @state = 'start';
      continue;
    end
  end
  else
    raiserror ('Unexpected state %s', 16,2, @state);
end
if @state = 'token'
begin
  insert into @t (operator, token)
        values (@operator, @token); 
end
else if @state != 'start'
begin
  raiserror ('Incorrectly formatted string, must not end in state %s', 16, 1, @state);
end

select * from @t;
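
Once the tokens are in a table, turning them into a WHERE clause is a short loop. What follows is only a rough sketch under assumptions that are not in the question: the searched column is called body (a hypothetical name), tokens with no operator or with + must match, tokens with - must not match, and matching is a plain LIKE '%token%'. The id column is also an addition, there only so the rows can be walked in a fixed order.

```sql
-- Sketch only: "body" is a hypothetical column name, and the id column
-- is added here so the rows can be visited in a deterministic order.
declare @t table (id int identity(1,1) primary key,
                  operator char(1) null, token varchar(max));
insert into @t (operator, token) values (null, 'one');
insert into @t (operator, token) values ('-', 'two');
insert into @t (operator, token) values ('+', 'eight nine');

declare @where nvarchar(max);
declare @id int;
declare @op char(1);
declare @tok varchar(max);

set @where = N'1 = 1';
select @id = min(id) from @t;

while @id is not null
begin
  select @op = operator, @tok = token from @t where id = @id;

  -- Assumed semantics: '-' excludes, NULL and '+' both require a match.
  -- Single quotes are doubled so the token stays a valid string literal;
  -- LIKE wildcards inside a token (% _ [) are not escaped in this sketch.
  set @where = @where
    + case when @op = '-' then N' and body not like ''%'
           else N' and body like ''%' end
    + replace(@tok, '''', '''''')
    + N'%''';

  select @id = min(id) from @t where id > @id;
end

select @where as where_clause;
```

The resulting string would still have to be appended to a query and run as dynamic SQL, so treat the quoting above as a minimum and not as a complete defence against injection.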
Remus Rusanu
Thanks! I created a similar solution as a table-valued function. I was hoping not to have to, but your solution made me realize it had to be done with procedural logic.
Kuyenda
+1  A: 

I was inspired by Remus to come up with my own solution as a table-valued function.

CREATE FUNCTION [dbo].[PARSE_SEARCH_STRING]
( 
 @search_string NVARCHAR(MAX) 
)
RETURNS @table_token TABLE (
 operator CHAR(1) NULL,
 token NVARCHAR(MAX)
)
AS
BEGIN

 DECLARE @token NVARCHAR(MAX)
 DECLARE @operator CHAR(1)
 DECLARE @remainder NVARCHAR(MAX)
 DECLARE @length INTEGER

 SET @remainder = LTRIM(RTRIM(@search_string))

 WHILE LEN(@remainder) > 0
 BEGIN

  IF SUBSTRING(@remainder, 1, 1) = '-' OR SUBSTRING(@remainder, 1, 1) = '+' OR SUBSTRING(@remainder, 1, 1) = '='
  BEGIN
   SET @operator = LTRIM(RTRIM(SUBSTRING(@remainder, 1, 1)))
   SET @remainder = LTRIM(RTRIM(SUBSTRING(@remainder, 2, LEN(@remainder) - 1)))
  END
  ELSE
   SET @operator = NULL

  IF SUBSTRING(@remainder, 1, 1) = '"'
  BEGIN
   SET @length = CHARINDEX('"', @remainder, 2) - 2
   IF @length < 0 SET @length = LEN(@remainder) -- no closing quote: take the rest; an empty "" keeps length 0 and is skipped below
   SET @token = LTRIM(RTRIM(SUBSTRING(@remainder, 2, @length)))
   SET @remainder = LTRIM(RTRIM(SUBSTRING(@remainder, 3 + @length, LEN(@remainder) - @length + 2)))
  END
  ELSE
  BEGIN
   SET @length = CHARINDEX(' ', @remainder, 1) - 1
   IF NOT @length > 0 SET @length = LEN(@remainder)
   SET @token = LTRIM(RTRIM(SUBSTRING(@remainder, 1, @length)))
   SET @remainder = LTRIM(RTRIM(SUBSTRING(@remainder, 1+ @length, LEN(@remainder) - @length + 2)))
  END

  IF NOT @token = ''
  BEGIN
   IF NOT EXISTS ( 
    SELECT 1 
    FROM @table_token 
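    WHERE ( operator = @operator OR ( operator IS NULL AND @operator IS NULL ) ) -- NULL = NULL is never true, so match NULL operators explicitly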
    AND token = @token
   )
   INSERT @table_token ( operator, token ) VALUES ( @operator, @token )
  END

 END

 RETURN

END
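
A quick way to try it, assuming the function above has been created in the dbo schema of the current database, is to select from it with the search string from the question:

```sql
-- Assumes dbo.PARSE_SEARCH_STRING (defined above) already exists.
SELECT operator, token
FROM dbo.PARSE_SEARCH_STRING(N'one -two +three "four five" -"six seven" +"eight nine" "ten eleven twelve"');
```

Because it is a table-valued function, the result can also be joined or inserted into a table variable like any other rowset.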
Kuyenda