views:

157

answers:

1

is there a magic function or operator to ignore some tokens?

select to_tsvector('the quick. brown fox') @@ 'brown' -- returns true

select to_tsvector('the quick,brown fox') @@ 'brown' -- returns true

select to_tsvector('the quick.brown fox') @@ 'brown' -- returns false, should return true

select to_tsvector('the quick/brown fox') @@ 'brown' -- returns false, should return true
+1  A: 

I'm afraid that you are probably stuck. If you run your terms through ts_debug you will see that 'quick.brown' is parsed as a hostname and 'quick/brown' is parsed as filesystem path. The parser really isn't that clever sadly.

My only suggestion is that you preprocess your texts to convert these tokens to spaces. You could easily create a function in plpgsql to do that.

nicg=# select ts_debug('the quick.brown fox');
                              ts_debug
---------------------------------------------------------------------
 (asciiword,"Word, all ASCII",the,{english_stem},english_stem,{})
 (blank,"Space symbols"," ",{},,)
 (host,Host,quick.brown,{simple},simple,{quick.brown})
 (blank,"Space symbols"," ",{},,)
 (asciiword,"Word, all ASCII",fox,{english_stem},english_stem,{fox})
(5 rows)

As you can see from the above you don't get tokens for quick and brown

Nic Gibson