views:

12

answers:

1

Hi fellas,

I'm developing a simple articles website in brazilian portuguese language. The search feature is based on a full-text search, but it isn't returning expected results.

I made this on postgresql. Here is the simplified table:

Artigos
-id
-title -- article title
-intro -- article introduction
-content -- article body
-publishdate -- date of launch
-artigosts -- this will work as our fts index.

After creating the table, I ran:

UPDATE artigos SET artigosts = 
setweight(to_tsvector('pg_catalog.portuguese', coalesce(title,'')), 'A') || 
setweight(to_tsvector('pg_catalog.portuguese', coalesce(intro,'')), 'B') ||
setweight(to_tsvector('pg_catalog.portuguese', coalesce(content,'')), 'C');

CREATE INDEX artigosts_idx ON artigos USING gist (artigosts);

CREATE TRIGGER artigosts_tg 
BEFORE INSERT OR UPDATE ON artigos 
FOR EACH ROW EXECUTE PROCEDURE 
  tsvector_update_trigger('artigosts', 'pg_catalog.portuguese', 'title', 'intro', 'content');

Yes, I intend to use simple weightning on the searches. Made a index to speed up, a trigger, so I can insert and update without worry about remaking index and so on.

Well, per my understanding, it is everything fine. But results aren't. A simple example.

Let's say I have "... banco de dados ... no banco ... " as one article content. When I do:

SELECT title, intro, content FROM artigos WHERE plainto_tsquery('banco de dados') @@ artigosts;

It returns an empty set. I checked the ts_vector column and saw the predicates 'banc' and 'dad'. But I still can't understand why it does not return the row containing the mentioned article.

Can someone bring a light into this question?

A: 

The reason for this is probably because your default dictionary is set to english. Try the following queries to determine if that is actually the case.

SELECT * FROM ts_debug('banco de dados');

This query will show how the dictionary parses up your search phrase. It should provide the lexemes "banco", "de", and "dado". So what you're actually searching for will not exist in the index and you will receive 0 results.

Now try this:

SELECT * FROM ts_debug('portuguese', 'banco de dados');

It should return the lexemes that exist in the index, "banc" and "dad". If that is the case then you can simply change your search query to get the appropriate result.

SELECT title, intro, content FROM artigos WHERE plainto_tsquery('portuguese', 'banco de dados') @@ artigosts;
thetaiko
Thx for the reply. Yeah, I realized the main dictionary is english based. And Portuguese conf for dictionaries, despite existing, sucks. I could achieve the best results using SIMPLE config. The major problem is the use of special characters, like áéóíúçãõ, those are heavily used on portuguese. And some lexemes aren't parsed alright, but most lexemes are parsed right, so as of now it satisfies partially my needs.
Dave
@Dave - I haven't tried this with any language other than english, but you could always use an ispell dictionary with a fallback onto the snowballer. Portuguese dictionaries are available online: http://lasr.cs.ucla.edu/geoff/ispell-dictionaries.html#Portuguese-dicts
thetaiko
Thx for the quick reply, but I wish I could change those settings on my host. Unfortunately changing dict properties, fallbacks, files and so on is not an option for me on that particular server. But again, thx for the tips.
Dave