
I have a text column in a table, and we store XML in this column. Now I want to search it by tag and value.

Example data:

<bank>
  <name>Citi Bank</name>
  .....
  .....
</bank>

I would like to run the following query:

select * from xxxx where to_tsvector('english',xml_column) @@ to_tsquery('<name>Citi Bank</name>')

This works, but it also matches when the value sits inside a different tag, such as name1, or inside no tag at all.

How should I set up my search so that it matches both the tag and the value exactly?
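The looser matching has a concrete cause: PostgreSQL's default text-search parser classifies `<name>` and `</name>` as `tag` tokens, and the `english` configuration maps no dictionary to that token type, so tags never reach the tsvector. The sketch below makes this visible with `ts_debug`; it assumes a stock PostgreSQL install with the built-in `english` configuration, and the string literals are just the question's sample fragment and a `name1` variant.

```sql
-- Inspect how the 'english' configuration tokenizes the sample fragment.
-- Rows with alias = 'tag' have no dictionaries, so their lexemes are NULL
-- and they contribute nothing to to_tsvector().
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('english', '<name>Citi Bank</name>');

-- Consequence: the tags vanish, so both vectors hold only 'citi' and 'bank',
-- which is why a query cannot tell <name> from <name1>.
SELECT to_tsvector('english', '<name>Citi Bank</name>')   AS with_name_tag,
       to_tsvector('english', '<name1>Citi Bank</name1>') AS with_name1_tag;
```

In other words, full-text search alone cannot express "this value inside this tag"; something XML-aware has to do the exact check.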

+1  A: 

You could use the xpath function, like this:

select *
from xxx
where 'Citi Bank' = any (cast(xpath('/bank/name/text()', xml_column) as text[]));

But it won't use the full-text search index. To avoid full scans and still get correct answers, you could use a subquery with the full-text condition to find probable matches and then apply the xpath expression to those rows. Alternatively, create a functional index if the queries will always be the same.

Samuel
The xml_column is not of type xml; it's plain text. I have tried xpath, but indexing does not work (long story). I need to make this work with plain-text search.
cro
I don't think it's doable with text indexes alone. You could try the combined approach: use XMLPARSE to convert the text to xml so you can apply xpath expressions, AND keep the full-text expression you already have to avoid full scans.
Samuel
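A minimal sketch of that combined approach, assuming PostgreSQL built with libxml support (required for `xpath` and `XMLPARSE`). The table `bank_docs`, the index name and the sample rows are hypothetical stand-ins for the question's elided table, with the XML kept in a TEXT column as described above. `plainto_tsquery` is used so the space in 'Citi Bank' is read as AND. Rows 2 and 3 pass the full-text filter but are rejected by the xpath check.

```sql
-- Hypothetical table: XML stored as plain TEXT, as in the question.
CREATE TEMP TABLE bank_docs (
    id         integer PRIMARY KEY,
    xml_column text    NOT NULL
);

INSERT INTO bank_docs VALUES
    (1, '<bank><name>Citi Bank</name></bank>'),                    -- wanted
    (2, '<bank><name1>Citi Bank</name1></bank>'),                  -- wrong tag
    (3, '<bank><name>Chase</name><note>Citi Bank</note></bank>');  -- value under another tag

-- Expression index that serves the full-text pre-filter.
CREATE INDEX bank_docs_fts_idx
    ON bank_docs USING GIN (to_tsvector('english', xml_column));

SELECT id
FROM (
    -- Step 1: indexable but approximate (tags are ignored entirely).
    SELECT id, xml_column
    FROM bank_docs
    WHERE to_tsvector('english', xml_column) @@ plainto_tsquery('english', 'Citi Bank')
) AS candidates
-- Step 2: exact tag/value check; XMLPARSE turns the text into xml for xpath.
WHERE 'Citi Bank' = ANY (
    CAST(xpath('/bank/name/text()', XMLPARSE(DOCUMENT xml_column)) AS text[])
);
```

When the planner uses the GIN index, the parsing and xpath work is limited to the rows the index returns, which is the point of combining the two conditions.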
+1  A: 

You might want to reconsider storing XML in the database at all; instead, you could insert the data into related tables, since XML is a poor replacement for a relational store. If you do keep XML in the database, use the XML type rather than TEXT, and create an index like this (yes, you basically need one index per xpath expression):

CREATE INDEX my_funcidx ON my_table USING GIN ( CAST(xpath('/bank/name/text()', xmlfield) AS TEXT[]) );

Then query it like this:

SELECT * FROM my_table WHERE CAST(xpath('/bank/name/text()', xmlfield) AS TEXT[]) @> '{Citi Bank}'::TEXT[];

The planner can then use the index, which EXPLAIN will show (on a very small table it may still prefer a sequential scan).

The important part is the cast to TEXT[]: the xpath function returns XML[], and the xml type provides no comparison operators, so it cannot be indexed directly.
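One way to check whether this index is used, sketched under the assumption of PostgreSQL with libxml support. `my_table`, `xmlfield` and `my_funcidx` come from the answer; the generated filler rows are made up. A likely reason for the index not being picked is that the query's expression differs from the indexed one (for example a different xpath string, or a missing cast), since the planner only matches identical expressions. On a small table it may also choose a sequential scan regardless, so the sketch turns `enable_seqscan` off just for the EXPLAIN.

```sql
-- Hypothetical data following the answer: the column uses the xml type.
CREATE TEMP TABLE my_table (
    id       integer PRIMARY KEY,
    xmlfield xml     NOT NULL
);

-- Made-up filler rows plus one matching document.
INSERT INTO my_table
SELECT g, XMLPARSE(DOCUMENT format('<bank><name>Bank %s</name></bank>', g))
FROM generate_series(1, 1000) AS g;
INSERT INTO my_table VALUES (1001, '<bank><name>Citi Bank</name></bank>');

-- The index from the answer; the query below repeats this exact expression.
CREATE INDEX my_funcidx ON my_table
    USING GIN (CAST(xpath('/bank/name/text()', xmlfield) AS text[]));

ANALYZE my_table;

-- Diagnostic only, not a production setting: make the index path visible.
SET enable_seqscan = off;

EXPLAIN
SELECT * FROM my_table
WHERE CAST(xpath('/bank/name/text()', xmlfield) AS text[]) @> '{Citi Bank}'::text[];

RESET enable_seqscan;
```

If the plan mentions `my_funcidx` here but not in your own query, compare the two expressions character by character.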

MkV
Thanks for the example. I tried it, and it does not seem to use the index. Also, as you say, you basically need one index per xpath expression, which is probably a showstopper. My idea now: keep the text column, search first with to_tsvector @@ to_tsquery, and once I have a hit, load the XML into memory and run the XPath query there. First results are promising.
cro
Will you really be running lots of different xpath queries (not just the same queries with different parameters)? I think this would benefit greatly from being broken up into relational tables, perhaps populated by an insert or update trigger. That said, what was your index, and what was the execution plan of your query based on my example?
MkV
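A sketch of the trigger-based split suggested in that comment, assuming PostgreSQL with libxml support. All table, function and trigger names are hypothetical, and only `/bank/name` is extracted; each further tag you want to query would need its own extraction. The lookup then becomes a plain B-tree equality match on an ordinary column instead of a full-text or xpath search.

```sql
-- Hypothetical tables: XML stays in TEXT; a trigger copies /bank/name out.
CREATE TABLE bank_doc (
    id         integer PRIMARY KEY,
    xml_column text    NOT NULL
);

CREATE TABLE bank_doc_name (
    doc_id    integer NOT NULL REFERENCES bank_doc (id) ON DELETE CASCADE,
    bank_name text    NOT NULL
);

-- Plain B-tree index for exact-match lookups on the extracted value.
CREATE INDEX bank_doc_name_bank_name_idx ON bank_doc_name (bank_name);

CREATE OR REPLACE FUNCTION sync_bank_doc_name() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Drop whatever was extracted for this document previously.
    DELETE FROM bank_doc_name WHERE doc_id = NEW.id;

    -- One row per <name> text node directly under the root <bank>.
    INSERT INTO bank_doc_name (doc_id, bank_name)
    SELECT NEW.id, t.n::text
    FROM unnest(xpath('/bank/name/text()', XMLPARSE(DOCUMENT NEW.xml_column))) AS t(n);

    RETURN NEW;
END;
$$;

-- AFTER, so the parent row exists for the foreign key.
-- EXECUTE PROCEDURE is accepted by old and current releases alike.
CREATE TRIGGER bank_doc_sync_name
    AFTER INSERT OR UPDATE OF xml_column ON bank_doc
    FOR EACH ROW EXECUTE PROCEDURE sync_bank_doc_name();

INSERT INTO bank_doc VALUES
    (1, '<bank><name>Citi Bank</name></bank>'),
    (2, '<bank><name1>Citi Bank</name1></bank>');

-- Exact tag/value match through the side table.
SELECT d.*
FROM bank_doc AS d
WHERE EXISTS (
    SELECT 1 FROM bank_doc_name AS n
    WHERE n.doc_id = d.id AND n.bank_name = 'Citi Bank'
);
```

Because the trigger reruns on every change to `xml_column`, the side table stays in step with the stored XML without any application-side bookkeeping.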