views:

59

answers:

2

I have an Oracle database with all the "data", and a Solr index where all this data is indexed. Ideally, I want to be able to run queries like this:

select * from data_table where id in ([solr query results for 'search string']);

However, one key issue arises: Oracle WILL NOT allow more than 1000 items in the array of items in the "in" clause (BIG DEAL, as the list of objects I find is very often > 1000 and will usually be around the 50-200k items)

I have tried to work around this using a "split" function that will take a string of comma-separated values, and break them down into array items, but then I hit the 4000 char limit on the function parameter using SQL (PL/SQL is 32k chars, but it's still WAY too limiting for 80,000+ results in some cases)

I am also hitting performance issues using a WHERE IN (....), I am told that this causes a very slow query, even when the field referenced is an indexed field?

I've tried making recursive "OR"s for the 1000-item limit (aka: id in (1...1000 or (id in (1001....2000) or id in (2001....3000))) - and this works, but is very slow.

I am thinking that I should load the Solr Client JARs into Oracle, and write an Oracle Function in Java that will call solr and pipeline back the results as a list, so that I can do something like:

select * from data_table where id in (select * from table(runSolrQuery('my query text')));

This is proving quite hard, and I am not sure it's even possible.

Things that I can't do:

  • Store full data in Solr (security + storage limits)
  • User Solr as controller of pagination and ordering (this is why I am fetching data from the DB)

So I have to cook up a hybrid approach where Solr really act like the full-text search provider for Oracle. Help! Has anyone faced this?

A: 

I'm not a Solr expert, but I assume that you can get the Solr query results into a Java collection. Once you have that, you should be able to use that collection with JDBC. That avoids the limit of 1000 literal items because your IN list would be the result of a query, not a list of literal values.

Dominic Brooks has an example of using object collections with JDBC. You would do something like

Create a couple of types in Oracle

CREATE TYPE data_table_id_typ AS OBJECT (
  id NUMBER
);

CREATE TYPE data_table_id_arr AS TABLE OF data_table_id_typ;

In Java, you can then create an appropriate STRUCT array, populate this array from Solr, and then bind it to the SQL statement

SELECT *
  FROM data_table
 WHERE id IN (SELECT * FROM TABLE( CAST (? AS data_table_id_arr)))
Justin Cave
A: 

Instead of using a long BooleanQuery, you can use TermsFilter (works like RangeFilter, but the items doesn't have to be in sequence).

Like this (first fill your TermsFilter with terms):

TermsFilter termsFilter = new TermsFilter();

        // Loop through terms and add them to filter
        Term term = new Term("<field-name>", "<query>");
        termsFilter.addTerm(term);

then search the index like this:

DocList parentsList = null;
parentsList = searcher.getDocList(new MatchAllDocsQuery(),  searcher.convertFilter(termsFilter), null, 0, 1000);

Where searcher is SolrIndexSearcher (see java doc for more info on getDocList method): http://lucene.apache.org/solr/api/org/apache/solr/search/SolrIndexSearcher.html

mbonaci