views:

216

answers:

9

Here I have a query like below:

SELECT field
FROM table
WHERE value IN ('val1', 'val2', 'val3', ... 'valn')

Let's say there are 2000 values inside the IN clause, the value doesn't exist in other table. Do you have any idea to speed up this operation?

The question is open to accept any kind of methods..

Thanks!

+2  A: 
  1. Create an index that covers 'field' and 'value'.

  2. Place those IN values in a temp table and join on it.

Mitch Wheat
Agree with 1. Do you have any info or links with performance stats for 2 - seems a bit non-intuitive? Any guidance as to the tilting point as to IN becomes inefficient vs insert to temp and then JOIN to temp? Thnx
nonnb
That is what I answer during the interview. The interviewer then comment this answer must have something to do afterward. I guess he is talking about concurrent access to table and housekeeping.
woak
A: 

If you already have an index on the value field and the values are not available in any tables to join or sub-select from, then I don't think there are any possibilities to optimize. In the special case where your values are really "val1", "val", ... you could use a like query which would uses an index to search on the prefix. But I asume that was just an example.

Jörn Horstmann
+1  A: 
SELECT field
FROM table
WHERE value IN SELECT somevalue from sometable

As far as i know, you will face another problem. That will be the limitation of 'IN' clause. Using this, you can avoid that and hopefully fasten your query

jancrot
A: 

Oracle query parsing and caching mechanism works better when you use bind variables. If your query may perform better by using them

SELECT field FROM table WHERE value IN (?,? ....) and Then assign values as needed.

It is much better to analyse performance of real execution using some tool like enterprise management console and decide of improvements. Creating index is probably the first step to do.

Storing potential values in another table and using suggestions from J Horstmann seems a correct idea. Please give it a try.

Jayan
A: 

More information is required for selecting better solution.

  1. Does query execute frequently?
  2. Are values val1, val2 fixed?
  3. How big is the table?

If query is executed frequently, values val1, val2 etc are fixed and table is large (having say 20,000 or more rows) then store all the values in a another table (say temp table) and join both tables on value field.

If table in the query below is large, there should be index on value field to improve the performance.

SELECT field FROM table WHERE value IN ('val1', 'val2', 'val3', ... 'valn')

Both the table should be analyzed.

The reason for better performance is that the optimizer will choose best join method depending on the characteristic of the table. If table in above query is very large, the join will be nested loop join and above table should have an index on the column val.

If table in above query is very small (say fewer then 200-300 rows) the new table (temp table) should have index on val column.

If both tables are of almost same size indexes will not help much.

Conclusion: Best solution depends on specific situation.

Vishnu Gupta
A: 

If none of the other suggestions work, and the query takes a long time, you can try running it in parallel.

  select /*+ parallel(table) */ field ...
jonearles
A: 

You can join a normal table with a memory table that is filled with the list of values.

I don't how to do that with Java exactly but I do know how to do this with C#. I think something similar should be possible with Java.

Read here: http://forums.oracle.com/forums/thread.jspa?threadID=892457&tstart=375

Let's use a collection of User Defined Types (UDT's). First create a table with 1 million rows:

create table employees (id number(10) not null primary key, name varchar2(100) );

insert into employees 
select level l, 'MyName'||to_char(level) 
from dual connect by level <= 1e6;

1000000 rows created

commit;

exec dbms_stats.gather_schema_stats(USER, cascade=>TRUE);

No we turn to the C# code:

Let's select employees with id's 3 and 4.

Collection type MDSYS.SDO_ELEM_INFO_ARRAY is used because if we use this already predefined Oracle type we don't have to define our own Oracle type. You can fill collection MDSYS.SDO_ELEM_INFO_ARRAY with max 1048576 numbers.

using Oracle.DataAccess.Client;
using Oracle.DataAccess.Types;

    [OracleCustomTypeMappingAttribute("MDSYS.SDO_ELEM_INFO_ARRAY")]
    public class NumberArrayFactory : IOracleArrayTypeFactory
    {
      public Array CreateArray(int numElems)
      {
        return new Decimal[numElems];
      }

      public Array CreateStatusArray(int numElems)
      {
        return null;
      }
    }


    private void Test()
    {
      OracleConnectionStringBuilder b = new OracleConnectionStringBuilder();
      b.UserID = "sna";
      b.Password = "sna";
      b.DataSource = "ora11";
      using (OracleConnection conn = new OracleConnection(b.ToString()))
      {
        conn.Open();
        using (OracleCommand comm = conn.CreateCommand())
        {
          comm.CommandText =
              @" select  /*+ cardinality(tab 10) */ *  " +
              @" from employees, table(:1) tab " +
              @" where employees.id = tab.column_value";

          OracleParameter p = new OracleParameter();
          p.OracleDbType = OracleDbType.Array;
          p.Direction = ParameterDirection.Input;
          p.UdtTypeName = "MDSYS.SDO_ELEM_INFO_ARRAY";
          p.Value = new Decimal[] { 3, 4 };

          comm.Parameters.Add(p);

          int numPersons = 0;
          using (OracleDataReader reader = comm.ExecuteReader())
          {
            while (reader.Read())
            {
              MessageBox.Show("Name " + reader[1].ToString());
              numPersons++;
            }
          }
          conn.Close();
        }
      }
    }

The index on employees.id isn't used when one omits hint /*+ cardinality(tab 10) */. This index is created by Oracle because id is the primary key column.

This means that you don't have to fill a temporary table. The list of vaues stays in ram and you join your table employees with this list of values in memory table(:1) tab.

(wateenmooiedag=TTT)

TTT
+1  A: 

This looks like the right way in Java: http://knol.google.com/k/oracle-passing-a-list-as-bind-variable#

It is similar to the C# solution. Your list of value stays in memory (no temporary table) and it will not be persisted to disk and you use a parameterized query so the query executer doesn't have to reparse every query. I have no tried it with java but I think that it will be fast.

TTT
A: 

Just rewrite you IN to exists. It will be faster.

Alex
Perhaps you could include an example?
Nick Pierpoint