views:

289

answers:

2

I've a problem with a query generated by Hibernate that do not uses an index. Access to database is made from Java using JTDS and server version is SQL Server 2005, latest service pack.

The field is nullable and is a foreign key that, in some specific scenarios, could be completely null, column is indexed via a not clustered index but the index is never used when the column is entirely null, creating a large number of full table scans and performance issues.

The situation could be verified also using the standard query analyzer with the following SQL code:

Create table and indexes

CREATE TABLE [dbo].[TestNulls](
    [PK] [varchar](36) NOT NULL,
    [DATA] [varchar](36) NULL,
    [DATANULL] [varchar](36) NULL,
 CONSTRAINT [PK_TestNulls] PRIMARY KEY NONCLUSTERED
(
[PK] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF,     
ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IDX_DATA] ON [dbo].[TestNulls]
(
[DATA] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, 
IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, 
ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IDX_DATANULL] ON [dbo].[TestNulls]
(
[DATANULL] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, 
IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON,     
ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

Fill it with some random data using the newid function

declare @i as int
set @i = 0
while (@i < 500000)
begin
  set nocount on

  insert into TestNulls values(NEWID(), NEWID(), null)
  insert into TestNulls values(NEWID(), null, null)
  insert into TestNulls values(NEWID(), null, null)           
  set @i = (@i + 1)
  set nocount on
end;

This query perform a full table scan

declare @p varchar(36)
set @p = NEWID()
select PK, DATA, DATANULL from TestNulls
where DATANULL = @p

If I complete the query with a "and DATANULL IS NOT NULL" the query now uses the index.

Help needed:

  • How can I force the JTDS/Hibernate combination to use the index (the sendStringParametersAsUnicode is already set to false by default)?
  • Is there a way to append "and column is not null" for all hibernate queries that uses a nullable field?
  • Any explanation about this behavior?

Regards Massimo

+1  A: 

Hello,

1) I think, we should avoid NULL values. Just use DEFAULT and place some {00000-0000-000...} as NULL value. Your data filling script generates too many nulls values, so selectivity of values of this field is very low. I think SQL Server will choose to scan then use index in this case (SQL Server automaticly chooses to use or does not use index itself). And it makes sence. You should analyse your REAL data. Any way you can force it to just use some index. You can create stored procedure to sql server and then query it from hibernate, for example, or command hibernate to use custom query to request data (I think, it is possible) and add table hint to your query to force using some index:

INDEX ( index_val [ ,...n ] ):

select PK, DATA, DATANULL from TestNulls WITH INDEX(IDX_DATANULL)

The selectivity is the "number of rows" / "cardinality", so if you have 10K customers, and search for all "female", you have to consider that the search would return 10K/2 = 5K rows, so a very "bad" selectivity.

Luck.

igor
Problem in you approach is that we're using Hibernete to generate queries and so we do not want to used SP. Beside that SQL2008 do not have this problem and adding is not null solves it, so we're looking for a simple way to force the "is not null" part in the sql query from hibernate
massimogentilini
What about this:Criteria criteria = session.createCriteria(TestNulls.class);criteria.add(Expression.isNotNull("DATANULL"));
igor
A: 

You are using a table with no clustered index ("heap table", as it is called), that is generally not very efficient for SELECTs, because any meaningful query requires either a bookmark lookup or a full table scan.

So, to use the index the server will have to: 1) find the given values in the index and retrieve corresponding Row IDs, 2) retrive the rows by the IDs and return the data.

Given the nature of your data, the optimizer "thinks" full scan is mor efficient.

I'd suggest you to try:

  1. Rebuilding the statistics on the table. Outdated stats could lead optimizer to wrong decisions.
  2. Force using index via a hint. Do not forget to test whether it is really faster on your actual data (sometimes optimizer happens to know better than you).
  3. Create a covering index for this query by adding some data (it will make inserts/updates somewhat slower, so you should consider the overall impact on the system):

    CREATE INDEX IDX_DATANULL_FULL ON TestNulls (DATANULL) INCLUDE (PK, DATA)

VladV