ansaurus

Question

How to improve performance by processing database results in parallel?

Answer 1

A:

I think you're right that the cost of doing the NTILE is outweighing the saving of the parallelism.

You need to use something that will split the query sets into clearly separated sets.

If your queries are returning less than 15% of the total data (approximately) then breaking down the tables on an index (either an indexed field, or functional index) is probably your best starting point.

Example : Presuming your data has a numeric pseudo-key on each row, create a functional index on MOD(Id,4) - this would give you an Index based version of your NTILE approach. (I don't think you can have a functional index on an NTILE).

This specific approach is probably counter-productive - you would be getting data from the same blocks in different threads, so potentially increasing I/O (depends on memory).

The way that Oracle parallel query tends to do it - provided you want to process over 15% of the data in the table - is to simply break the table into N physical chunks (using the rowid) and then run N 'full scans' on those chunks.

I'm not sure if you can replicate this approach from the front-end. Splitting on a key id adds in the cost of going through the index to each row.

What you probably want is something that splits the table by something other than the key, or if you split on key, split it by ranges rather than the NTILE approach.

JulesLt 2010-07-06 09:25:39

You can't have a functional index on `NTILE`, because it's not deterministic. You can `NTILE(n) over (order by ROWID)`, however. That would functionally break the data set into discrete physical chunks. I suspect the OP should get a SQL Trace with waits to see if adding roll-you-own parallelism has saturated the I/O subsystem.

Adam Musch 2010-07-06 13:00:50

Done a quick test and it looks like doing an NTILE on ROWID (and then restricting the query by the NTILE result) will still trigger a full table scan on each thread - it doesn't look like it does any 'magic' to quickly group the total set into even chunks. On my test system NTILE by primary key has a slightly lower cost, but as always mileage may vary. The functional index on MOD(Id,10) came out a lot cheaper for me; I think better results could be achieved by using a function that results in a better ration of data in same blocks being in the same 'chunk' but probably a good start.

JulesLt 2010-07-07 09:42:11

Answer 2

A:

I use OracleCommand.Fetchsize to improve perfomance on large Queries.

cmd.FetchSize = &H100000  '1Mb
Dim Rdr = cmd.ExecuteReader

Some time ago, I use Async Readers for get Blob Data. But to use Async Reader you need maintain an array with each async Result an loop until last Reader ends.

   Public Shared Function FromBlob(ByVal Id As String, ByVal Rv As String, ByVal cn As OracleConnection) As Proyecto
     Dim n As Integer, Prj As Proyecto = Nothing
     Dim Bf(2)() As Byte, arrAr(2) As IAsyncResult 'Para proceso asíncrono

     Dim Cmd As New OracleCommand( _
         "Select rv,fecha,Datos From Proyectos Where Id=:Id and Rv in (:Rv,'Av','Est')", cn)
     Cmd.BindByName = True
     Cmd.Parameters.Add("Id", OracleDbType.Varchar2, Id, ParameterDirection.Input)
     Cmd.Parameters.Add("Rv", OracleDbType.Varchar2, Rv, ParameterDirection.Input)
     If Rv Is Nothing Then Prj = Proyecto.Actprj
     Try
        Using Rdr As OracleDataReader = Cmd.ExecuteReader
            Do Until Rdr.Read = False
                Dim rv1 As String = Rdr.GetString(0)
                Select Case rv1
                    Case "Av" : n = 1   'Avance TND
                    Case "Est" : n = 2  'Datos Seguimiento Estudio Seguridad
                    Case Else : n = 0
                End Select
                If Rdr.IsDBNull(2) = False Then
                   Dim Blob As OracleBlob = Rdr.GetOracleBlob(2)
                   Dim Buffer(CInt(Blob.Length)) As Byte
                   Bf(n) = Buffer
                   arrAr(n) = Blob.BeginRead(Buffer, 0, Buffer.Length, Nothing, Blob)
                End If
            Loop
            If Bf(0) Is Nothing AndAlso Prj Is Nothing Then _
               MessageBox.Show("Fallo al cargar proyecto") : Return Nothing
            For n = 0 To Bf.Length - 1
                Dim ar As IAsyncResult = arrAr(n)
                If ar IsNot Nothing AndAlso ar.AsyncWaitHandle.WaitOne() Then
                   Dim blob As OracleBlob = DirectCast(ar.AsyncState, OracleBlob)
                   blob.EndRead(ar)
                   blob.Dispose()
                   If ar.IsCompleted Then
                      Using rd As New BinReader(New MemoryStream(Bf(n)))
                          If n = 0 Then
                             Prj = New Proyecto(rd, False)
                          Else
                             Dim entry = Proyecto.Entry.FromLob(rd), Index = Prj.IndexOf(entry)
                             If Index < 0 Then Prj.Add(entry) Else Prj(Index) = entry
                          End If
                      End Using
                   End If
                End If
            Next
        End Using
        Catch ex As Exception
            MessageBox.Show(ex.Message)
     End Try
     Return Prj
  End Function

x77 2010-07-06 10:39:29

Note: When I use BeginRead (Async), I do´nt use multiTask on my app, Oracle executes simultaneus readers over assigned buffers (Bf). I only need verify each IAsyncResult for task completion. In code sample for three readers the result is about 50% fast, but this is for reduce roundtrip, not for Data Size.

x77 2010-07-06 10:58:45

Answer 3

A:

You can use Ref Cursor with Oracle to execute some Sql with one OracleCommand:

  Dim cmd As New OracleCommand("Begin " _
  & "Open :1 for Select T.CODTRA,SIM,JLA CAL,SUP,RESP,SERV,SubStr(Aparato,1,3) SIS,PERS,(nvl(DUR,0) * 60) as Dur,t.DESTRA,g.DesTra Destrae,OBS from " & TraRec & " T, Trarec_Gee g where T.codtra <> 'RV' and T.Codtra=G.Codtra(+);" _
  & "Open :2 for Select Red,descr from Redes;" _
  & "Open :3 for Select * from Tr_Redes;" _
  & "Open :4 for Select CODTRA,T_COND,COND,DEMORA * 60 as DEMORA from " & TrCondic _
  & ";end;", cn)

  For n = 0 To 3 : cmd.Parameters.Add(Nothing, OracleDbType.RefCursor, ParameterDirection.Output) : Next
  Dim da As New OracleDataAdapter(cmd)
  da.Fill(0, 0, ds.Tnd, ds.Redes, ds.TrRedes, ds.TrCondic)

Note: Da.Fill(0, 0, T1, T2 ...) is a Oracle especific function to retrieve many tables on a single statement.

x77 2010-07-06 11:26:34

Answer 4

A:

Ultimately it has turned out to be an IO bound problem. I've been able to achieve perf improvements by doing the IO asynchronously. NTILE on ROWID does what I wanted but so far it hasn't helped because the problem is IO bound.

Neil 2010-07-14 09:47:12

ansaurus

tags:

views:

answers:

How to improve performance by processing database results in parallel?

related questions