views:

369

answers:

3

I have a 800MB MS Access database that I migrated to SQLite. The structure of the database is as follows (the SQLite database, after migration, is around 330MB):

The table Occurrence has 1,600,000 records. The table looks like:

CREATE TABLE Occurrence 
(
SimulationID  INTEGER,    SimRunID   INTEGER,    OccurrenceID   INTEGER,
OccurrenceTypeID    INTEGER,    Period    INTEGER,    HasSucceeded    BOOL, 
PRIMARY KEY (SimulationID,  SimRunID,   OccurrenceID)
)

It has the following indexes:

CREATE INDEX "Occurrence_HasSucceeded_idx" ON "Occurrence" ("HasSucceeded" ASC)

CREATE INDEX "Occurrence_OccurrenceID_idx" ON "Occurrence" ("OccurrenceID" ASC)

CREATE INDEX "Occurrence_SimRunID_idx" ON "Occurrence" ("SimRunID" ASC)

CREATE INDEX "Occurrence_SimulationID_idx" ON "Occurrence" ("SimulationID" ASC)

The table OccurrenceParticipant has 3,400,000 records. The table looks like:

CREATE TABLE OccurrenceParticipant 
(
SimulationID    INTEGER,     SimRunID    INTEGER,    OccurrenceID     INTEGER,
RoleTypeID     INTEGER,     ParticipantID    INTEGER
)

It has the following indexes:

CREATE INDEX "OccurrenceParticipant_OccurrenceID_idx" ON "OccurrenceParticipant" ("OccurrenceID" ASC)

CREATE INDEX "OccurrenceParticipant_ParticipantID_idx" ON "OccurrenceParticipant" ("ParticipantID" ASC)

CREATE INDEX "OccurrenceParticipant_RoleType_idx" ON "OccurrenceParticipant" ("RoleTypeID" ASC)

CREATE INDEX "OccurrenceParticipant_SimRunID_idx" ON "OccurrenceParticipant" ("SimRunID" ASC)

CREATE INDEX "OccurrenceParticipant_SimulationID_idx" ON "OccurrenceParticipant" ("SimulationID" ASC)

The table InitialParticipant has 130 records. The structure of the table is

CREATE TABLE InitialParticipant 
(
ParticipantID    INTEGER  PRIMARY KEY,     ParticipantTypeID    INTEGER,
ParticipantGroupID     INTEGER
)

The table has the following indexes:

CREATE INDEX "initialpart_participantTypeID_idx" ON "InitialParticipant" ("ParticipantGroupID" ASC)

CREATE INDEX "initialpart_ParticipantID_idx" ON "InitialParticipant" ("ParticipantID" ASC)

The table ParticipantGroup has 22 records. It looks like

CREATE TABLE ParticipantGroup   (
ParticipantGroupID    INTEGER,    ParticipantGroupTypeID     INTEGER,
Description    varchar (50),      PRIMARY KEY(  ParticipantGroupID  )
)

The table has the following index: CREATE INDEX "ParticipantGroup_ParticipantGroupID_idx" ON "ParticipantGroup" ("ParticipantGroupID" ASC)

The table tmpSimArgs has 18 records. It has the following structure:

CREATE TABLE tmpSimArgs (SimulationID varchar, SimRunID int(10))

And the following indexes:

CREATE INDEX tmpSimArgs_SimRunID_idx ON tmpSimArgs(SimRunID ASC)

CREATE INDEX tmpSimArgs_SimulationID_idx ON tmpSimArgs(SimulationID ASC)

The table ‘tmpPartArgs’ has 80 records. It has the below structure:

CREATE TABLE tmpPartArgs(participantID INT)

And the below index:

CREATE INDEX tmpPartArgs_participantID_idx ON tmpPartArgs(participantID ASC)

I have a query that involves multiple INNER JOINs and the problem I am facing is the Access version of the query takes about a second whereas the SQLite version of the same query takes 10 seconds (about 10 times slow!) It is impossible for me to migrate back to Access and SQLite is my only option.

I am new to writing database queries hence these queries might look stupid, so please advise on anything you see faulty or kid-dish.

The query in Access is (the entire query takes 1 second to execute):

SELECT ParticipantGroup.Description, Occurrence.SimulationID, Occurrence.SimRunID, Occurrence.Period, Count(OccurrenceParticipant.ParticipantID) AS CountOfParticipantID FROM 
( 
   ParticipantGroup INNER JOIN InitialParticipant ON ParticipantGroup.ParticipantGroupID =  InitialParticipant.ParticipantGroupID
) INNER JOIN 
(
tmpPartArgs INNER JOIN 
  (
     (
        tmpSimArgs INNER JOIN Occurrence ON (tmpSimArgs.SimRunID = Occurrence.SimRunID)   AND (tmpSimArgs.SimulationID = Occurrence.SimulationID)
     ) INNER JOIN OccurrenceParticipant ON (Occurrence.OccurrenceID =    OccurrenceParticipant.OccurrenceID) AND (Occurrence.SimRunID = OccurrenceParticipant.SimRunID) AND (Occurrence.SimulationID = OccurrenceParticipant.SimulationID)
  ) ON tmpPartArgs.participantID = OccurrenceParticipant.ParticipantID
) ON InitialParticipant.ParticipantID = OccurrenceParticipant.ParticipantID WHERE (((OccurrenceParticipant.RoleTypeID)=52 Or (OccurrenceParticipant.RoleTypeID)=49)) AND Occurrence.HasSucceeded = True GROUP BY ParticipantGroup.Description, Occurrence.SimulationID, Occurrence.SimRunID, Occurrence.Period;

The SQLite query is as follows (this query takes around 10 seconds):

SELECT ij1.Description, ij2.occSimulationID, ij2.occSimRunID, ij2.Period, Count(ij2.occpParticipantID) AS CountOfParticipantID FROM 
(
   SELECT ip.ParticipantGroupID AS ipParticipantGroupID, ip.ParticipantID AS ipParticipantID, ip.ParticipantTypeID, pg.ParticipantGroupID AS pgParticipantGroupID, pg.ParticipantGroupTypeID, pg.Description FROM ParticipantGroup as pg INNER JOIN InitialParticipant AS ip ON pg.ParticipantGroupID = ip.ParticipantGroupID
) AS ij1 INNER JOIN 
(
   SELECT tpa.participantID AS tpaParticipantID, ij3.* FROM tmpPartArgs AS tpa INNER JOIN 
     (
       SELECT ij4.*, occp.SimulationID as occpSimulationID, occp.SimRunID AS occpSimRunID, occp.OccurrenceID AS occpOccurrenceID, occp.ParticipantID AS occpParticipantID, occp.RoleTypeID FROM 
          (
              SELECT tsa.SimulationID AS tsaSimulationID, tsa.SimRunID AS tsaSimRunID, occ.SimulationID AS occSimulationID, occ.SimRunID AS occSimRunID, occ.OccurrenceID AS occOccurrenceID, occ.OccurrenceTypeID, occ.Period, occ.HasSucceeded FROM tmpSimArgs AS tsa INNER JOIN Occurrence AS occ ON (tsa.SimRunID = occ.SimRunID) AND (tsa.SimulationID = occ.SimulationID)
          ) AS ij4 INNER JOIN OccurrenceParticipant AS occp ON (occOccurrenceID =      occpOccurrenceID) AND (occSimRunID = occpSimRunID) AND (occSimulationID = occpSimulationID)
    ) AS ij3 ON tpa.participantID = ij3.occpParticipantID
) AS ij2 ON ij1.ipParticipantID = ij2.occpParticipantID WHERE (((ij2.RoleTypeID)=52 Or (ij2.RoleTypeID)=49)) AND ij2.HasSucceeded = 1 GROUP BY ij1.Description, ij2.occSimulationID, ij2.occSimRunID, ij2.Period;   

I don’t know what I am doing wrong here. I have all the indexes but I thinking I am missing declaring some key index that will do the trick for me. The interesting thing is before migration my ‘research’ on SQLite showed that SQLite is faster, smaller and better in all aspects than Access. But I cant seem to get SQLite work faster than Access in terms of querying. I reiterate that I am new to SQLite and obviously do not have much idea as well as experience so if any learned soul could help me out with this, it will be much appreciated.

A: 

I have presented a smaller scaled down version of my query. Hope this is more clear and legible than my earlier one.

SELECT5 * FROM 
(
SELECT4 FROM ParticipantGroup as pg INNER JOIN InitialParticipant AS ip ON pg.ParticipantGroupID = ip.ParticipantGroupID
) AS ij1 INNER JOIN 
(
   SELECT3 * FROM tmpPartArgs AS tpa INNER JOIN 
      (
          SELECT2 * FROM 
              (
                  SELECT1 * FROM tmpSimArgs AS tsa INNER JOIN Occurrence AS occ ON (tsa.SimRunID = occ.SimRunID) AND (tsa.SimulationID = occ.SimulationID)
              ) AS ij4 INNER JOIN OccurrenceParticipant AS occp ON (occOccurrenceID =      occpOccurrenceID) AND (occSimRunID = occpSimRunID) AND (occSimulationID = occpSimulationID)
      ) AS ij3 ON tpa.participantID = ij3.occpParticipantID
) AS ij2 ON ij1.ipParticipantID = ij2.occpParticipantID WHERE (((ij2.RoleTypeID)=52 Or (ij2.RoleTypeID)=49)) AND ij2.HasSucceeded = 1

The application that I am working on is a Simulation application and in order to understand the context of the above query I thought it necessary to give a brief explanation of the application. Let us assume there is a planet with some initial resources and living agents. The planet is allowed to exist for 1000 years and the actions performed by the agents are monitored and stored in the database. After 1000 years the planet is destroyed and again re-created with the same set of initial resources and living agents as the first time. This (the creation and destruction) is repeated 18 times and all the actions of the agents performed during those 1000 years are stored in the database. Thus our entire experiment consists of 18 re-creations which is termed as the ‘Simulation’. Each of the 18 times the planet is recreated is termed as a run and each of the 1000 years of a run is called a period. So a ‘Simulation’ consists of 18 runs and each run consists of 1000 periods. At the start of each run, we assign the ‘Simulation’ an initial set of knowledge items and dynamic agents that interact with each other and the items. A knowledge item is stored by an agent inside a knowledge store. The knowledge store is also considered to be a participating entity in our Simulation. But this concept (regarding knowledge stores) is not important. I have tried to be detailed about every SELECT statement and the tables involved.

SELECT1: I think this query could be replaced by just the table ‘Occurrence’, since it does nothing much. The table Occurrence stores the different actions taken by the agents, in each period of every simulation run of a particular ‘Simulation’. Normally each ‘Simulation’ consists of 18 runs. And each run consists of a 1000 periods. An agent is allowed to take an action in every period of every run in the ‘Simulation’. But the Occurrence table does not store any details about the agents that perform the actions. The Occurrence table might store data related to multiple ‘Simulations’.

SELECT2: This query simply returns the details of actions performed in every period of every run of a ‘Simulation’ along with the details of all participants of the ‘Simulation’ like their respective ParticipantIDs. The OccurrenceParticipant table stores records for every participating entity of the Simulation and that includes agents, knowledge stores, knowledge items, etc.

SELECT3: This query returns only those records from the pseudo table ij3 that are due to agents and knowledge items. All records in ij3 concerning knowledge items will be filtered out.

SELECT4: This query attaches the ‘Description’ field to every record of ‘InitialParticipant’. Please note that the column ‘Description’ is an Output column of the entire query. The table InitialParticipant contains a record for every agent and every knowledge item that is initially assigned to the ‘Simulation’

SELECT5: This final query returns all records from the pseudo table ij2 for which the RoleType of the participating entity (which may either be an agent or a knowledge item) is 49 or 52.

Saswati De
why not just edit your question instead of this 'answer'?
Ben Laan
+2  A: 

I have reformatting your code (using my home-brew sql formatter) to hopefully make it easier for others to read..

Reformatted Query:

SELECT
    ij1.Description,
    ij2.occSimulationID,
    ij2.occSimRunID,
    ij2.Period,
    Count(ij2.occpParticipantID) AS CountOfParticipantID

FROM (

    SELECT
        ip.ParticipantGroupID AS ipParticipantGroupID,
        ip.ParticipantID AS ipParticipantID,
        ip.ParticipantTypeID,
        pg.ParticipantGroupID AS pgParticipantGroupID,
        pg.ParticipantGroupTypeID,
        pg.Description

    FROM ParticipantGroup AS pg

    INNER JOIN InitialParticipant AS ip
            ON pg.ParticipantGroupID = ip.ParticipantGroupID

) AS ij1

INNER JOIN (

    SELECT
        tpa.participantID AS tpaParticipantID,
        ij3.*

    FROM tmpPartArgs AS tpa

    INNER JOIN (

        SELECT
            ij4.*,
            occp.SimulationID AS occpSimulationID,
            occp.SimRunID AS occpSimRunID,
            occp.OccurrenceID AS occpOccurrenceID,
            occp.ParticipantID AS occpParticipantID,
            occp.RoleTypeID

        FROM (

            SELECT
                tsa.SimulationID AS tsaSimulationID,
                tsa.SimRunID AS tsaSimRunID,
                occ.SimulationID AS occSimulationID,
                occ.SimRunID AS occSimRunID,
                occ.OccurrenceID AS occOccurrenceID,
                occ.OccurrenceTypeID,
                occ.Period,
                occ.HasSucceeded

            FROM tmpSimArgs AS tsa

            INNER JOIN Occurrence AS occ
                    ON (tsa.SimRunID = occ.SimRunID)
                   AND (tsa.SimulationID = occ.SimulationID)

        ) AS ij4

        INNER JOIN OccurrenceParticipant AS occp
                ON (occOccurrenceID = occpOccurrenceID)
               AND (occSimRunID = occpSimRunID)
               AND (occSimulationID = occpSimulationID)

    ) AS ij3
      ON tpa.participantID = ij3.occpParticipantID

) AS ij2
  ON ij1.ipParticipantID = ij2.occpParticipantID

WHERE (

    (

        (ij2.RoleTypeID) = 52
        OR
        (ij2.RoleTypeID) = 49

    )

)
  AND ij2.HasSucceeded = 1

GROUP BY
    ij1.Description,
    ij2.occSimulationID,
    ij2.occSimRunID,
    ij2.Period;

As per JohnFx (above), I was confused by the derived views. I think there is actually no need for it, especially since they are all inner joins. So, below I have attempted to reduce the complexity. Please review and test for performance. I have had to do a cross join with tmpSimArgs since it is only joined to Occurence - I assume this is desired behaviour.

SELECT
    pg.Description,
    occ.SimulationID,
    occ.SimRunID,
    occ.Period,
    COUNT(occp.ParticipantID) AS CountOfParticipantID

FROM ParticipantGroup AS pg

INNER JOIN InitialParticipant AS ip
        ON pg.ParticipantGroupID = ip.ParticipantGroupID

CROSS JOIN tmpSimArgs AS tsa

INNER JOIN Occurrence AS occ
        ON tsa.SimRunID = occ.SimRunID
       AND tsa.SimulationID = occ.SimulationID

INNER JOIN OccurrenceParticipant AS occp
        ON occ.OccurrenceID = occp.OccurrenceID
       AND occ.SimRunID = occp.SimRunID
       AND occ.SimulationID = occp.SimulationID

INNER JOIN tmpPartArgs AS tpa
        ON tpa.participantID = occp.ParticipantID

WHERE occ.HasSucceeded = 1
  AND (occp.RoleTypeID = 52 OR occp.RoleTypeID = 49 )

GROUP BY
    pg.Description,
    occ.SimulationID,
    occ.SimRunID,
    occ.Period;
Ben Laan
A: 

I would suggest moving the ij2.RoleTypeID filtering from the outermost query to ij3, use IN instead of OR, and move the HasSucceeded query to ij4.

Sam