tags:

views:

73

answers:

2

I have a database in SQLite Administrator, with 3 tables, say A,B and C.

Table A has 3 columns p1,p2 and p3, with about 2 million rows.
Table B has 2 columns p1 and p4, with also about 2 million rows.
Table C has 1 column p4 with about 800,000 rows.

The query that I am trying to run is as following:

SELECT A.p1, B.p4, A.p2, A.p3
FROM A,B,C
WHERE A.p1=B.p1 AND B.p4=C.p4

The query already took 3 days, and still didn't finish. I wonder if I should abort it or wait till it completes. If it will finish in next 5-6 days I will probably wait, but if it takes more than that, I will have to abort it.

Should I wait or not?

My PC specs are: Core 2 duo 1.86GHz, 2 GB RAM,

A: 

Do you have indexes on A.p1, B.p1, B.p4, C.p4 ? If not, then you'd better stop it, it might run for several years.

For this kind of operations you need something bigger. This is not Lite at all. Think about switching to another RDBMS.

Alexander
I am not sure about indexes. I didn't create them myself, but maybe sqlite created them automatically? I created the tables by importing data from text files.
Sunny88
Certainly no indexes were created. No indexes are created automatically. Primary keys, foreign keys, unique keys, indexes... all have to be defined manually.
Alexander
Do you think that mySQL will do it much faster than SQLite?
Sunny88
Try to create indexes in SQLite first
Unreason
This is a job for FirebirdSQL Embedded Server. Though it's still slower than FirebirdSQL Super Server, which is slower than PostgreSQL, Oracle and Microsoft SQL Server, but still faster than MySQL.
Alexander
@Alexander, your comparison between engines can only be misguiding.
Unreason
Also SQLite is light when it comes to any sort of multiuser write access, otherwise, while it is mostly single user, it is a reasonable engine, even for workloads such as this.
Unreason
+2  A: 

I would say there's nothing strange in 3 days (if no indexes).

If no indexes on A, B, C then your query would make a full scan of A x B x C. The number of records in A x B x C is

SELECT COUNT(*)
FROM A,B,C

which is (2*10^6) * (2*10^6) * (0.8*10^6) = 3.2 * 10^18

Assuming that you can apply the where condition to billion records in a second you would still need 3.2 * 10^9 seconds. Which is just over 101 years.

However, if you have indexes on p1 and p4 decent RDBMS would be able to access results directly and not scan the full Cartesian product (well, I think that some DBs would choose to build temporary indexes, which would still be slow, but would make the query actually execute).

Unreason
I see!I guess I will abort the query and make indexes first then.
Sunny88
I made indexes and run it again... It finished in 49 seconds.Thanks, StackOverflow )
Sunny88