views: 397
answers: 3

Any ideas for optimizing the following query using SQLite3?

SELECT * FROM Feed 
    WHERE ActivityType IN ('PhotoActivity','CommentActivity') 
    AND UserKey NOT IN ('testUser', 'testUser2') 
    ORDER BY TimeStamp DESC 
    LIMIT 20 OFFSET 0;

The table will never have more than 100,000 records, and we expect a 100:1 ratio of reads to writes.

Any help is greatly appreciated.

The table SQL is:

CREATE TABLE Feed (
        FeedActivityKey TEXT PRIMARY KEY,
        UserKey TEXT,
        AssemblyQualifiedName TEXT,
        SerializedObject BLOB,
        ActivityType TEXT,
        CorrelatedKey TEXT,
        TimeStamp INTEGER);
CREATE INDEX Feed_ActivityTypeUserKey ON [Feed] ([ActivityType], [UserKey] DESC);
CREATE INDEX Feed_UserKey ON [Feed] ([UserKey] DESC);
CREATE INDEX Feed_TimeStamp ON [Feed] ([TimeStamp] DESC);

The EXPLAIN output is:

    0 Trace 0 0 0 0
    1 OpenEphemeral 1 3 0 keyinfo(1,-BINARY) 0
    2 Integer 20 1 0 0
    3 MustBeInt 1 0 0 0
    4 IfZero 1 73 0 0
    5 Integer 0 2 0 0
    6 MustBeInt 2 0 0 0
    7 IfPos 2 9 0 0
    8 Integer 0 2 0 0
    9 Add 1 2 3 0
    10 IfPos 1 12 0 0
    11 Integer -1 3 0 0
    12 String8 0 4 0 PhotoActivity 0
    13 String8 0 5 0 CommentActivity 0
    14 Goto 0 74 0 0
    15 OpenRead 0 2 0 7 0
    16 OpenRead 2 4 0 keyinfo(2,BINARY,BINARY) 0
    17 If 7 25 0 0
    18 Integer 1 7 0 0
    19 OpenEphemeral 4 1 0 keyinfo(1,BINARY) 0
    20 Null 0 9 0 0
    21 MakeRecord 4 1 9 a 0
    22 IdxInsert 4 9 0 0
    23 MakeRecord 5 1 9 a 0
    24 IdxInsert 4 9 0 0
    25 Rewind 4 53 0 0
    26 Column 4 0 6 0
    27 IsNull 6 52 0 0
    28 Affinity 6 1 0 aab 0
    29 SeekGe 2 52 6 1 0
    30 IdxGE 2 52 6 1 1
    31 IdxRowid 2 9 0 0
    32 Seek 0 9 0 0
    33 Column 0 0 10 0
    34 Column 2 1 11 0
    35 Column 0 2 12 0
    36 Column 0 3 13 0
    37 Column 2 0 14 0
    38 Column 0 5 15 0
    39 Column 0 6 16 0
    40 MakeRecord 10 7 9 0
    41 Column 0 6 17 0
    42 Sequence 1 18 0 0
    43 Move 9 19 1 0
    44 MakeRecord 17 3 8 0
    45 IdxInsert 1 8 0 0
    46 IfZero 3 49 0 0
    47 AddImm 3 -1 0 0
    48 Goto 0 51 0 0
    49 Last 1 0 0 0
    50 Delete 1 0 0 0
    51 Next 2 30 0 0
    52 Next 4 26 0 0
    53 Close 0 0 0 0
    54 Close 2 0 0 0
    55 OpenPseudo 5 1 7 0
    56 Sort 1 72 0 0
    57 AddImm 2 -1 0 0
    58 IfNeg 2 60 0 0
    59 Goto 0 71 0 0
    60 Column 1 2 9 0
    61 Integer 1 8 0 0
    62 Insert 5 9 8 0
    63 Column 5 0 10 0
    64 Column 5 1 11 0
    65 Column 5 2 12 0
    66 Column 5 3 13 0
    67 Column 5 4 14 0
    68 Column 5 5 15 0
    69 Column 5 6 16 0
    70 ResultRow 10 7 0 0
    71 Next 1 57 0 0
    72 Close 5 0 0 0
    73 Halt 0 0 0 0
    74 Transaction 0 0 0 0
    75 VerifyCookie 0 5 0 0
    76 TableLock 0 2 0 FriendFeed 0
    77 Goto 0 15 0 0

A: 

I would remove the SELECT * FROM and change it to SELECT [column1], [column2] FROM.

The reason for this is so that you return ONLY the values you need (your table has seven columns, and the feed page probably doesn't display all of them), and it reduces overhead since you're not using a wildcard.
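
A sketch of that suggestion against the posted schema; which columns the application actually needs is an assumption here:

    -- Sketch: name only the columns the feed page displays instead of *
    SELECT FeedActivityKey, UserKey, ActivityType, CorrelatedKey, TimeStamp
        FROM Feed
        WHERE ActivityType IN ('PhotoActivity', 'CommentActivity')
        AND UserKey NOT IN ('testUser', 'testUser2')
        ORDER BY TimeStamp DESC
        LIMIT 20 OFFSET 0;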

rockinthesixstring
A: 

Maybe OR will be faster than IN?


EDIT: Check this: http://www.sqlite.org/optoverview.html#or_opt
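
For reference, a sketch of the same filter written with OR instead of IN; per the linked page, SQLite's OR optimization can convert such OR terms back into an IN expression internally:

    -- Sketch: equivalent filter using OR in place of IN
    SELECT * FROM Feed
        WHERE (ActivityType = 'PhotoActivity' OR ActivityType = 'CommentActivity')
        AND UserKey NOT IN ('testUser', 'testUser2')
        ORDER BY TimeStamp DESC
        LIMIT 20 OFFSET 0;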

skyman
No, it is not; the `IN` syntax is faster (it is parsed only once).
Pentium10
I need the AND syntax anyway. I read the optimization notes and set up my indexes according to them. I am not familiar with the opcodes and was hoping someone who knows them could help.
Alex Spence
+1  A: 
  • List explicit columns in your SELECT instead of *.
  • Normalize the table further. For example, you can move ActivityType and UserKey into a separate table with a numeric primary key (see the sketch after this list).
  • Since you have a 100:1 read-to-write ratio, issue a SHARED lock before the SELECT (the write process can wait a bit longer if it has to).
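
A hypothetical sketch of the normalization point, assuming a new ActivityTypes lookup table; the table and column names are illustrative, not from the original schema:

    -- Hypothetical lookup table: repeated strings become small integer keys
    CREATE TABLE ActivityTypes (
        ActivityTypeId INTEGER PRIMARY KEY,
        Name TEXT UNIQUE NOT NULL);

    -- Assumes Feed stores ActivityTypeId INTEGER in place of ActivityType TEXT,
    -- making the index smaller and the comparisons integer comparisons.
    SELECT f.*
        FROM Feed f
        JOIN ActivityTypes t ON t.ActivityTypeId = f.ActivityTypeId
        WHERE t.Name IN ('PhotoActivity', 'CommentActivity')
        AND f.UserKey NOT IN ('testUser', 'testUser2')
        ORDER BY f.TimeStamp DESC
        LIMIT 20 OFFSET 0;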
Pentium10
I can probably normalize ActivityType, but UserKey is impossible due to the structure of our app. Maybe a hash code for the user key would be faster than the current implementation? Most of the UserKeys are GUIDs.
Alex Spence
How long does the query take as it is? Do you run it in a desktop or a mobile environment?
Pentium10
Desktop environment. With 100,000 rows the query takes 1019 ms.
Alex Spence
OK. I think your issue here is the amount of data that needs to be transferred from disk to memory. Try limiting the SELECT * to something more precise. If limiting the returned columns doesn't satisfy you because you need all the columns in your query, try storing data-heavy columns gzip-compressed to reduce the data moved from disk to memory.
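
A sketch of the column-limiting idea as a two-step fetch; the two-step pattern itself is an assumption, not something described in the thread:

    -- Step 1: page through the lightweight columns only
    SELECT FeedActivityKey, UserKey, ActivityType, TimeStamp
        FROM Feed
        WHERE ActivityType IN ('PhotoActivity', 'CommentActivity')
        AND UserKey NOT IN ('testUser', 'testUser2')
        ORDER BY TimeStamp DESC
        LIMIT 20 OFFSET 0;

    -- Step 2: load the heavy SerializedObject BLOB only for rows the user opens
    SELECT SerializedObject FROM Feed WHERE FeedActivityKey = ?;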
Pentium10
The table has 100,000 records, but I'm only taking the first 20 that match.
Alex Spence