I have a table with about 1.4 billion records, with the following format:
mysql> describe _2009all; +---------------+--------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +---------------+--------------+------+-----+---------+-------+ | userId | int(11) | YES | MUL | NULL | | | type | varchar(50) | YES | | NULL | | | kind | varchar(50) | YES | | NULL | | | description | varchar(255) | YES | | NULL | | | bundleVersion | varchar(255) | YES | | NULL | | | bundleId | varchar(255) | YES | | NULL | | | time | bigint(20) | YES | | NULL | | +---------------+--------------+------+-----+---------+-------+ 7 rows in set (0.02 sec)
The entire database takes up less than 0.4 terabytes, and I have about 1.5 terabytes of free disk space available.
I'm trying to prepare the data for analysis where I look at what each user does over time. So I run the following statement:
create table sorted2009 AS (select * from _2009all order by userid,time);
The statement (obviously) takes a long time to run, but at some point, it exhausts the all the free disk space and I get a "ERROR 3 (HY000): Error writing file" error.
Any ideas on how I can create my sorted table? Thanks in advance.
Clarification
Martin: There is only 1 partition.
Andy: The problem I'm trying to solve is that the results of "select * from _2009all order by userid,time" are something that I'm going to be inspecting a lot over the next few months. Thus, rather than running this query every time I want to inspect the results, I want to put the results in a table for faster access.
I had initially assumed that indexing over time would be a waste of space, since timestamps are mostly unique. But I just added an index to the time field in _2009all, and the index takes up a reasonable amount of space. But now to do the sort over the two indexes (what I assume you mean by "indexed walk"):
mysql> select * from _2009all order by userId,time limit 2; ... 2 rows in set (25 min 36.48 sec)
Yes, I'd say 25 minutes is too long. But if you meant something else by your suggestion, I'm open to alternatives.
bot43:
After adding the index to the time field:
mysql> explain select * from _2009all order by userid,time; +----+-------------+----------+------+---------------+------+---------+------+------------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+------+---------------+------+---------+------+------------+----------------+ | 1 | SIMPLE | _2009all | ALL | NULL | NULL | NULL | NULL | 1384378798 | Using filesort | +----+-------------+----------+------+---------------+------+---------+------+------------+----------------+ 1 row in set (0.05 sec) mysql> explain select userId,type,kind,description,bundleVersion,bundleId,time from _2009all order by userid,time +----+-------------+----------+------+---------------+------+---------+------+------------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+------+---------------+------+---------+------+------------+----------------+ | 1 | SIMPLE | _2009all | ALL | NULL | NULL | NULL | NULL | 1384378798 | Using filesort | +----+-------------+----------+------+---------------+------+---------+------+------------+----------------+ 1 row in set (0.00 sec)
Good point about the DBMS storing the table any damn way it feels like it. I guess I need another solution.