tags:

views:

74

answers:

2

I have a table with about 1.4 billion records, with the following format:

mysql> describe _2009all;
+---------------+--------------+------+-----+---------+-------+
| Field         | Type         | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+-------+
| userId        | int(11)      | YES  | MUL | NULL    |       |
| type          | varchar(50)  | YES  |     | NULL    |       |
| kind          | varchar(50)  | YES  |     | NULL    |       |
| description   | varchar(255) | YES  |     | NULL    |       |
| bundleVersion | varchar(255) | YES  |     | NULL    |       |
| bundleId      | varchar(255) | YES  |     | NULL    |       |
| time          | bigint(20)   | YES  |     | NULL    |       |
+---------------+--------------+------+-----+---------+-------+
7 rows in set (0.02 sec)

The entire database takes up less than 0.4 terabytes, and I have about 1.5 terabytes of free disk space available.

I'm trying to prepare the data for analysis where I look at what each user does over time. So I run the following statement:

create table sorted2009 AS (select * from _2009all order by userid,time);

The statement (obviously) takes a long time to run, but at some point, it exhausts the all the free disk space and I get a "ERROR 3 (HY000): Error writing file" error.

Any ideas on how I can create my sorted table? Thanks in advance.

Clarification

Martin: There is only 1 partition.

Andy: The problem I'm trying to solve is that the results of "select * from _2009all order by userid,time" are something that I'm going to be inspecting a lot over the next few months. Thus, rather than running this query every time I want to inspect the results, I want to put the results in a table for faster access.

I had initially assumed that indexing over time would be a waste of space, since timestamps are mostly unique. But I just added an index to the time field in _2009all, and the index takes up a reasonable amount of space. But now to do the sort over the two indexes (what I assume you mean by "indexed walk"):

mysql> select * from _2009all order by userId,time limit 2;
...
2 rows in set (25 min 36.48 sec)

Yes, I'd say 25 minutes is too long. But if you meant something else by your suggestion, I'm open to alternatives.

bot43:

After adding the index to the time field:

mysql> explain select * from _2009all order by userid,time;
+----+-------------+----------+------+---------------+------+---------+------+------------+----------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows       | Extra          |
+----+-------------+----------+------+---------------+------+---------+------+------------+----------------+
|  1 | SIMPLE      | _2009all | ALL  | NULL          | NULL | NULL    | NULL | 1384378798 | Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+------------+----------------+
1 row in set (0.05 sec)

mysql> explain select userId,type,kind,description,bundleVersion,bundleId,time from _2009all order by userid,time
+----+-------------+----------+------+---------------+------+---------+------+------------+----------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows       | Extra          |
+----+-------------+----------+------+---------------+------+---------+------+------------+----------------+
|  1 | SIMPLE      | _2009all | ALL  | NULL          | NULL | NULL    | NULL | 1384378798 | Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+------------+----------------+
1 row in set (0.00 sec)

Good point about the DBMS storing the table any damn way it feels like it. I guess I need another solution.

A: 

I don't think you could use some trick to make this operation take less space. If that is what you are asking. The short and simple answer is: Get more space by adding another hard drive. With that kind of DB you are goind to need it anyway.

You could also try splitting the table in two or more parts and sorting and viewing each table at a time. If that is working for you. GL!

avok00
The solution is not to get more disk, but to properly use existing disk... indexing is the solution.
r00fus
I tried to answer the original question. The sorted table cannot be created to take less space. What he needs sorted table for is another problem. So, don't tell me what is the solution. This is not a solution, but an alternative appoach. For all we know, he might not need all the data too :)
avok00
+1  A: 

What indexes do you have on the table?

Can't you put a composite index on the two columms?

ALTER TABLE `_2009all` ADD INDEX ( `userId` , `time` ) ;
r00fus
That'll do. Thanks to bot403 and Andy Lester for the assist.
emerson