views:

187

answers:

3

I'm working on a complex script that could be processing up to 500,000 records. Here's my question.

Basically, my code will parse a text file to get each of those 500,000 or so records. Each record has a category, and my code needs to check whether a row has already been created in the categories table for that category during that particular processing run; if not, it creates one.

So I have 2 options:

1) I store an array of key => value pairs mapping category name to ID, so I could do this:

if (array_key_exists($category, $allCategories)) {
    $id = $allCategories[$category];
} else {
    // Category not seen yet in this run: insert it and cache the new ID
    mysql_query("INSERT INTO categories (procId, category)
                        VALUES ('$procId', '$category')");
    $id = mysql_insert_id();
    $allCategories[$category] = $id;
}

2) Each time this text file is processed, it gets its own process ID. So rather than checking the $allCategories variable, which could grow to 100,000+ entries, I could do this:

SELECT id FROM categories WHERE procId='$procId' AND category='$category'
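
In PHP, that per-record check would look something like this (a sketch; the insert-on-miss branch mirrors option 1):

$result = mysql_query("SELECT id FROM categories
                       WHERE procId='$procId' AND category='$category'");
if ($row = mysql_fetch_assoc($result)) {
    // Category already recorded for this processing run
    $id = $row['id'];
} else {
    // Not found: create it, same as in option 1
    mysql_query("INSERT INTO categories (procId, category)
                        VALUES ('$procId', '$category')");
    $id = mysql_insert_id();
}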

The downside here is that this query would be run for each of the 500,000+ records. The disadvantage of holding all the categories in an array, on the other hand, is that I could run out of memory or the server could crash.

Any thoughts?

+2  A: 

Can you just keep a list of the IDs that you've already inserted? If they are integer IDs, at 4 bytes each, 100,000 entries would only use about 400K of memory.

ETA:

To avoid storing the category name, hash the name and store the hash. With a 128-bit MD5 hash, that's 16 bytes per hash, or only about 1.6 MB of memory + overhead.
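
For example (a sketch reusing the question's variables; md5()'s second argument requests the raw 16-byte digest instead of the 32-character hex string):

// Key the lookup array by the 16-byte raw MD5 of the category name
// instead of the full name
$key = md5($category, true);
if (isset($allCategories[$key])) {
    $id = $allCategories[$key];
} else {
    mysql_query("INSERT INTO categories (procId, category)
                        VALUES ('$procId', '$category')");
    $id = mysql_insert_id();
    $allCategories[$key] = $id;
}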

Eric Petroelje
But I'd also need to store the category names, which are strings such as "Category A", "Category B", "Long name category C", etc. So it could add up. But if it's a better option than doing a query each time, I might go with it
Click Upvote
PHP takes about 68 bytes to store an integer. It isn't C. It stores everything in a C struct called a zval, and... well, you can't take sizeof(int) from C and assume it translates to PHP.
James Socol
@James - yeesh, I knew PHP would have extra overhead, but I had no idea it was that bad :)
Eric Petroelje
Eric, which hash function do you recommend? MD5?
Click Upvote
+1  A: 

One idea would be to add a unique constraint to the table so duplicate inserts are rejected by the database. Then just keep inserting all records and let the DB do the checking.
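
A minimal sketch of the idea, assuming the two-column table from the question (the key name uniq_proc_category is made up; INSERT IGNORE is standard MySQL):

// One-time setup: reject duplicate (procId, category) pairs at the DB level
mysql_query("ALTER TABLE categories
             ADD UNIQUE KEY uniq_proc_category (procId, category)");

// Then just insert every record; duplicates are silently skipped
mysql_query("INSERT IGNORE INTO categories (procId, category)
                    VALUES ('$procId', '$category')");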

Tahir Akhtar
Good idea, but each of the records needs to be linked to the appropriate category ID in the database. Also, there might be more than one record with the same category name but a different userId or procId (process ID)
Click Upvote
You can define a composite unique key on (category, userId, procId) so the database will only reject an insert when the exact same combination of those column values is already present in the table.
Tahir Akhtar
Does that work in mysql?
Click Upvote
CREATE TABLE `categories` (
  `category_id` int(11) NOT NULL default '0',
  `user_id` int(11) NOT NULL default '0',
  `proc_id` int(11) NOT NULL default '0',
  UNIQUE KEY `UniqueCategoryUserProc` (`category_id`, `user_id`, `proc_id`)
) ENGINE=InnoDB
Tahir Akhtar
+1  A: 

Given that your average category name is 30 bytes, you'd only need 30 * 500,000 bytes = 15,000,000 bytes = 15,000 kilobytes = 15 megabytes.

I think you have this much memory.
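
If you'd rather measure than estimate (PHP's per-value overhead makes the raw-string math a lower bound), a quick sketch with made-up category names:

// Rough measurement of a 100,000-entry name => ID array in PHP
$before = memory_get_usage();
$categories = array();
for ($i = 0; $i < 100000; $i++) {
    $categories["Sample category name $i"] = $i;  // ~25-byte keys
}
printf("Array used %.1f MB\n", (memory_get_usage() - $before) / 1048576);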

Georg