I'm working on a complex script which could be processing up to 500,000 records. Here's my question.
Basically my code will parse a text file to get each of those 500,000 or so records. Each record has a category, and my code needs to check whether a row has already been created in the categories
table for that category during that particular processing run; if not, it creates it.
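For context, the per-record flow looks roughly like this (just a sketch; the file name, the tab-delimited format and the position of the category column are assumptions, not my actual layout):

$handle = fopen('records.txt', 'r');             // hypothetical file name
while (($line = fgets($handle)) !== false) {
    $fields   = explode("\t", trim($line));      // assuming tab-delimited records
    $category = $fields[0];                      // assuming the category is the first field
    // ...look up or create the category here (the two options below)...
    // ...then process the rest of the record...
}
fclose($handle);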
So I have 2 options:
1) I store an array of key => value pairs mapping category name to ID, so I could do this:
if (array_key_exists($category, $allCategories)) {
    // Category already seen in this run; reuse the cached ID
    $id = $allCategories[$category];
} else {
    // First time this category appears: insert it and cache the new ID
    mysql_query("INSERT INTO categories (procId, category)
                 VALUES ('$procId', '$category')");
    $id = mysql_insert_id();
    $allCategories[$category] = $id;
}
2) Each time the text file is processed, it gets its own process ID. So rather than checking the $allCategories
variable, which could grow to 100,000+ entries, I could run this for each record:
SELECT id FROM categories WHERE procId='$procId' AND category='$category'
The downside here is that this query would be run for each of the 500,000+ records, whereas the disadvantage of holding all the categories in an array is that I could run out of memory or the server could crash.
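For reference, here's a rough sketch of what option 2 would look like per record, using the same mysql_* calls as above; the categoryId() wrapper is mine, and it assumes $procId and $category are already escaped:

// Option 2 sketch: hit the database once per record to resolve the category.
function categoryId($procId, $category)
{
    $result = mysql_query("SELECT id FROM categories
                           WHERE procId='$procId' AND category='$category'");
    if ($result && mysql_num_rows($result) > 0) {
        $row = mysql_fetch_assoc($result);
        return $row['id'];
    }
    // Not created yet during this run: insert it and return the new ID
    mysql_query("INSERT INTO categories (procId, category)
                 VALUES ('$procId', '$category')");
    return mysql_insert_id();
}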
Any thoughts?