views: 37
answers: 3

Hi guys,

I have about 50,000 records to import into a Magento store. The file is about 50 MB. What I have already tested:

  • Split files
  • API
  • Magento Classes

Splitting the file doesn't improve the import speed. The API is very slow, and the Magento classes are slow as well.

This is a snippet of code using the Magento classes:

// Build the product
$product->setIsMassupdate(true)
        ->setExcludeUrlRewrite(true)
        ->setManufacturer($this->addManufacturers(utf8_encode($record[4])))
        ->setSku($record[3])
        ->setAttributeSetId($this->attribute_set)  // 9 is the default set
        ->setTypeId(Mage_Catalog_Model_Product_Type::TYPE_SIMPLE)
        ->setName(utf8_encode($record[5]))
        ->setCategoryIds($this->getCategories(array($record[0], $record[1], $record[2])))  // some category IDs
        ->setWebsiteIds(array(1))  // website ID, 1 is default (note: setWebsiteIds, not setWebsiteIDs — the magic setter must map to website_ids)
        ->setDescription(utf8_encode($record[6]))
        ->setShortDescription($this->shortText(utf8_encode($record[6]), 150))
        ->setPrice($price)  // set some price
        ->setSpecialPrice($special_price)
        ->setWeight($record[12])
        ->setStatus(Mage_Catalog_Model_Product_Status::STATUS_ENABLED)
        ->setVisibility(Mage_Catalog_Model_Product_Visibility::VISIBILITY_BOTH)
        ->setTaxClassId(2)  // default tax class
        ->setPixmaniaimg($record[10])
        ->setStockData(array('is_in_stock' => $inStock, 'qty' => $qty))
        ->setCreatedAt(strtotime('now'));

$product->save();
$ID = is_numeric($productID) ? $productID : $product->getId();

So the above method works, but it takes about 5 hours to insert only 2,300 records!

What are the simple SQL inserts that I have to execute in the Magento database in order to add a new product?

+1  A: 

It's very hard to create products using raw SQL queries, because Magento uses the EAV pattern for storing products.
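For illustration, a heavily simplified sketch of what those EAV inserts involve (table names from the Magento 1 schema; the entity type and attribute IDs below are placeholders that must be resolved from eav_entity_type and eav_attribute on your installation, and direct inserts bypass Magento's validation and indexing):

// Sketch only — not a complete or safe import routine.
$write = Mage::getSingleton('core/resource')->getConnection('core_write');

// 1. The base entity row
$write->insert('catalog_product_entity', array(
    'entity_type_id'   => 4,          // 'catalog_product' — verify in eav_entity_type
    'attribute_set_id' => 9,          // your attribute set
    'type_id'          => 'simple',
    'sku'              => $record[3],
    'created_at'       => now(),
));
$entityId = $write->lastInsertId();

// 2. One row per attribute, in the table matching the attribute's backend type
$write->insert('catalog_product_entity_varchar', array(
    'entity_type_id' => 4,
    'attribute_id'   => $nameAttributeId, // look up 'name' in eav_attribute
    'store_id'       => 0,
    'entity_id'      => $entityId,
    'value'          => $record[5],
));
// ...repeat for _text (description), _decimal (price), _int (status, visibility)...

// 3. Website, category and stock assignments
$write->insert('catalog_product_website', array(
    'product_id' => $entityId, 'website_id' => 1,
));
$write->insert('cataloginventory_stock_item', array(
    'product_id' => $entityId, 'stock_id' => 1,
    'qty' => $qty, 'is_in_stock' => 1,
));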

WebFlakeStudio
Yes it is but I'm a developer ;)
Michelangelo
So, use a debugger and investigate the SQL queries for adding products :)
WebFlakeStudio
Ok, done! I have inserted 50,000 records in 4 hours! Now the problem is re-indexing the Magento Catalog URL Rewrites!
Michelangelo
My congratulations!
WebFlakeStudio
+1  A: 

I strongly recommend that you avoid writing raw SQL at all costs; you will almost certainly spend days and days mapping the attribute IDs and probably get it wrong. It will also bypass all the important indexing and other system updates that Magento relies on.

If speed is your issue, I suggest that you consider uRapidFlow from Unirgy. Usual disclaimers apply: I have no affiliation with Unirgy, but my observation has been that the quality of their work is excellent.

HTH, JD

Jonathan Day
I agree with you in spirit, but there is a scale (many hundreds of thousands of records) where I have not found a single in-framework solution for importing records. A small wrapper to load the IDs at the beginning of runtime can mitigate the risks you brought up and make manual import a viable solution.
Joseph Mastey
I suspect those all-important indexes are the biggest slowdown at these sorts of scales. Manually writing SQL might be beneficial in this case; the indexes can be rebuilt afterwards.
clockworkgeek
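On the rebuild-afterwards point: Magento 1.4+ exposes its indexers through the index/indexer singleton, so a sketch of deferring them (assuming a 1.4+ install) looks roughly like this:

// Sketch: put every indexer into manual mode before the import,
// then rebuild once at the end instead of on every save().
$processes = Mage::getSingleton('index/indexer')->getProcessesCollection();

foreach ($processes as $process) {
    $process->setMode(Mage_Index_Model_Process::MODE_MANUAL)->save();
}

// ... run the bulk import here ...

foreach ($processes as $process) {
    $process->reindexAll(); // one full rebuild per indexer, including catalog_url
}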
@Joseph, good idea to load the inserted products and "inform" Mage of their existence. @clockworkgeek - no doubt you're right, it is the indexes, but potentially some foreign key relationships too. Anyone who can solve this problem has the potential to make a lot of money (as I suspect Unirgy has discovered!).
Jonathan Day
Indexes are a problem, but I don't think they are even the greatest problem. While reloading an index 50k times is clearly inefficient, instantiating a new object that many times (and, I suspect, loading some table metadata that many times) is completely crippling. Magento also has some memory leaks that compound the problem and cap each page load to boot, so loading through the framework is a bit of a mess.
Joseph Mastey
@Joseph - agree with your points, refer my comments on profiling below...
Jonathan Day
Hi guys, I have tested the import in the various ways and have chosen to use a direct connection, which is much faster than the others. I will have to update it every time Varien changes something in the core tables, but the most important thing is the SPEED. The software mentioned above uses a direct connection with the DB! ;)
Michelangelo
+1  A: 

Occasionally I've noticed bulk inserts that work by first creating a template model...

$blankProduct = Mage::getModel('catalog/product');

...then avoid the creation of the model for each record...

$newProduct = clone $blankProduct;
$newProduct->setIsMassupdate(true)
    ...
$newProduct->save();

It's slightly more efficient, but probably not enough to bring that massive import down to a reasonable time.

clockworkgeek
It would be interesting to use the profiler on these imports. It's actually quite simple to hook into Magento's profiler; I found it helpful to track down the bottleneck in a nav renderer. My suspicion is that the `save()` is the expensive step, rather than the `getModel()`, but profiling would prove it. It would be nice if you could create all your objects in a collection and then commit the collection in one step, rather than saving each product individually... hmmm.
Jonathan Day
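Magento 1 does ship something close to that "commit in one step" idea: the core/resource_transaction model, which queues objects and saves them inside a single DB transaction. A rough sketch (the attribute calls are stand-ins for the ones in the question):

// Sketch: queue several products and commit them in one transaction
// instead of one commit per save(). Still one INSERT per attribute row,
// but it avoids the per-product transaction overhead.
$blankProduct = Mage::getModel('catalog/product');
$transaction  = Mage::getModel('core/resource_transaction');

foreach ($records as $record) {
    $product = clone $blankProduct;
    $product->setSku($record[3]); // ...other attributes as in the question...
    $transaction->addObject($product);
}

$transaction->save(); // commits all queued objects in a single transaction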
It's very slow!
Michelangelo