I have written the auto-generating sitemap code below; it is called every 12 hours. It goes through the MySQL database and writes an entry into the XML file for every row it finds in the table.

I am not the best PHP coder, so I hope you guys can help me improve this below.

It works and does the job, but with over 400,000 ads it can get pretty slow.

 function xml_sitemap() {
    $website_url="/SV/Sitemap/";
    $ads_query_results = mysql_query("SELECT * FROM cars_db ORDER BY insert_date DESC") or die(mysql_error());
    $nr_of_sitemaps = count(glob('../../Sitemap/*')); 
    //$nr_files=1;
    $row_data_seek=1;

    for ($i=1; $i<$nr_of_sitemaps; $i++){
     if ($nr_of_sitemaps!='.' || $nr_of_sitemaps!='..'){
      $nr_files++;
     }
    }
    for ($i=1; $i<=$nr_of_sitemaps; $i++){
     unlink('../../Sitemap/Sitemap'.$i.'.xml');
    }
    $number_of_ads = mysql_num_rows($ads_query_results);
    $files_needed = ($number_of_ads/49500);
    $files_needed = ceil($files_needed);

for ($i=1; $i<=$files_needed; $i++){
    $xml_file = fopen ('../../Sitemap/Sitemap'.$i.'.xml','w');
    fwrite ($xml_file,"<?xml version='1.0' encoding='UTF-8'?>\n");
    fwrite ($xml_file,"<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>\n");

     for ($z=0; $z<($number_of_ads/$files_needed); $z++){
     mysql_data_seek($ads_query_results, $z);
     $row_results=mysql_fetch_array($ads_query_results);

     $current_ad_id= $website_url.$row_results['ad_category'].'/';
     $current_ad_id = $current_ad_id.$row_results['ad_id'].'.htm';
     $last_modified = date('Y-m-d', strtotime($row_results['insert_date']));
     $change_frequency = 'weekly';

      fwrite ($xml_file, "<url>\n");
       fwrite ($xml_file, "<loc>");
       fwrite ($xml_file, $current_ad_id);
       fwrite ($xml_file, "</loc>\n");
       fwrite ($xml_file, "<lastmod>");
       fwrite ($xml_file, $last_modified);
       fwrite ($xml_file, "</lastmod>\n");
       fwrite ($xml_file, "<changefreq>");
       fwrite ($xml_file, $change_frequency);
       fwrite ($xml_file, "</changefreq>\n");
       fwrite ($xml_file, "<priority>");
       fwrite ($xml_file, 0.8);
       fwrite ($xml_file, "</priority>\n");
      fwrite ($xml_file, "</url>\n");
     } //end for

    fwrite ($xml_file, "</urlset>"); 
    fclose ($xml_file);
}//end main for
}
+2  A: 

From what I can tell, your code is fairly optimized. What kind of build times are you seeing? Minutes? Hours?

There are a few changes you can make outright:

Change this code:

for ($z=0; $z<($number_of_ads/$files_needed); $z++){

to

$count = ($number_of_ads/$files_needed);
for ($z=0; $z<$count; $z++){

Every iteration of your loop will run the termination expression. If you have a lot of records (like 400,000), you'll run into significant slowdowns.

Combine your fwrite statements:

fwrite ($xml_file, "<url>\n");
    fwrite ($xml_file, "<loc>");
    fwrite ($xml_file, $current_ad_id);
    fwrite ($xml_file, "</loc>\n");
    fwrite ($xml_file, "<lastmod>");
    fwrite ($xml_file, $last_modified);
    fwrite ($xml_file, "</lastmod>\n");
    fwrite ($xml_file, "<changefreq>");
    fwrite ($xml_file, $change_frequency);
    fwrite ($xml_file, "</changefreq>\n");
    fwrite ($xml_file, "<priority>");
    fwrite ($xml_file, 0.8);
    fwrite ($xml_file, "</priority>\n");
fwrite ($xml_file, "</url>\n");

could become:

fwrite($xml_file, "<url><loc>$current_ad_id</loc><lastmod>$last_modified</lastmod><changefreq>$change_frequency</changefreq><priority>0.8</priority></url>");

Sure, it's messier, but one fwrite is better than 10+.

Also, try to avoid using strtotime. It's a slow, dirty function. If you can, try to use some simple logic to get around using it.
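
For example, assuming insert_date is a MySQL DATE or DATETIME column (so it already comes back as a "YYYY-MM-DD ..." string), the date part can be sliced off directly with no parsing at all (a minimal sketch, untested against your schema):

$last_modified = substr($row_results['insert_date'], 0, 10); // already in Y-m-d form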

Hope this helps!

mattbasta
Good tips. It may also help to store everything in a variable and write the variable to the file once, or at least every x iterations through the loop (experiment and see what number gets the best performance without taking too much memory). Having 400,000 writes to disk will definitely slow it down.
bradym
I was thinking about that as well, bradym, but I figured if it's running as a CRON operation and there are 400,000 records, the server can spare 400,000 disk operations (really 800,000 if you count MySQL hits). PHP has a nasty habit of tossing out and regenerating strings when you concatenate, so running very large concatenations could blast the CPU.
mattbasta
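
A minimal sketch of the buffering approach discussed in the comments above, reusing the hoisted $count from earlier in the answer and reading rows sequentially; the 500-row flush interval and the $buffer variable are illustrative choices, not part of the original code:

$buffer = '';
for ($z = 0; $z < $count; $z++) {
    $row_results = mysql_fetch_array($ads_query_results);
    $current_ad_id = $website_url . $row_results['ad_category'] . '/' . $row_results['ad_id'] . '.htm';
    $last_modified = substr($row_results['insert_date'], 0, 10);

    // Build each <url> entry in memory instead of writing it straight to disk
    $buffer .= "<url><loc>$current_ad_id</loc><lastmod>$last_modified</lastmod>"
             . "<changefreq>weekly</changefreq><priority>0.8</priority></url>\n";

    // Flush every 500 rows so the buffer never grows too large
    if (($z + 1) % 500 == 0) {
        fwrite($xml_file, $buffer);
        $buffer = '';
    }
}
fwrite($xml_file, $buffer); // write out whatever is left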