My sitemap is massive, so I had to break it up behind a sitemap index, which I have done. The sitemaps listed are sitemap-1.xml, sitemap-2.xml, etc.

I want a single sitemap file that serves content based on the sitemap number requested, i.e. sitemap-1.xml goes to that file and lists links 1 - 30,000, sitemap-2.xml shows links 30,001 - 60,000, etc.

How do I check which sitemap number was requested, and how do I name the file?

A: 

Why not keep the names you already use, "sitemap-1" and "sitemap-2", and do something like this in a sitemap.php file:

$number = $_GET['number'];
$entries_per_page = 30000;
for($i = ($number - 1) * $entries_per_page; $i < $number * $entries_per_page; $i++){
  // print out the values
}
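
To make the loop bounds explicit, here is a minimal sketch of the same arithmetic as a function. The name sitemap_range is made up for illustration, and it assumes entries are counted from zero, as in the loop above:

```php
<?php
// Hypothetical helper (not part of any library): returns the zero-based
// [first, last] entry indexes covered by the 1-based sitemap $number.
function sitemap_range(int $number, int $entries_per_page = 30000): array
{
    if ($number < 1 || $entries_per_page < 1) {
        throw new InvalidArgumentException('number and entries_per_page must be >= 1');
    }
    $first = ($number - 1) * $entries_per_page;   // same lower bound as the for loop
    $last  = $number * $entries_per_page - 1;     // last index the loop visits
    return [$first, $last];
}

[$first, $last] = sitemap_range(2);
echo "sitemap-2.xml covers entries $first to $last\n";
```

Rejecting numbers below 1 up front avoids negative offsets if someone requests sitemap-0.xml.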

In addition, add a rule like this to your .htaccess (it assumes mod_rewrite is enabled):

RewriteRule ^sitemap-([0-9]+)\.xml$ sitemap.php?number=$1 [L]

You can also use the boundaries in your SQL query:

$query = "SELECT * FROM table LIMIT ".(($number - 1) * $entries_per_page).", ".$entries_per_page;
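
Since $number comes straight from the URL, it is worth validating it before it reaches the query. A rough sketch, assuming the parameter must be a positive integer; sitemap_query is a hypothetical name and "table" is just the placeholder from the query above:

```php
<?php
// Hypothetical helper: validates the raw "number" parameter and builds the
// LIMIT query. Returns null when the input is not a positive integer.
function sitemap_query($raw_number, int $entries_per_page = 30000): ?string
{
    $number = filter_var(
        $raw_number,
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1]]
    );
    if ($number === false) {
        return null; // the caller would typically answer with a 404 here
    }
    $offset = ($number - 1) * $entries_per_page;
    // %d forces both values to be rendered as integers.
    // "table" is a reserved word in MySQL, hence the backticks.
    return sprintf('SELECT * FROM `table` LIMIT %d, %d', $offset, $entries_per_page);
}
```

In sitemap.php this could be called as sitemap_query($_GET['number'] ?? null), so anything other than a plain positive integer never gets into the SQL string.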

I use something similar, along with a sitemap_index.php file that looks like this:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<?php for ($i = 1; $i <= 6; $i++) : ?>
 <sitemap>
  <loc><?php echo ROOT_PATH; ?>sitemap_<?php echo $i; ?>.xml</loc>
  <lastmod><?php echo date('Y-m-d'); ?></lastmod>
 </sitemap>
<?php endfor; ?>
</sitemapindex>

Since some search engines seem to prefer sitemaps with at most 1000 entries, you might split your sitemap into even smaller parts. Then use ceil($total_entries / $entries_per_page) as the upper bound of the for loop (replacing the "6").
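
A minimal sketch of the index with the ceil() bound in place of the hard-coded 6. The function name build_sitemap_index and the $base_url parameter (standing in for ROOT_PATH) are assumptions for illustration, and lastmod is left out for brevity:

```php
<?php
// Hypothetical helper: builds a sitemap index covering $total_entries URLs.
// $base_url plays the role of the ROOT_PATH constant in the template above.
function build_sitemap_index(int $total_entries, int $entries_per_page, string $base_url): string
{
    // Number of sitemap files needed; the last one may be only partly filled.
    $pages = (int) ceil($total_entries / $entries_per_page);

    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    for ($i = 1; $i <= $pages; $i++) {
        // Escape the URL so characters such as & stay valid inside XML.
        $loc  = htmlspecialchars($base_url . 'sitemap_' . $i . '.xml', ENT_XML1 | ENT_QUOTES);
        $xml .= " <sitemap>\n  <loc>$loc</loc>\n </sitemap>\n";
    }
    return $xml . "</sitemapindex>\n";
}

// 61,000 entries at 30,000 per file need 3 sitemap files.
echo build_sitemap_index(61000, 30000, 'http://example.com/');
```

$total_entries would normally come from a SELECT COUNT(*) on the same table the sitemaps are built from.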

Kau-Boy
I have millions of pages, so I will have to create hundreds of sitemap-1, sitemap-2, sitemap-3 files.
Dasa
For this code: $query = "SELECT * FROM table LIMIT ".(($number - 1) * $entries_per_page).", ".$entries_per_page; Can you say "SQL injection"? Try this: http://you-are-dead.com/this-script.php?number=10;DROP TABLE table; The database gets something along the lines of: SELECT * FROM table LIMIT 10; DROP TABLE table;
Robin
I am just saying that some search engines prefer sitemaps with 1000 entries. Google can handle sitemaps with up to 50,000 URLs and 10 MB in size, so you won't need too many of them. But even with hundreds of sitemaps you only need one sitemap index file.
Kau-Boy
@Robin: I just pointed out how you can use it. Adding real_escape_string() or similar functions always makes queries harder to read. But on every SO question someone turns up with "but there is an SQL injection", and it starts to annoy everybody.
Kau-Boy
A: 

Sounds like you need a primitive linking index document? Something like this:

<site-idx>
  <sub href="sitemap-1.xml"/>
  <sub href="sitemap-2.xml"/>
  <sub href="sitemap-3.xml"/>
</site-idx>

You can then write a program that takes a number as input, makes assumptions about how many entries there are per file, and then loads the correct file. Or maybe you could specify the number of entries in each sub-file in the index file:

<site-idx>
  <sub href="sitemap-1.xml" num="1000"/>
  <sub href="sitemap-2.xml" num="1010"/>
  <sub href="sitemap-3.xml" num="1101"/>
</site-idx>
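
With per-file counts like these, finding the file that holds a given entry is a running-total lookup. A rough sketch, assuming entries are numbered from 1 across all sub-files; find_sub_file is a made-up name and the counts are simply copied from the example above:

```php
<?php
// Hypothetical helper: $counts maps each href to its "num" attribute, in
// document order. Returns the href holding the 1-based entry $entry, or
// null when the entry number falls outside all files.
function find_sub_file(array $counts, int $entry): ?string
{
    if ($entry < 1) {
        return null;
    }
    $seen = 0; // total entries in the files visited so far
    foreach ($counts as $href => $num) {
        $seen += $num;
        if ($entry <= $seen) {
            return $href;
        }
    }
    return null;
}

$counts = [
    'sitemap-1.xml' => 1000,
    'sitemap-2.xml' => 1010,
    'sitemap-3.xml' => 1101,
];
var_dump(find_sub_file($counts, 1001)); // first entry of the second file
```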

You might be able to leverage XInclude to load the sub-documents. For large XML files I'd suggest using XMLReader in PHP; it will consume far less memory.
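
As a minimal sketch of the XMLReader approach, the following streams the index one node at a time and collects the href and num attributes. It assumes the xmlreader extension is available; read_sub_files is a hypothetical name, and the inline string stands in for a real file (for which you would call $reader->open($path) instead):

```php
<?php
// Inline copy of the example index; a real script would read it from disk.
$xml = <<<XML
<site-idx>
  <sub href="sitemap-1.xml" num="1000"/>
  <sub href="sitemap-2.xml" num="1010"/>
  <sub href="sitemap-3.xml" num="1101"/>
</site-idx>
XML;

// Hypothetical helper: returns an array of href => entry count.
function read_sub_files(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $subs = [];
    // read() advances one node at a time, so the whole tree is never held in memory.
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'sub') {
            $subs[$reader->getAttribute('href')] = (int) $reader->getAttribute('num');
        }
    }
    $reader->close();
    return $subs;
}

print_r(read_sub_files($xml));
```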

Robin
Haven't you ever heard of the sitemap standard? Why invent a new XML structure when there is an existing one for sitemaps and sitemap indexes that every search engine can easily parse?
Kau-Boy