I am writing a set of functions to generate a sitemap for a website. Let's assume that the website is a blog.
By definition, a sitemap lists the pages that are available on a website. For a dynamic website, that set of pages changes quite regularly.
Using the example of a blog, the 'pages' will presumably be the blog posts. Since there is a finite limit on the number of links in a sitemap (50,000 URLs per file; ignore sitemap indexes for now), I can't just keep adding the latest blog posts, because at some point in the future the limit will be exceeded.
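For reference, here is what I understand a minimal sitemap file to look like under the sitemaps.org protocol (the URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page; <loc> is required, <lastmod> is optional -->
  <url>
    <loc>https://example.com/posts/hello-world</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>
```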
I have made two (quite fundamental) assumptions in the above paragraph. They are:
Assumption 1:
A sitemap contains a list of pages on a website. For a dynamic website like a blog, the pages will be the blog posts. Therefore, I can create a sitemap that simply lists the blog posts on the website. (This sounds like a feed to me.)
Assumption 2:
Since there is a hard limit on the number of links in the sitemap file, I can impose some arbitrary limit N and simply regenerate the file periodically to list the latest N blog posts (at this stage, this is indistinguishable from a feed). A sketch of what I mean follows below.
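To make Assumption 2 concrete, this is roughly the kind of generator I have in mind. It's only a sketch: the `Post` record, the example post, and the cap `N = 500` are placeholders I've made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from xml.sax.saxutils import escape

# Hypothetical post record; a real blog would load these from its database.
@dataclass
class Post:
    url: str
    updated: date

N = 500  # arbitrary cap, well under the protocol's 50,000-URL limit

def generate_sitemap(posts: list[Post]) -> str:
    """Render the latest N posts as a sitemaps.org <urlset> document."""
    latest = sorted(posts, key=lambda p: p.updated, reverse=True)[:N]
    entries = "\n".join(
        f"  <url>\n"
        f"    <loc>{escape(p.url)}</loc>\n"
        f"    <lastmod>{p.updated.isoformat()}</lastmod>\n"
        f"  </url>"
        for p in latest
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>"
    )

# Regenerate the file periodically (e.g. from a cron job).
posts = [Post("https://example.com/posts/hello-world", date(2024, 1, 15))]
with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write(generate_sitemap(posts))
```

Note that the job that regenerates this file is identical in shape to the job that would regenerate an RSS/Atom feed, which is what prompts the questions below.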
My questions then are:
- Are the assumptions (i.e. my understanding of what goes inside a sitemap file) valid/correct?
- What I described above sounds very much like a feed. Can bots not simply use a feed to index a website (i.e. is a sitemap even necessary)?
- If I am already generating a feed file that has the latest changes in it, I don't see the point of also producing a sitemap protocol file - can someone explain what it adds?