The automatic sitemap for my Django site fluctuates between including the www in URLs and leaving it out (I'm aiming to have it in all the time). As a result, Google isn't indexing my pages properly, so I'm trying to narrow down what could be causing this.

I have set PREPEND_WWW = True, and my site record in the sites framework is set to include the www, i.e. it's set to www.example.com as opposed to example.com. I'm using memcached, but pages should expire from the cache after 48 hours, so I wouldn't have thought that would be causing the issue.

You can see the problem in effect at http://www.livingspaceltd.co.uk/sitemap.xml (refresh the page a few times).

My sitemaps setup is fairly prosaic, so I doubt that's the issue, but in case I'm missing something obvious, here's the code:

***urls.py***

```python
sitemaps = {
    'subpages': Subpages_Sitemap,
    'standalone_pages': Standalone_Sitemap,
    'categories': Categories_Sitemap,
}

urlpatterns = patterns('',
    (r'^sitemap\.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps}),
    ...
```

***sitemaps.py***

```python
# -*- coding: utf-8 -*-
from django_ls.livingspace.models import Page, Category, Standalone_Page, Subpage
from django.contrib.sitemaps import Sitemap

class Subpages_Sitemap(Sitemap):
    changefreq = "monthly"
    priority = 0.4
    def items(self):
        return Subpage.objects.filter(restricted_to__isnull=True)

class Standalone_Sitemap(Sitemap):
    changefreq = "weekly"
    priority = 1
    def items(self):
        return Standalone_Page.objects.all()

class Categories_Sitemap(Sitemap):
    changefreq = "weekly"
    priority = 0.7
    def items(self):
        return Category.objects.all()
```

A: 

This might be one of the weirdest problems I've seen. The thing is, the way Django constructs URLs in a sitemap is extremely straightforward: it gets the current Site object from the database and puts the value of its "domain" field in front of the page's relative location:

```python
current_site = Site.objects.get_current()
...
loc = "http://%s%s" % (current_site.domain, self.__get('location', item))
```

(source)
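
To make that concrete, here is a stripped-down stand-in for the formatting step. It is only a sketch: `build_loc` is a made-up name, not part of Django, and the domains are placeholders. It shows that whether the www appears depends entirely on the domain string that gets passed in.

```python
def build_loc(domain, path):
    """Mimic the string formatting quoted above: scheme + domain + relative path."""
    return "http://%s%s" % (domain, path)


# The same path yields a different URL depending only on the stored domain.
print(build_loc("www.example.com", "/about/"))  # http://www.example.com/about/
print(build_loc("example.com", "/about/"))      # http://example.com/about/
```

So if the output fluctuates, the domain value being read must be fluctuating too, or a stale copy of the rendered sitemap is being served.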

Are you sure you are not doing anything weird at the database level? If you had multiple mirrored databases that weren't consistent, it could produce a similar effect. Try setting up a test view that just displays Site.objects.get_current(). It will probably fluctuate as well.

If you use any third-party caching app (like Johnny Cache) try turning it off.

Also, make sure you don't have two Site objects, one with and one without the www (that shouldn't give you a similar effect on its own, but with multiple server instances configured for different SITE_IDs... maybe?)
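
If refreshing by hand gets tedious, a small script can tally which hostnames appear in a copy of the sitemap, so you can compare runs while you toggle caching or check the Site record. This is only a sketch in Python 3 using the standard library; `count_hosts` and the sample XML are made up for illustration, and in practice you would pass in the body of each real response.

```python
from collections import Counter
from urllib.parse import urlparse
from xml.etree import ElementTree

# Namespace used by the sitemaps.org 0.9 schema.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_hosts(sitemap_xml):
    """Return a Counter mapping each hostname found in <loc> elements to its count."""
    root = ElementTree.fromstring(sitemap_xml)
    hosts = Counter()
    for loc in root.iter(SITEMAP_NS + "loc"):
        if loc.text:  # skip empty <loc> elements
            hosts[urlparse(loc.text.strip()).netloc] += 1
    return hosts


# Hypothetical sample standing in for one fetched sitemap.xml.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/about/</loc></url>
  <url><loc>http://example.com/contact/</loc></url>
  <url><loc>http://www.example.com/</loc></url>
</urlset>"""

print(count_hosts(SAMPLE))
```

A response that mixes hosts points at the data used to build it; responses that are each consistent but differ from one another look more like different cached copies or different server instances.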

Ludwik Trammer
Thanks for your response Ludwik... I'm not doing anything fancy with multiple dbs, and I only have one site object. I've tried `Site.objects.get_current()` in the shell and it consistently returns the domain with the www, as it should. I'll try setting up a test view as you suggest to see if that differs, and also turning off caching completely, when I'm back in the office after the weekend.
Jen Z
OK, I've tried turning off caching completely, to no avail. I've also set up a test page which just outputs `Site.objects.get_current()` at http://www.livingspaceltd.co.uk/url-test/ - the www is constantly present, as it should be. I'm editing my original question to include the sitemaps code in case it is something in there.
Jen Z
Thanks for your help Ludwik, it helped me narrow down the problem.
Jen Z
A: 

Well, it looks like it was a caching error after all. I'm not quite sure what was wrong: I had made the changes over a week ago, so the cache definitely wasn't behaving right, and I had to try a couple of different methods to restart it. That bears some deeper investigation, but it's working now.

Jen Z
A: 

PREPEND_WWW = True in settings.py must appear above your caching settings. This fixed my problem, which was the same as yours. I ran into it when I submitted my sitemap to Google Webmaster Tools.
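
A sketch of that ordering, assuming the old-style `CACHE_BACKEND` setting and a local memcached address (both placeholders, not taken from a real project); the 48-hour timeout mirrors the figure in the question. Why the order would matter isn't established here, so treat this as a layout to try, not a confirmed fix.

```python
# settings.py (fragment); all values are illustrative placeholders

PREPEND_WWW = True  # placed before the cache settings, as described above

# Old-style cache configuration; the memcached address is an assumption.
CACHE_BACKEND = 'memcached://127.0.0.1:11211/'
CACHE_MIDDLEWARE_SECONDS = 60 * 60 * 48  # 48 hours, the expiry mentioned in the question
```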

ronbeltran
Hmm, interesting. To be honest the problems I was having resolved themselves when I manually restarted the caching. I would have been interested to try this at the same time I was having issues to see if it's a better fix. Regardless, I've made the changes you suggested to prevent any further issues. Thanks!
Jen Z