From Googling around, it looks like an old Google bug has come back:

http://groups.google.com/group/Google%5FWebmaster%5FHelp-Tools/browse%5Fthread/thread/4e43c2efecb881cf?pli=1

Both my sitemap index file and the sitemap itself validate here: http://www.validome.org/google/validate

but Google Webmaster Tools says:

"Missing XML tag: This required tag is missing. Please add it and resubmit."

  1. Can someone confirm whether this sounds like a recurrence of the Google bug?
  2. Is there a workaround, i.e. some different way of specifying things that coaxes Google into accepting the sitemap?

Thank you.

P.S. I can't give you a link to the actual file because it belongs to a client, but I'm generating the file through XMLWriter to make sure it's valid XML as well.

UPDATE: Actually, I think it might have worked. Their reporting page is a little weird: it shows a page based on the current date, but I just noticed that under the error it says "first detected in September". It also reports the number of URLs found in the sitemap file, so maybe the errors are old.

I'll watch it over the next couple of days and post another update when I have something new.

A: 

Without seeing the XML it's hard to know what the problem is. But from reading the Google threads, it sounds like the issue is not that the sitemap is invalid XML, but rather that Google's parser expects something beyond what's strictly required by the XML schema.

Since Google's automated tests would almost certainly have caught this problem if it affected all sitemaps, I'd guess that there is something about your particular sitemap which is causing the problem.

So I'd consider trying the usual simple approach to detecting content problems: delete half the content and see if the problem still happens. If it does, try deleting the other half. If one of the halves goes through OK, keep subdividing until you've found the culprit. If both fail, keep dividing one half until you either run out of content or the problem goes away, and then identify the culprit pattern.
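
The halving procedure above can be sketched roughly as follows. This is only an illustration in Python, not something tied to your XMLWriter setup: `find_culprit` and the `fails` callback are hypothetical names, and in practice `fails` would probably be a manual step (write out a sitemap containing just that chunk, submit it to Webmaster Tools, and report whether the error appears).

```python
from typing import Callable, List, Sequence


def find_culprit(
    entries: Sequence[str],
    fails: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Narrow a failing list of sitemap entries down by repeated halving.

    `fails(chunk)` is a hypothetical check that returns True when a sitemap
    built from `chunk` alone is rejected. It may well be a manual step.
    """
    current = list(entries)
    while len(current) > 1:
        mid = len(current) // 2
        first, second = current[:mid], current[mid:]
        if fails(first):
            # Problem is in the first half (if both halves fail, we just
            # keep dividing this one, as described above).
            current = first
        elif fails(second):
            current = second
        else:
            # Neither half fails alone, so the problem depends on a
            # combination of entries; stop and inspect what is left.
            break
    return current


if __name__ == "__main__":
    # Made-up data: pretend page7 is the entry the parser dislikes.
    urls = [f"http://www.example.com/page{i}" for i in range(10)]
    bad = "http://www.example.com/page7"
    print(find_culprit(urls, lambda chunk: bad in chunk))
```

Each round costs one or two submissions, so even a sitemap with thousands of URLs should narrow down in a dozen or so rounds, assuming a single entry is to blame.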

Alternatively, you could go in the opposite direction: generate a brand-new sitemap file, fill it with one dummy link, and make sure Google accepts it. Assuming that works, add your content one chunk at a time until it breaks, and then identify the culprit. If it doesn't work, try copying a known-good sitemap file from somewhere (e.g. http://www.google.com/hostednews/sitemap_index.xml), stripping out the content, and inserting your own.
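
As a starting point for the "one dummy link" test, here is a minimal sketch in Python (used only for illustration; the same structure can be produced with XMLWriter). The `build_sitemap` helper and the example.com URL are made up; the namespace is the one defined by the sitemaps.org 0.9 protocol. It needs Python 3.8+ for the `xml_declaration` argument.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap (as UTF-8 bytes) with one <url><loc> per URL."""
    # Emit the namespace as the default xmlns instead of an ns0: prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for address in urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url, f"{{{SITEMAP_NS}}}loc")
        loc.text = address  # special characters are escaped on serialization
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical dummy link; replace with a real URL on your own site.
    print(build_sitemap(["http://www.example.com/"]).decode("utf-8"))
```

If Google accepts a file like this, you can grow the URL list one chunk at a time and watch for the point where the error comes back.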

Justin Grant
The Google link is down. Was the example a made-up location, or is it real?
joedevon
Nope, real link. I pulled it from the sitemap links in http://www.google.com/robots.txt . Interesting that you can't get to Google's sitemap. Are you sitting behind a corporate proxy, or is there some other obvious reason you can't get to it? Try visiting the Google robots.txt and see if you see different links there... might be a geo-location thing? (I'm in Berkeley, CA)
Justin Grant
Hmm, maybe a corporate proxy, though that's a strange page to deny access to... maybe it thinks it's bot behavior?
joedevon
No idea. You might want to try other sitemaps (e.g. files listed in http://www.microsoft.com/robots.txt ) and see if you get the same behavior.
Justin Grant