[Note to the wise: jump to last EDIT]

I have a very simple txt sitemap (named sitemap.txt) that looks like this:

http://myDomain.com
http://myDomain.com/about.html
http://myDomain.com/faq.html
http://myDomain.com/careers.html

When I load it up on webmaster tools I get:

Sitemap is HTML - Your Sitemap appears to be an HTML page. Please use a supported sitemap format instead

I tried a few alternatives (such as with or without www) but no luck.

Anyone any clue?

Any help appreciated!

EDIT:

I tried with an XML sitemap and got the same error, so it looks like the server is serving everything as HTML (as ceejayoz correctly suggests). Now the question is ... how do I get the appspot server to serve text as plain text?

EDIT:

Ok - I got fed up and implemented a servlet to serve my sitemaps (I am now trying with both XML and TXT) explicitly as text/plain. Everything works fine if I invoke the servlet manually, but I am still getting "Sitemap is HTML". I don't know where to bang my head!

EDIT: I tried to verify content-type with a firefox plugin - everything seems to be coming up as expected (I am putting the actual URL so that people can have a look):

http://wokheisandbox.appspot.com/sitemaps/sitemap.txt --> Content-type: text/plain
http://wokheisandbox.appspot.com/sitemaps/sitemap.xml --> Content-type: application/xml

With my servlet (setting text/plain explicitly):

http://wokheisandbox.appspot.com/wokhei/serveSitemap?fileType=TXT --> Content-type: text/plain
http://wokheisandbox.appspot.com/wokhei/serveSitemap?fileType=XML --> Content-type: text/plain

All I get from Webmaster Tools is still --> Sitemap is HTML.
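For anyone who wants to repeat the Content-Type check above without a browser plugin, here is a minimal Python sketch. The helper names (`media_type`, `served_media_type`) are made up for illustration, and it only reports what the server sends to this script; it is an assumption, not something confirmed in this thread, that Webmaster Tools bases its "Sitemap is HTML" message on the same header.

```python
import urllib.request


def media_type(content_type_header):
    """Strip parameters such as '; charset=utf-8' from a Content-Type value."""
    if not content_type_header:
        return None
    return content_type_header.split(";", 1)[0].strip().lower()


def served_media_type(url, timeout=10):
    """Send a HEAD request and return the media type the server declares."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return media_type(response.headers.get("Content-Type"))


if __name__ == "__main__":
    # URL taken from the question; replace it with your own sitemap location.
    print(served_media_type("http://wokheisandbox.appspot.com/sitemaps/sitemap.txt"))
```

A result of `text/html` for a `.txt` or `.xml` sitemap would point at the server configuration; `text/plain` or `application/xml` suggests looking elsewhere.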

EDIT:

I think I found the reason --> I registered my site on Google Webmaster Tools as http://mydomain.com, but the app is hosted on appspot at http://myapp.appspot.com, which is mapped to mydomain.com. If I register http://myapp.appspot.com instead, everything works fine (sitemap validated).

This is good news, but it's not ideal because I want mydomain.com to be indexed ... any idea how to overcome this?
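A related thing worth ruling out in a setup like this: the Sitemaps protocol expects every listed URL to live on the same host as the sitemap itself, so a sitemap that lists mydomain.com URLs will not line up with a site registered under the appspot host (or the other way round). The sketch below is only an illustration of that check; `urls_off_host` is a hypothetical helper name, and it compares host names only, not scheme or port.

```python
from urllib.parse import urlsplit


def urls_off_host(registered_site, sitemap_text):
    """Return sitemap URLs whose host differs from the registered site's host.

    Host names are compared case-insensitively (urlsplit lowercases them).
    """
    site_host = urlsplit(registered_site).hostname
    off_host = []
    for line in sitemap_text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines in the text sitemap
        if urlsplit(url).hostname != site_host:
            off_host.append(url)
    return off_host


sitemap = """http://myDomain.com
http://myDomain.com/about.html
http://myDomain.com/faq.html
http://myDomain.com/careers.html
"""

# No mismatches when the site is registered under the custom domain ...
print(urls_off_host("http://mydomain.com", sitemap))
# ... but every entry is flagged when it is registered under the appspot host.
print(urls_off_host("http://myapp.appspot.com", sitemap))
```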

+3  A: 

Sounds like your webserver is serving .txt files as text/html instead of text/plain.

For Apache, the following in a .htaccess file should fix it:

AddType text/plain .txt
ceejayoz
This sounds like a probable cause, but I am using google app engine so I don't know how to do an AddType
JohnIdol
ok - I got fed up and implemented a servlet to serve my sitemaps (I am now trying with both XML and TXT) as text/plain. Still getting Sitemap is HTML. I don't know where to bang my head!
JohnIdol
Can we see your sitemap?
ceejayoz
sure - follow the links in the 2nd edit
JohnIdol
looks like you're gonna get those 100 rep points :)
JohnIdol
A: 

I'm fairly certain that you need to provide an XML formatted sitemap file (sitemap.xml). See here for a format example: http://en.wikipedia.org/wiki/Sitemaps.

Tim S. Van Haren
ceejayoz
+1  A: 

I found this thread discussing duplicate entries causing recent sitemap grief. I don't see this issue in your sitemap but you don't want any duplicates between entries. For example, make sure your sitemap doesn't contain BOTH of the following:

http://mydomain.com/ or http://www.mydomain.com/

AND

http://mydomain.com/index.html or http://www.mydomain.com/index.html

I think you posted your entire sitemap so, again, I don't think this is your problem exactly. You did mention you have tried various URLs (with and without www). If you are validating the sitemap via Google Webmaster Tools, it may take up to 20 minutes for a correction to take effect. I hope it helps.
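If you want to check a longer sitemap for this kind of duplication, something like the following Python sketch could do it. It assumes that "duplicate" means the same page reachable with or without `www.` and with or without a trailing `index.html`; the function names are hypothetical, and other default documents (index.php and so on) are not handled.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def canonical_key(url):
    """Reduce a URL to (host without 'www.', path without trailing index.html)."""
    parts = urlsplit(url.strip())
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]  # keep the trailing slash
    return host, path


def find_duplicates(urls):
    """Group URLs that collapse to the same canonical key; return groups of 2+."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_key(url)].append(url)
    return [group for group in groups.values() if len(group) > 1]


print(find_duplicates([
    "http://mydomain.com/",
    "http://www.mydomain.com/index.html",
    "http://mydomain.com/about.html",
]))
```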

Ben Griswold
thanks for your contribution - yes, I posted the whole sitemap. I did all you suggested and I am rather puzzled!
JohnIdol
@JohnIdol - At this point, I might suggest converting the sitemap to the XML format or maybe changing the file extension to .html. The change might make a difference at Google and/or help if the issue has to do with your webserver serving up txt files with the wrong content type, as ceejayoz suggested.
Ben Griswold
tried - no luck
JohnIdol
A: 
<?xml version='1.0' encoding='utf-8' ?>
<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>
    <url>
     <loc>http://myDomain.com</loc>
    </url>
    <url>
     <loc>http://myDomain.com/about.html</loc>
    </url>
    <url>
     <loc>http://myDomain.com/faq.html</loc>
    </url>
    <url>
     <loc>http://myDomain.com/careers.html</loc>
    </url>
</urlset>

This way always works for me.

Martin
A: 

Just in case you change your mind about non-XML sitemaps:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.test.com/</loc>
    <lastmod>2009-08-03T23:40:40+00:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://test/</loc>
    <lastmod>2009-08-03T23:59:08+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.6</priority>
  </url>
</urlset>
Andrejs Cainikovs
I tried with that as well - see edit
JohnIdol
Hmmm... I give up. Your XML looks really close to mine. Maybe Google guys had messed up something recently? :D
Andrejs Cainikovs
JohnIdol
A: 

Thanks for the nice information

warrioRR
make sure to upvote who helped u :)
JohnIdol