Every article on our site has meta tags for title, description, image, and keywords in the head element, but for some reason none of the news aggregator sites will pull any of it.

http://darthhater.com/2010/06/25/friday-update-preview http://darthhater.com/2010/06/24/official-bioware-stance-on-game-testing-leaks

Not trying to post an advertisement. We really do have a problem. The share link is in the bottom right of the article with links to Facebook, Digg, and Reddit. It's too bad none of them provide debugging systems to figure out why stuff is improperly pulled into their system.

I'm thinking it might have something to do with the gzip compression of the site, or maybe with the PHP XSL parser outputting the site as XML (I remove the start tag programmatically, but even if I set the XSL to 'html' the problem persists). I also wondered whether it had to do with stripped whitespace, or the order of the meta tags (ridiculous, I know). It's a little annoying, and if I put our URLs into SEO checkers like seocentro.com, they find all of the meta tags just fine, so the markup itself can evidently be parsed.

A: 

My shot in the dark is that this is because you have the head part in one huge line:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:magasi="http://www.magasi-php.com/" xmlns:php="http://www.w3.org/1999/XSL/Transform"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><meta name="title" content="Friday Update Preview" /><meta name="description" content="Sean Dahlberg, Star Wars: The Old Republic Community Manager, informs the community that tomorrow's update will be a late one:  Just wanted to let everyone kno..." /><link rel="image_src" href="http://darthhater.com/images/fbimage.jpg" /><meta name="keywords" content="Friday Preview,Sean Dahlberg" /><link rel="alternate" type="application/rss+xml" title="Darth Hater - A Star Wars: The Old Republic Community RSS Feed" href="http://darthhater.com/feed/" /><link type="text/css" rel="stylesheet" href="/styles/DarthHater/style/main.css" /><script type="text/javascript" language="javascript">

It's probably valid HTML, but I wouldn't be surprised if a parser choked on it.

Also, you have 438 validation errors. This is probably not your problem, as it's mostly minor things and parsers should be able to deal with invalid HTML, but one never knows.
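
One way to check the one-line theory locally is to feed the same markup to a tolerant parser with and without line breaks. The following is a minimal sketch in Python using the standard-library `html.parser`; it says nothing about what Facebook, Digg, or Reddit actually run. `MetaCollector`, `extract_meta`, and `HEAD` are illustrative names, and the markup is a trimmed copy of the head above.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well,
        # because the default handle_startendtag calls handle_starttag.
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.meta[attributes["name"]] = attributes["content"]


def extract_meta(html):
    """Return a dict of meta name -> content found in the given markup."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.meta


# Trimmed copy of the head from the question, all on a single line.
HEAD = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
    '<meta name="title" content="Friday Update Preview" />'
    '<meta name="keywords" content="Friday Preview,Sean Dahlberg" />'
    '</head></html>'
)

one_line = extract_meta(HEAD)
with_breaks = extract_meta(HEAD.replace("><", ">\n<"))
print(one_line == with_breaks, sorted(one_line))
```

If both calls return the same tags, the missing line breaks are unlikely to be the culprit for a parser of this kind, though a stricter or home-grown crawler could still behave differently.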

Pekka
If I manually add line breaks after the html, head, and each of the meta tags, the problem is still there. :/
David
@David strange. What about trying a skeleton HTML page with no content to see whether that gets pulled?
Pekka
Well, actually if I just pull the displayed source code from the page and save it to a standard HTML file, the meta tags pull fine: http://darthhater.com/test_meta.html If I just add that as a link to Facebook or Digg manually, everything pulls, so I guess it could be the page compression? Is there a way to find out if a visitor is the Facebook script or the Digg script so I can turn off compression for those visitors? :P
David
Yep, I can actually confirm that if I just turn off the gzip output encoding in PHP, everything pulls fine. Hmmm...
David
@David that's really odd! But why are you gzipping in PHP and not leaving it to Apache in the first place? Maybe their crawlers don't send the `Accept-Encoding: gzip,deflate` header? Apache would recognize that and turn off zipping automatically.
Pekka
Aight, I'll get mod_deflate going on our Apache configuration and see if that fixes it.
David
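
The negotiation Pekka describes can be sketched as follows: compress only when the request's `Accept-Encoding` header allows gzip, and send the body unchanged otherwise. This is a minimal illustration in Python, not mod_deflate's actual implementation, and it rests on the unconfirmed guess that the crawlers omit the header. `accepts_gzip` and `build_response` are hypothetical helper names.

```python
import gzip


def accepts_gzip(accept_encoding):
    """Return True if an Accept-Encoding header value permits gzip."""
    if not accept_encoding:
        # No header at all: play it safe and send the body uncompressed.
        return False
    qvalues = {}
    for part in accept_encoding.split(","):
        token, _, param = part.partition(";")
        token = token.strip().lower()
        if not token:
            continue
        quality = 1.0
        param = param.strip().lower().replace(" ", "")
        if param.startswith("q="):
            try:
                quality = float(param[2:])
            except ValueError:
                quality = 0.0
        qvalues[token] = quality
    if "gzip" in qvalues:
        return qvalues["gzip"] > 0
    # Fall back to the wildcard entry, if the client sent one.
    return qvalues.get("*", 0.0) > 0


def build_response(body, accept_encoding):
    """Return (headers, payload), compressing only when the client allows it."""
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        return headers, gzip.compress(body)
    return headers, body


page = b"<html><head><title>Friday Update Preview</title></head></html>"

# A browser-style request gets a gzip payload; a header-less request does not.
browser_headers, browser_payload = build_response(page, "gzip,deflate")
crawler_headers, crawler_payload = build_response(page, None)
print(browser_headers.get("Content-Encoding"), crawler_headers.get("Content-Encoding"))
```

Compressing unconditionally in application code skips this check, which would explain why a client that cannot handle gzip sees an unreadable page while browsers and SEO checkers see the meta tags.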