This is a serious question (see my comment).

The question is simple: what are all the SEO-unfriendly things Java is doing that will make your website rank not as well as it should in the major search engines?

+1  A: 

There's a major SNAFU in the default behavior of servlets, related to JSESSIONID.

This is HUGE (in uppercase bold).

What Google has to say about session IDs in URLs:

Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page.

They specifically mention here that you should not serve session IDs to search bots.

That is just one quote: on several pages Google warns webmasters about session IDs in URLs, the countless issues they raise, and why they will harm your ranking.

Yet by default any Java Webapp will serve a very long JSESSIONID, different every time the search bots contact your Java website.

This creates hundreds of millions (!) of useless URLs in Google (and other) search engine results, with several consequences:

  • it clutters the screen (not too bad)

  • it also creates countless dupes (very bad)

  • it makes old content you'd want to be replaced "stick" in Google's search results (very very bad)

In addition to that, it is firmly believed that providing dupes actually lowers your ranking because Google's PageRank penalizes you if you do so.

This is very concerning for any Webapp developer concerned at all by SEO.

There's a solution: provide a version without JSESSIONID to the Google bots. But be very careful: providing a different page to the Google bots and to your users can get you penalized too.

In the "JSESSIONID considered harmful" article, the author, who's obviously well aware of SEO issues, creates a filter that gets rid of the JSESSIONID altogether (no cookie, no sugar). It's a bit overkill, but it's probably better than destroying your PageRank by using the default spec'ed servlet behavior.
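To make the idea concrete, here is a minimal sketch of the URL-cleaning part only. It is not the filter from that article: the class name `SessionIdStripper` and its `strip` method are hypothetical, and it assumes the session ID appears in the usual `;jsessionid=...` path-parameter form. In a real webapp you would presumably apply something like this from a servlet filter, which is not shown here.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of the Servlet API): removes a
// ";jsessionid=..." path parameter from a URL string.
final class SessionIdStripper {

    // Assumption: the ID runs until the query string, fragment,
    // next path parameter, or next path segment.
    private static final Pattern JSESSIONID =
            Pattern.compile(";jsessionid=[^?#;/]*", Pattern.CASE_INSENSITIVE);

    private SessionIdStripper() {
    }

    // Returns the URL with every jsessionid path parameter removed.
    static String strip(String url) {
        return JSESSIONID.matcher(url).replaceAll("");
    }

    public static void main(String[] args) {
        // Two URLs that differ only by session ID collapse to the same URL.
        System.out.println(strip("http://example.com/page.jsp;jsessionid=1A2B3C?x=1"));
        System.out.println(strip("http://example.com/page.jsp;jsessionid=9Z8Y7X?x=1"));
    }
}
```

Both calls print `http://example.com/page.jsp?x=1`, which is the point: the two session-specific URLs that a bot would otherwise treat as distinct pages turn out to be one page.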

This is wild.

Webinator
*Yet by default any Java Webapp will serve very long JSESSIONID, different every time the search bots contact your Java website:* Wrong, it does not do that by default. It only does that when you start a new session on the server side, for some (unclear/unnecessary?) reason, by simply calling `request.getSession()`. You do indeed see this very often in poorly programmed websites. On the other hand, when well programmed, this has the huge benefit that cookie-disabled clients remain trackable.
BalusC
A: 

Search engines don't care in the least about Java, only the output HTML. Your concern with Java is misplaced; instead, become a student of quality content marked up with Semantic HTML (http://en.wikipedia.org/wiki/HTML#Semantic_HTML).

If you are asking about JavaScript (instead of Java), most search engines do not pay any attention to JavaScript. So do not expect dynamically added HTML to be indexed. This also means, do not use JavaScript onclick actions to replace the basic functionality of the href attribute of anchor tags. Similar to Java, the recommendation falls back to clean semantic HTML markup of quality content.

kingjeffrey
@kingjeffrey: I've provided a first very detailed answer as to why this is a **known** (albeit not well-known, but known nonetheless) fact. In addition to that, you should note that if you can't answer a question on SO then you really shouldn't *"answer"*. If you think your comment has any relevance (in this case it doesn't), then in the future you should *"comment"* and not *"answer"*. tyvm.
Webinator
@Webinator - to be fair, the problem highlighted in your answer is with the Servlet stack, not Java. And it really is a problem with the way that certain implementations of the stack behave out-of-the-box if you use HttpSession.
Stephen C
@Webinator. Respectfully, my "answer" is entirely accurate. Search engines do not see Java; they do not care about Java. The concern is the output. Your answer may relate to Java development, but it also relates (in principle) to PHP, ASP, Ruby, etc. It is an observation about permalink structure, which is relevant to SEO. To design a Search Engine Optimized website, one should not look to improving their Java skill set, per se, but rather pay close attention to the structure and output that the search engine actually accesses and indexes.
kingjeffrey