tags:

views:

91

answers:

4

What's the best way to track RSS subscribers reliably without using Feedburner? Some of the obvious approaches like tracking by IP or by the number of hits have some fata flaws. IP addresses can change with each request or multiple users can use the same IP. Also, feed readers can request a feed multiple times per day or even hour. Both problems make it really hard to get reliable stats on unique subscribers.

I've read articles by both Leo Notenboom and Tim Bray on the topic, but none of their suggestions seems to really solve how to track subscribers in an accurate and reliable way. Leo suggests having a unique ID generated programatically to be appended to the RSS feed URL for each time the referring page is loaded. Tim advocates having RSS readers generate a unique hashtag and also has suggestions ranging from tracking the referrers to using cookies. A unique URL would be reliable, but it has two flaws: It's not a user-friendly URL and it creates duplicate content for SEO. Are there any other reliable methods of tracking RSS subscribers? How does Feedburner estimate subscribers?

A: 

You could query your web server logs for traffic to your RSS feed, perhaps filter it by IP to get the number of uniques.

The problem is, that would rely on folks checking the feed daily. The frequency of hits to your RSS feed by one individual could vary day to do and the number could be lower.

joshtronic
Thanks for the suggestion. There are problems with this method, though. IP addresses can change with each request or multiple users can use the same IP. Also, feed readers can request a feed multiple times per day or even hour. Both problems make it really hard to get reliable stats on unique subscribers.
VirtuosiMedia
+1  A: 

There isn't really a standard way to do this. Subscriber counting is always unreliable but you can get good estimates with it.

Here's how Google does it (source):

Subscribers counts are calculated by matching IP address and feed reader combinations, then using our detailed understanding of the multitude of readers, aggregators, and bots on the market to make additional inferences.

Of course part of this is easy for Google, as they can first calculate how many Google Reader users are subscribed to the feed in question. After that they use IP address matching also, and that's what you should use too.

You could calculate individual IP addresses (i.e. unique) from the web-servers logs, but that would count 10 people as 1 if they all use the same address. That's why you should inspect the HTTP-headers which are sent by the client, more specifically header fields HTTP_X_FORWARDED_FOR and HTTP_VIA. You could use the HTTP_VIA address as the "main" address, and then calculate how many unique HTTP_X_FORWARDED_FOR addresses are subscribed to the feed. If the subscriber doesn't have these proxy-added fields, then it's counted as a unique IP address. These should be handled in the code that generates the feed. You could also add a GeoIP lookup for the IP's and store everything to a database. This would allow you to see which country has the most subscribers to your feed.

This has it's problems too. All proxies don't use these fields and it doesn't fix the problem of calculating subscribers behind NAT gateways. It is however a good estimate. Besides, you are probably more interested in the order of magnitude rather than the exact count of subscribers, aren't you? If the counter says that you have 5989 subscribers you probably have more subscribers as the counter gives you the lower bound.

vtorhonen
A: 

If you configure your RSS feed to require some kind of authentication, you can do user-based metrics instead of ip-based metrics. Although this would be a technically-correct solution, getting people to opt into an authenticated blog in anything other than an Intranet scenario is a stretch.

kbrimington
That would be ideal, but unfortunately authentication is not the way most feeds have worked in the wild. I'm not even sure if most RSS readers support something like that.
VirtuosiMedia
A: 

Standard and Reliable are not exactly word in RSS dictionary :-) Go tot remember that the thing doesn't even have standard XSD after how many years ? If by tracking you mean the "count" there area few things you can so and the tactics depends on the purpose i.e. are demonstrating a big number or small number? It is a marketing thing so you have to define your goals :-)

You may have to classify IP numbers for a start - to have the basic collection of big / corporate / umbrella IP numbers. For them, you can use referrer as a reasonable filtering criteria and count everything else as unique unless proven otherwise. Vast majority of IP numbers remain stable for about 2 days but again it always good to use basic referrer logic as a filter for ppl who just keep "clicking" so to speak.

Then, you need a decent list of aggregators and a classification re how they process URL-s and if they obscure end readers completely then you need either published or inferred averages - it's always fair game to use equitable distribution of an average count. Using cookies may help to collect aggregator IP-s and differentiate between automated agents and individuals.

One very important thing is to keep in mind that you can't use just one method and expect it to be a silver bullet - you need to use these 3-4 aspects at the same time plus basic statistical reasoning.

ZXX