For a while, I've been trying to find a way of intelligently extracting the "relevant" text from a URL by eliminating the text related to ads and all the other clutter.After several months of researching, I gave it up as a problem that cannot be accurately determined. (I've tried different ways but none were reliable)
A week back, I stumbled across Readability - a plugin that converts any URL into readable text. It looks pretty accurate to me. My guess is that they somehow have an algorithm that's smart enough to extract the relevant text.
Does anyone know how they do it? Or how I could do it reliably?