Most web pages nowadays contain lists of items, i.e. chunks of HTML whose structure repeats many times.
For example:
- Facebook status updates on the home page
- Digg / Hacker News story listings
- StackOverflow homepage
Is there a Java library for detecting such lists? I expect it will involve some amount of pattern matching and intelligence. Thanks.
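
To make the kind of pattern matching meant here concrete, below is a naive sketch that uses only the JDK's built-in DOM parser. It assumes the input is well-formed XHTML (real-world HTML usually is not, so a tolerant parser would be needed first). The class name `RepeatedPatternFinder`, its methods, and the "tag plus direct child tags" signature are all made up for illustration and are not taken from any existing library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: reports parents that have many structurally similar children.
public class RepeatedPatternFinder {

    // Collect the direct child elements of a node, skipping text and comments.
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node k = kids.item(i);
            if (k.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) k);
            }
        }
        return result;
    }

    // Assumed notion of "same pattern": tag name plus the tag names of direct children.
    static String signature(Element e) {
        List<String> childTags = new ArrayList<>();
        for (Element c : childElements(e)) {
            childTags.add(c.getTagName());
        }
        return e.getTagName() + "(" + String.join(",", childTags) + ")";
    }

    // Parse well-formed XHTML and list every signature repeated at least minRepeats
    // times under a single parent, formatted as "parent > signature xCount".
    static List<String> findRepeats(String xhtml, int minRepeats) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        List<String> found = new ArrayList<>();
        walk(doc.getDocumentElement(), minRepeats, found);
        return found;
    }

    static void walk(Element parent, int minRepeats, List<String> found) {
        List<Element> kids = childElements(parent);
        // Count how often each child signature occurs under this parent.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Element k : kids) {
            counts.merge(signature(k), 1, Integer::sum);
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minRepeats) {
                found.add(parent.getTagName() + " > " + e.getKey() + " x" + e.getValue());
            }
        }
        // Recurse so that nested lists are reported as well.
        for (Element k : kids) {
            walk(k, minRepeats, found);
        }
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><ul>"
                + "<li><a>x</a></li><li><a>y</a></li><li><a>z</a></li>"
                + "</ul><p>t</p></body></html>";
        System.out.println(findRepeats(page, 3));
    }
}
```

This only catches exact structural repeats among siblings. Real pages have list items that differ slightly from one another, which is where the "intelligence" part comes in and why an existing library would be preferable.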