Hello,
I have snapshots of multiple webpages, each taken at two different times. What is a reliable method to determine which webpages have been modified between the two snapshots?
I can't rely on something like an RSS feed, and I need to ignore minor noise such as date or timestamp text that changes on every load.
Ideally, I am looking for a Python solution, but an intuitive algorithm would also be great.
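
To make the requirement concrete, here is a rough sketch of the kind of comparison I have in mind. It is not a tested solution: it assumes each snapshot is available as an HTML string, that the "noise" is limited to simple date and time patterns, and that a similarity threshold of 0.98 is a sensible cut-off. The names `normalize`, `is_modified`, `NOISE` and the threshold value are all made up for illustration. It only uses the standard library (`html.parser`, `re`, `difflib`).

```python
import difflib
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


# Assumed noise patterns: ISO dates, slash-separated dates, clock times.
# A real page would likely need more patterns than these.
NOISE = re.compile(
    r"\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{2,4}|\d{1,2}:\d{2}(?::\d{2})?"
)


def normalize(html):
    """Reduce a page to lowercase visible text with date noise removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = NOISE.sub(" ", " ".join(parser.chunks))
    return " ".join(text.split()).lower()


def is_modified(old_html, new_html, threshold=0.98):
    """Return True if the word-level similarity falls below the threshold."""
    old_text, new_text = normalize(old_html), normalize(new_html)
    if old_text == new_text:
        return False
    matcher = difflib.SequenceMatcher(
        None, old_text.split(), new_text.split(), autojunk=False
    )
    return matcher.ratio() < threshold
```

With this, two snapshots that differ only in a timestamp should compare as unchanged, while a page whose body text was rewritten should be flagged. What I am unsure about is whether a fixed threshold and hand-written noise patterns like these are reliable enough in practice.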
Thanks!