I'm trying to identify differences between a base case and a supplied case. I'm looking for a library that can tell me how similar they are, as a percentage or something like that.
For example:
I have 10 different HTML pages. All of them are 404 responses that differ in only one or two lines of random content (such as the time or a quote of the day).
Now, when I supply a new 404 page, I want a result back such as "80% similar". However, if I supply a totally different page, or a page from the same website with quite different content, I should get something like "20% similar".
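One way to get a pairwise percentage is a line-based sequence match. The sketch below is in Python for brevity (the idea ports to C#) and uses the standard-library `difflib.SequenceMatcher`, whose `ratio()` returns 2*M/T, where M is the number of matched lines and T the total number of lines in both pages. The function name `similarity_percent` is hypothetical, and comparing by line is an assumption: it works well when the random bit sits on its own line, less well when it shares a line with other markup.

```python
import difflib


def similarity_percent(base_html: str, new_html: str) -> float:
    """Hypothetical helper: 0-100 similarity between two pages, compared line by line."""
    base_lines = base_html.splitlines()
    new_lines = new_html.splitlines()
    # autojunk=False stops difflib from ignoring frequently repeated lines
    # (e.g. common markup) when the page is long.
    matcher = difflib.SequenceMatcher(None, base_lines, new_lines, autojunk=False)
    # ratio() = 2 * matched_lines / (len(base_lines) + len(new_lines))
    return matcher.ratio() * 100.0
```

For two six-line 404 pages that differ only in a timestamp line, five lines match, so the score is 2*5/12, about 83.3%.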
Basically, when I get a new response, I want to identify whether it is similar to the 10 pages I supplied before.
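To check a new response against the whole set of known pages, a common near-duplicate technique is Jaccard similarity over word shingles (overlapping runs of k words), taking the best score against any baseline page. This is a sketch under stated assumptions, not a specific library's API: the helper names (`shingles`, `jaccard_percent`, `best_match`, `looks_like_baseline`), the shingle size `k=3`, and the 70% threshold are all hypothetical and would need tuning on real pages. It is written in Python for brevity; in .NET the same logic maps onto `HashSet<T>` intersection and union.

```python
import re


def shingles(html: str, k: int = 3) -> set:
    """Lowercase word tokens of the page, grouped into a set of k-word shingles."""
    tokens = re.findall(r"\w+", html.lower())
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard_percent(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|, scaled to 0-100."""
    if not a and not b:
        return 100.0
    return 100.0 * len(a & b) / len(a | b)


def best_match(baselines: list, new_html: str, k: int = 3) -> float:
    """Highest similarity of new_html to any baseline page (0.0 if there are none)."""
    new_set = shingles(new_html, k)
    return max((jaccard_percent(shingles(b, k), new_set) for b in baselines), default=0.0)


def looks_like_baseline(baselines: list, new_html: str, threshold: float = 70.0) -> bool:
    """Hypothetical decision rule: 'similar' if any baseline scores at least `threshold`."""
    return best_match(baselines, new_html) >= threshold
```

Shingling tolerates a changed quote or timestamp because only the few shingles touching the random words differ, while a page with different content shares almost none. Taking the maximum rather than the average means one close baseline is enough to count as a match.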
I'm trying to solve this in .NET, so a library or algorithm recommendation would be great.