I am making a quiz system, and when quizmakers insert questions into the Question Bank, I am to check the DB for duplicate / very highly similar questions.
Testing MySQL's MATCH() ... AGAINST(), the highest relevance I get is 30+, when I test against a 100% similar string.
So what exactly is the relevance? To quote the manual:
Relevance values are non-negative floating-point numbers. Zero relevance means no similarity. Relevance is computed based on the number of words in the row, the number of unique words in that row, the total number of words in the collection, and the number of documents (rows) that contain a particular word.
My problem is how to test the relevance value if a string is a duplicate. If it's 100% duplicate, prevent it from being inserter into Question Bank. But if it is only so similar, prompt the quizmaker to verify, insert or not. So how do I do that? 30+ for 100% identical string is not percentage, so I'm stump.
Thanks in advance.