I am trying to use Difflib.SequenceMatcher to compute the similarities between two files. These two files are almost identical except that one contains some extra whitespaces, empty lines and other doesn't. I am trying to use
s=difflib.SequenceMatcher(isjunk,text1,text2)
ratio =s.ratio()
for this purpose.
So, the question is how to write the lambda expression for this isjunk method so the SequenceMatcher method will discount all the whitespaces, empty lines etc. I tried to use the parameter lambda x: x==" ", but the result isn't as great. For two closely similar text, the ratio is very low. This is highly counter intuitive.
For testing purpose, here are the two strings that you can use on testing:
What Motivates jwovu to do your Job Well? OK, this is an entry trying to win $100 worth of software development books despite the fact that I don‘t read
programming books. In order to win the prize you have to write an entry and
what motivatesfggmum to do your job well. Hence this post. First motivationmoney. I know, this doesn‘t sound like a great inspiration to many, and saying that money is one of the motivation factors might just blow my chances away.
As if money is a taboo in programming world. I know there are people who can‘t be motivated by money. Mme, on the other hand, am living in a real world,
with house mortgage to pay, myself to feed and bills to cover. So I can‘t really exclude money from my consideration. If I can get a large sum of money for
doing a good job, then definitely boost my morale. I won‘t care whether I am using an old workstation, or forced to share rooms or cubicle with other
people, or have to put up with an annoying boss, or whatever. The fact that at the end of the day I will walk off with a large pile of money itself is enough
for me to overcome all the obstacles, put up with all the hard feelings and hurt egos, tolerate a slow computer and even endure
And here's another string
What Motivates You to do your Job Well? OK, this is an entry trying to win $100 worth of software development books, despite the fact that I don't read programming books. In order to win the prize you have to write an entry and describes what motivates you to do your job well. Hence this post.
First motivation, money. I know, this doesn't sound like a great inspiration to many, and saying that money is one of the motivation factors might just blow my chances away. As if money is a taboo in programming world. I know there are people who can't be motivated by money. Kudos to them. Me, on the other hand, am living in a real world, with house mortgage to pay, myself to feed and bills to cover. So I can't really exclude money from my consideration.
If I can get a large sum of money for doing a good job, then thatwill definitely boost my morale. I won't care whether I am using an old workstation, or forced to share rooms or cubicle with other people, or have to put up with an annoying boss, or whatever. The fact that at the end of the day I will walk off with a large pile of money itself is enough for me to overcome all the obstacles, put up with all the hard feelings and hurt egos, tolerate a slow computer and even endure
I ran the above command, and set the isjunk to lambda x:x==" ", the ratio is only 0.36.