Say you've got a big table that contains a varchar column.
How would you match rows that contain the word 'preferred' in that column when the data is somewhat noisy and contains occasional spelling errors? For example:
['$2.10 Cumulative Convertible Preffered Stock, $25 par value',
'5.95% Preferres Stock',
'Class A Preffered',
'Series A Peferred Shares',
'Series A Perferred Shares',
'Series A Prefered Stock',
'Series A Preffered Stock',
'Perfered',
'Preffered C']
The misspellings of 'preferred' above show a family resemblance, but there is very little that they all have in common. Note that splitting out every word and computing the Levenshtein distance for every word in every row is going to be prohibitively expensive.
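To make the brute-force baseline concrete, here is a minimal Python sketch of what "Levenshtein on every word in every row" would look like. The function names, the tokenising regex, and the `max_distance=2` threshold are illustrative assumptions, not part of any existing setup; in a real table this would run once per word per row, which is the cost I'd like to avoid.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def row_matches(text: str, target: str = "preferred", max_distance: int = 2) -> bool:
    """Return True if any alphabetic word in `text` is within max_distance of target.

    The threshold of 2 is an assumed value chosen for illustration only.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return any(levenshtein(word, target) <= max_distance for word in words)
```

Even this baseline is not a clean fit: 'Preffered' and 'Perferred' are both at distance 2 from 'preferred', but 'Perfered' is at distance 3, so a threshold of 2 misses it and a looser threshold risks pulling in unrelated words.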
UPDATE:
There are a couple of other examples like this, e.g. with 'restricted':
['Resticted Stock Plan',
'resticted securities',
'Ristricted Common Stock',
'Common stock (restrticted, subject to vesting)',
'Common Stock (Retricted)',
'Restircted Stock Award',
'Restriced Common Stock']