I am doing a CSV Import tool for the project I'm working on. The client needs to be able to enter the data in excel, export them as CSV and upload them to the database. For example I have this CSV record:
1, John Doe, ACME Comapny (the typo is on purpose)
Of course, the companies are kept in a separate table and linked with a foreign key, so I need to discover the correct company ID before inserting. I plan to do this by comparing the company names in the database with the company names in the CSV. the comparison should return 0 if the strings are exactly the same, and return some value that gets bigger as the strings get more different, but strcmp doesn't cut it here because:
"Acme Company" and "Acme Comapny" should have a very small difference index, but "Acme Company" and "Cmea Mpnyaco" should have a very big difference index Or "Acme Company" and "Acme Comp." should also have a small difference index, even though the character count is different. Also, "Acme Company" and "Company Acme" should return 0.
So if the client makes a type while entering data, i could prompt him to choose the name he most probably wanted to insert.
Is there a known algorithm to do this, or maybe we can invent one :) ?