I have a list of product titles that describe more or less the same products. For instance, all the items in the list below are Seagate hard drives.
1. Seagate Hard Drive 500Go
2. Seagate Hard Drive 120Go for laptop
3. Seagate Barracuda 7200.12 ST3500418AS 500GB 7200 RPM SATA 3.0Gb/s Hard Drive
4. New and shinny 500Go hard drive from Seagate
5. Seagate Barracuda 7200.12
6. Seagate FreeAgent Desk 500GB External Hard Drive Silver 7200RPM USB2.0 Retail
To a human, hard drives 3 and 5 are the same. We could go a little further and consider products 1, 3, 4 and 5 to be the same, and put products 2 and 6 in other categories.
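To make that intuition concrete, one simple (and admittedly crude) way to measure how alike two titles are is to turn each title into a set of lowercase tokens and compute the Jaccard similarity between the sets. This is a minimal sketch, not a recommended solution: `tokens` and `jaccard` are hypothetical helpers, and the rule that rewrites the French unit "Go" as "GB" is an assumption so that "500Go" and "500GB" match.

```python
import re
from itertools import combinations

# The six titles from the question, in the same order (products 1 to 6).
PRODUCTS = [
    "Seagate Hard Drive 500Go",
    "Seagate Hard Drive 120Go for laptop",
    "Seagate Barracuda 7200.12 ST3500418AS 500GB 7200 RPM SATA 3.0Gb/s Hard Drive",
    "New and shinny 500Go hard drive from Seagate",
    "Seagate Barracuda 7200.12",
    "Seagate FreeAgent Desk 500GB External Hard Drive Silver 7200RPM USB2.0 Retail",
]


def tokens(title: str) -> set[str]:
    """Lowercase a title and split it into a set of word tokens.

    Dots are kept inside tokens so version numbers such as "7200.12" stay whole.
    Assumption: a capacity like "500go" (French gigaoctet) is rewritten to "500gb".
    """
    words = re.findall(r"[a-z0-9.]+", title.lower())
    return {re.sub(r"^(\d+)go$", r"\1gb", w) for w in words}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    token_sets = [tokens(p) for p in PRODUCTS]
    # Print the similarity of every pair of products.
    for i, j in combinations(range(len(PRODUCTS)), 2):
        score = jaccard(token_sets[i], token_sets[j])
        print(f"products {i + 1} and {j + 1}: {score:.2f}")
```

On these six titles, plain token overlap scores products 1 and 2 (3/7, about 0.43) higher than products 1 and 3 (4/12, about 0.33), the opposite of the human judgment above. That suggests raw overlap alone is not enough and that tokens such as capacities or model numbers need more weight.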
We have a huge list of products that I would like to classify this way. Does anybody have an idea of the best algorithm for this? Any suggestions?
I thought of a Bayesian classifier, but I am not sure it is the best choice. Any help would be appreciated!
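Since a Bayesian classifier came up, here is a minimal sketch of a multinomial naive Bayes classifier over title tokens, with add-one (Laplace) smoothing. It assumes you have hand-labeled a sample of products; the three category names below are hypothetical labels chosen to match the grouping described above, and the "Go" to "GB" rewrite is again an assumption.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(title: str) -> list[str]:
    """Lowercased word tokens; assumption: "500go" is rewritten to "500gb"."""
    words = re.findall(r"[a-z0-9.]+", title.lower())
    return [re.sub(r"^(\d+)go$", r"\1gb", w) for w in words]


class NaiveBayes:
    """Multinomial naive Bayes over title tokens with add-one smoothing."""

    def fit(self, titles, labels):
        self.class_counts = Counter(labels)      # number of titles per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()
        for title, label in zip(titles, labels):
            toks = tokens(title)
            self.word_counts[label].update(toks)
            self.vocab.update(toks)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_score(self, title, label):
        """Unnormalized log posterior: log P(label) + sum of log P(token | label)."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for t in tokens(title):
            if t not in self.vocab:
                continue  # ignore tokens never seen during training
            score += math.log((self.word_counts[label][t] + 1) / denom)
        return score

    def predict(self, title):
        return max(self.class_counts, key=lambda c: self.log_score(title, c))


# Hypothetical labels following the grouping in the question.
TRAIN = [
    ("Seagate Hard Drive 500Go", "seagate-500gb-internal"),
    ("Seagate Hard Drive 120Go for laptop", "seagate-120gb-laptop"),
    ("Seagate Barracuda 7200.12 ST3500418AS 500GB 7200 RPM SATA 3.0Gb/s Hard Drive",
     "seagate-500gb-internal"),
    ("New and shinny 500Go hard drive from Seagate", "seagate-500gb-internal"),
    ("Seagate Barracuda 7200.12", "seagate-500gb-internal"),
    ("Seagate FreeAgent Desk 500GB External Hard Drive Silver 7200RPM USB2.0 Retail",
     "seagate-500gb-external"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [c for _, c in TRAIN])
print(model.predict("Seagate 120Go laptop hard drive"))
print(model.predict("Seagate Barracuda 500GB"))
```

Keep in mind that naive Bayes is supervised: it can only assign new titles to categories that already exist in the labeled data, so on its own it will not discover groups in an unlabeled list.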
Thanks.