tags:

views:

22

answers:

2

I need to submit my hundreds of products to hundreds of websites. For most websites I need to select a directory/category for each product. But it seems each website has a different definition of categories. For example, some list laptops under computers/hardware, some under computers/laptop, some under /electronics/computers, some under eletctronics/PCs.

It is so hard to automatically select a category for each product. Could you kindly give me some suggestions?

Thank you so much!

+2  A: 

Yes, it's hard. No one agrees on the categories.

The Unix "rm" command ("remove") is spelled "del" in Windows. Why? People don't agree on something that simple and obvious.

What kind of magic do you want? Your task requires a person to think.

A person must (1) understand your products and (2) understand the web site categories and then (3) choose the right category based on the understanding. Think and make a judgement.

Since the web site categories are just words, your software may have to guess and assume at some of the meanings. What does "household" or "consumer" mean? Only in context can you guess at the meaning.

S.Lott
A: 

I would try to build a graph with synonyms and generalizations. For example Notebook and Laptop are synonym. Computer generalizes them. PC is a synonym for Computer. Electronics generalizes Computer (and its synonym PC) again.

Now, for a given product, look at the deepest level of the available categories and look for the most specific synonyms for that product from your graph. If there is no match, move one level up because they may have more specific categories then you graph - they may for example divide notebooks by brands. When you reach the root of the categories without a match move on to the first generalization from your graph and again search from the deepest category level upwards.

This solution has still issues because for example categories may be divided by brand on a very high level or on a very deep level and you choose one option when building the graph. It is quite possible to handle such cases, too, but it will become much trickier.

Daniel Brückner