Company 1 has this vector:
['books','video','photography','food','toothpaste','burgers'] ... ...
Company 2 has this vector:
['video','processor','photography','LCD','power supply', 'books'] ... ...
Suppose each vector represents a frequency distribution (I could have written each entry as a tuple with its count, but that is too much to type).
As you can see, the two vectors overlap. "video" and "photography" look "similar" across the two vectors because they sit in similar positions in both, while "books" is clearly a strong point for company 1: it comes first there but last among the terms shown for company 2.
Ordering and position do matter, since each vector is a frequency distribution.
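To make the overlap and the position comparison concrete, here is a minimal sketch of what I mean. It assumes the position in each list reflects frequency rank and uses only the six terms shown above (the elided entries are left out). The helper names `rank_table` and `jaccard` are my own, purely for illustration, and this is not meant as the "right" algorithm, just a starting point.

```python
# Only the terms shown in the question; the elided "..." entries are not filled in.
company1 = ['books', 'video', 'photography', 'food', 'toothpaste', 'burgers']
company2 = ['video', 'processor', 'photography', 'LCD', 'power supply', 'books']


def rank_table(a, b):
    """Map each shared term to (position in a, position in b), 0-based."""
    pos_b = {term: i for i, term in enumerate(b)}
    return {term: (i, pos_b[term]) for i, term in enumerate(a) if term in pos_b}


def jaccard(a, b):
    """Order-insensitive overlap: size of intersection over size of union."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


shared = rank_table(company1, company2)

# Rank gap per shared term: a small gap means a similar position in both lists.
gaps = {term: abs(i - j) for term, (i, j) in shared.items()}

print(shared)                       # shared terms with their two positions
print(gaps)                         # how far apart each shared term sits
print(jaccard(company1, company2))  # overlap ignoring order
```

The rank gaps capture the "similar position" idea ("photography" has gap 0, "books" has gap 5), while the Jaccard score ignores order entirely, so the two numbers answer different questions.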
What algorithms could you use to play around with this? What algorithms could you use that could provide valuable data for these companies, using these vectors?
I am new to text mining and information retrieval. Could someone point me to the parts of those fields that relate to this question?