People often throw around the terms IR (information retrieval), ML (machine learning), and data mining, but I have noticed a lot of overlap between them.

For those with experience in these fields: what exactly draws the line between them?

A:

You could also add pattern recognition and (computational?) statistics to the list; both overlap with the three areas you mentioned.

I'd say there is no well-defined line between them. What separates them is their history and their emphases: statistics emphasizes mathematical rigor, data mining emphasizes scaling to large datasets, and ML is somewhere in between.

dimatura
A:

This is just the view of one person (formally trained in ML)--others might see things quite differently.

Machine Learning is probably the most homogeneous of these three terms--it's limited to the pattern-extraction/pattern-matching algorithms themselves. Of the terms you mentioned, "Machine Learning" is the one academic departments use most to describe their curricula and research programs, and the one most used in academic journals and conference proceedings. ML is clearly the least context-dependent of the three.

Information Retrieval and Data Mining are much closer to describing complete commercial processes--i.e., everything from the user's query to the retrieval and delivery of relevant results. One or more ML algorithms might sit somewhere in that process flow, and in the more sophisticated applications they often do, but that's not a formal requirement.

So Information Retrieval (IR) and Data Mining (DM) are related to Machine Learning (ML) in a user-tool kind of way. In other words, Machine Learning is one source of tools used to solve problems in IR, but it's only one source. IR doesn't depend on ML--for instance, a particular IR project might be the storage and rapid retrieval of fully indexed data responsive to a user's search query, the crux of which is optimizing the performance of the data flow, i.e., the round trip from the query to delivering the search results to the user. Prediction or pattern matching might not be useful here. Likewise, a DM project might use an ML algorithm as its predictive engine, yet a DM project is more likely to also be concerned with the entire processing flow--for instance, parallel computation techniques for efficient data input to the processing engine, and efficient compression of the processed data.
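To make the "IR without ML" case concrete, here is a minimal sketch (my own illustration, not taken from any particular system) of an inverted index with Boolean AND retrieval. Each term maps to the set of documents containing it, and a query is answered by intersecting those sets. Nothing is trained or predicted; the engineering effort is all in making the lookup fast. The names `tokenize`, `build_index`, and `search` and the toy documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Boolean AND retrieval: ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the shortest posting set so the intersections stay small.
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result


# Hypothetical toy corpus.
docs = {
    1: "Machine learning for information retrieval",
    2: "Scaling data mining to large datasets",
    3: "Information retrieval without learning",
}
index = build_index(docs)
print(sorted(search(index, "information retrieval")))  # [1, 3]
```

A real search engine adds ranking, compression of the posting lists, and distribution across machines, and some of those pieces may use learned models, but the core retrieval step above works without any learning at all.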

Lastly, consider the Netflix Prize. This competition was directed solely at Machine Learning--the focus was on the prediction algorithm, as evidenced by the fact that there was a single success criterion: the accuracy of the predictions returned by the algorithm. Imagine if the Netflix Prize were rebranded as a Data Mining competition. The success criteria would almost certainly be expanded to more accurately assess the algorithm's performance in the actual commercial setting--so, for instance, overall execution speed (how quickly the recommendations are delivered to the user) would probably be considered along with accuracy.
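For reference, the Prize measured accuracy as the root mean squared error (RMSE) of predicted ratings against held-out actual ratings. The sketch below computes that metric and, for the hypothetical "Data Mining" version of the contest, also reports wall-clock prediction time. The `evaluate` helper, the constant-rating baseline, and the toy ratings are all made up for illustration; they are not part of the actual competition.

```python
import math
import time


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def evaluate(predict, pairs, actual):
    """Hypothetical DM-style evaluation: accuracy (RMSE) plus elapsed time."""
    start = time.perf_counter()
    predicted = [predict(user, item) for user, item in pairs]
    elapsed = time.perf_counter() - start
    return {"rmse": rmse(predicted, actual), "seconds": elapsed}


def predict_constant(user, item):
    """Baseline predictor: always return 3.5 stars, ignoring user and item."""
    return 3.5


# Made-up held-out (user, movie) pairs and their true 1-5 star ratings.
pairs = [("u1", "m1"), ("u1", "m2"), ("u2", "m1"), ("u2", "m3")]
actual = [4, 3, 5, 2]

result = evaluate(predict_constant, pairs, actual)
print(round(result["rmse"], 4))  # 1.118
```

An ML contest ranks entries on `result["rmse"]` alone; a DM-style contest would also weigh `result["seconds"]` (and likely memory, data-loading cost, and so on).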

The terms "Information Retrieval" and "Data Mining" are now in mainstream use, though for a while I only saw them in my job description or in vendor literature (usually next to the word "solution"). At my employer, we recently hired a "Data Mining" analyst. I don't know what he does exactly, but he wears a tie to work every day.

doug