Hello,

I am slightly confused about what "feature selection", "feature extractor", and "feature weights" mean, and the difference between them. As I read the literature I sometimes feel lost, since the terms are used quite loosely. My primary concerns are:

  1. When people talk of Feature Frequency or Feature Presence - is that feature selection?

  2. When people talk of algorithms such as Information Gain or Maximum Entropy - is that still feature selection?

  3. If I train the classifier with a feature set that, for example, notes the position of a word within a document - would one still call this feature selection?

Thanks, Rahul Dighe

+1  A: 

Feature Selection is the process of choosing "interesting" features from your set for further processing.

Feature Frequency is just that: how often a feature appears.

Information Gain, Maximum Entropy, etc. are weighting methods that use Feature Frequency, which in turn allows you to perform Feature Selection.
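
For instance, here is a tiny sketch (entirely my own illustration, in Python, not from the thread) of scoring terms by Information Gain over feature presence:

    import math
    from collections import Counter

    def entropy(labels):
        # Shannon entropy of a list of class labels
        total = len(labels)
        return -sum((n / total) * math.log2(n / total)
                    for n in Counter(labels).values())

    def information_gain(docs, labels, term):
        # IG = H(class) - H(class | term present/absent)
        present = [l for d, l in zip(docs, labels) if term in d]
        absent = [l for d, l in zip(docs, labels) if term not in d]
        p = len(present) / len(labels)
        return entropy(labels) - (p * entropy(present) + (1 - p) * entropy(absent))

    docs = [{"good", "movie"}, {"bad", "movie"}, {"good", "plot"}, {"bad", "acting"}]
    labels = ["pos", "neg", "pos", "neg"]
    print(information_gain(docs, labels, "good"))   # 1.0 - splits the classes cleanly
    print(information_gain(docs, labels, "movie"))  # 0.0 - appears in both classes

Terms with high scores are the ones worth keeping; that is the selection step.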

Think of it like this:

You parse a corpus and create a term/document matrix. This matrix starts out as a count of how often each term appears in each document (simple frequency).

To make that matrix more meaningful, you weight the terms with some function of the frequency (such as term frequency-inverse document frequency, Information Gain, or Maximum Entropy). Now the matrix contains the weight, or importance, of each term in relation to the other terms in the matrix.

Once you have that, you can use feature selection to keep only the most important terms (if you are doing stuff like classification or categorization) and perform further analysis.
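
To make that pipeline concrete, here is a minimal sketch using scikit-learn (my choice of library; the answer doesn't name one): raw counts, then tf-idf weights, then keep the k highest-scoring terms.

    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.feature_selection import SelectKBest, chi2

    corpus = ["the movie was great", "the movie was awful",
              "great plot and acting", "awful acting"]
    labels = [1, 0, 1, 0]

    # 1. term/document matrix of raw counts (simple frequency)
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(corpus)

    # 2. weight the counts (tf-idf here; an IG-based weighting is analogous)
    weighted = TfidfTransformer().fit_transform(counts)

    # 3. feature selection: keep only the k most important terms
    selector = SelectKBest(chi2, k=3).fit(weighted, labels)
    terms = vectorizer.get_feature_names_out()
    print([t for t, keep in zip(terms, selector.get_support()) if keep])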

GalacticJello
So what is feature extraction?
Rahul
Feature extraction is the process of reducing the dimensionality of your data (usually through SVD, PCA, etc.). See this link: http://en.wikipedia.org/wiki/Feature_extraction
GalacticJello
A: 

Feature extraction: reduce dimensionality by a (linear or non-linear) projection of a D-dimensional vector onto a d-dimensional vector (d < D). Example: principal component analysis

Feature selection: reduce dimensionality by selecting a subset of the original variables. Example: forward or backward feature selection
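
A small sketch of the contrast (my own example, again using scikit-learn): PCA builds d = 2 new axes out of all D = 5 variables, while forward selection keeps 2 of the original 5 untouched.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))               # D = 5 original variables
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    # Feature extraction: project the 5 variables onto 2 new axes,
    # each a linear combination of all the originals.
    X_pca = PCA(n_components=2).fit_transform(X)

    # Feature selection: greedily pick 2 of the original 5 variables
    # (forward selection, as mentioned above).
    sfs = SequentialFeatureSelector(LogisticRegression(), n_features_to_select=2)
    X_sel = sfs.fit_transform(X, y)

    print(X_pca.shape, X_sel.shape)             # both (100, 2)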

Ajit
