Hi,

Can anyone tell me what feature generators are in the context of natural language processing?

Thanks

Paul

A: 

If I'm reading this correctly, I believe "feature generation" in this quote refers to the process of extracting features from your text. Without going into too much detail, this means picking out the dimensions of your data that you think will be useful for your prediction/classification task and putting them into a vector representation.

For example, suppose we were trying to create a classifier to determine whether an e-mail is spam. We might extract features such as CONTAINS_WORD_NIGERIA or IS_FROM_PERSON_IN_CONTACT_LIST. Or, if we were to follow the quote above, we might make specialized features using the HTML tags, such as PERCENT_OF_WORDS_IN_HREF_TAG. As you might imagine, you can go overboard when feature engineering, and the real challenge lies in optimizing your feature set to give you good results on unseen data.
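
To make that concrete, here is a minimal sketch of what a feature generator for those three spam features might look like in Python. The function names (extract_features, to_vector), the regex-based tokenizer, and the idea of passing the contact list as a set of lowercased addresses are all my own assumptions for illustration; a real system would use a proper HTML parser and tokenizer.

```python
import re

# Fixed feature order so every e-mail maps to a vector of the same shape.
FEATURE_ORDER = [
    "CONTAINS_WORD_NIGERIA",
    "IS_FROM_PERSON_IN_CONTACT_LIST",
    "PERCENT_OF_WORDS_IN_HREF_TAG",
]

WORD_RE = re.compile(r"[A-Za-z']+")
TAG_RE = re.compile(r"<[^>]+>")
# Crude match for the text inside <a href=...>...</a> (illustrative only).
LINK_RE = re.compile(r"<a\s[^>]*href[^>]*>(.*?)</a>", re.I | re.S)


def tokenize(html):
    """Strip tags and return the lowercased words that remain."""
    return WORD_RE.findall(TAG_RE.sub(" ", html).lower())


def extract_features(body, sender, contact_list):
    """Turn one e-mail into a dict of named numeric features."""
    words = tokenize(body)
    link_words = []
    for link_text in LINK_RE.findall(body):
        link_words.extend(tokenize(link_text))

    percent_in_links = 100.0 * len(link_words) / len(words) if words else 0.0
    return {
        "CONTAINS_WORD_NIGERIA": 1.0 if "nigeria" in words else 0.0,
        "IS_FROM_PERSON_IN_CONTACT_LIST": 1.0 if sender.lower() in contact_list else 0.0,
        "PERCENT_OF_WORDS_IN_HREF_TAG": percent_in_links,
    }


def to_vector(features):
    """Flatten the feature dict into a list in FEATURE_ORDER."""
    return [features[name] for name in FEATURE_ORDER]


if __name__ == "__main__":
    body = 'A prince in Nigeria needs you. <a href="http://example.com">Click here</a>'
    feats = extract_features(body, "stranger@example.com", {"paul@example.com"})
    print(to_vector(feats))
```

The resulting vector is what you would hand to whatever classifier you choose; the feature generator itself knows nothing about the learning algorithm.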

hapagolucky