Note: Edited for clarification.
Clarification: I'm writing a "bridge" between the user and a search engine, not a search engine itself. Part of my value-add will be inferring the intent of a query: the intent behind a tracking number, a stock symbol, or an address is fairly obvious. If I can categorise a query, I can decide whether the user needs to see search results at all; if I cannot, they will simply see search results. I am currently designing this "inference engine."
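A minimal sketch of that routing decision, assuming the categoriser returns a list of category names (empty when it recognises nothing). The `route` function and the `DIRECT_HANDLERS` table are hypothetical, and the handlers only return placeholder strings.

```python
from typing import Callable, Dict, List

# Hypothetical handlers keyed by category name; each returns a placeholder
# string instead of a real direct answer.
DIRECT_HANDLERS: Dict[str, Callable[[str], str]] = {
    "USPHONENUMBER": lambda q: f"Phone number: {q}",
    "STOCKTICKERSYMBOL": lambda q: f"Stock quote for {q.upper()}",
}


def route(query: str, categories: List[str]) -> str:
    """Answer directly if some category has a handler, else fall back to search."""
    for category in categories:
        handler = DIRECT_HANDLERS.get(category)
        if handler is not None:
            return handler(query)
    # No category (or none we can act on): show ordinary search results.
    return f"Search results for: {query}"
```

A query can be categorised and still fall through to search (as `route("denver", ["USCITY", "PLACENAME"])` would here) if no handler knows what to do with its categories.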
Original question: I'm writing a parser, and I want to take any given token and assign it a category. Here are some hypothetical examples (I'm limiting this to English for now):
"denver" is a USCITY and a PLACENAME
"aapl" is a NASDAQSYMBOL and a STOCKTICKERSYMBOL
"555 555 5555" is a USPHONENUMBER
etc...
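One plausible starting point for categories like USCITY and NASDAQSYMBOL is a plain dictionary lookup (a gazetteer), plus a small parent table so that a specific category also yields its more general one. The word lists, the `PARENT` table and `lookup_categories` are hypothetical; real lists would be far larger and loaded from data files.

```python
from typing import Dict, List, Set

# Hypothetical, hand-built word lists; a real system would load much larger
# lists from data files.
GAZETTEER: Dict[str, Set[str]] = {
    "denver": {"USCITY"},
    "aapl": {"NASDAQSYMBOL"},
}

# Hypothetical hierarchy: each specific category implies a more general one.
PARENT: Dict[str, str] = {
    "USCITY": "PLACENAME",
    "NASDAQSYMBOL": "STOCKTICKERSYMBOL",
}


def lookup_categories(token: str) -> List[str]:
    """Return every category for a token, including implied parent categories."""
    found: Set[str] = set()
    for category in GAZETTEER.get(token.lower(), set()):
        # Walk up the hierarchy so "USCITY" also yields "PLACENAME".
        while category is not None:
            found.add(category)
            category = PARENT.get(category)
    return sorted(found)
```

For example, `lookup_categories("Denver")` returns `['PLACENAME', 'USCITY']`, which matches the first example above.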
I know that each of these cases will most likely require specific handling; however, I'm not sure where to start.
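For the "specific handling" part, one option is a registry of recognizer functions, one per category, all sharing the same signature. This is a sketch under assumptions: the `recognizer` decorator, the `USZIPCODE` category and both regular expressions are illustrative, and the phone pattern is a loose format check rather than a full validator.

```python
import re
from typing import Callable, Dict, List

Recognizer = Callable[[str], bool]

# Hypothetical registry: one recognizer per category, so each category gets its
# own specific handling behind a shared calling convention.
RECOGNIZERS: Dict[str, Recognizer] = {}


def recognizer(category: str) -> Callable[[Recognizer], Recognizer]:
    """Decorator that registers a function as the recognizer for `category`."""
    def register(func: Recognizer) -> Recognizer:
        RECOGNIZERS[category] = func
        return func
    return register


# Loose US phone format: ten digits, optional parentheses around the area code,
# optional space/dot/dash separators. Not a full validator.
_US_PHONE = re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}")

# Hypothetical extra category: a five-digit ZIP code or ZIP+4.
_US_ZIP = re.compile(r"\d{5}(?:-\d{4})?")


@recognizer("USPHONENUMBER")
def is_us_phone(text: str) -> bool:
    return _US_PHONE.fullmatch(text.strip()) is not None


@recognizer("USZIPCODE")
def is_us_zip(text: str) -> bool:
    return _US_ZIP.fullmatch(text.strip()) is not None


def pattern_categories(text: str) -> List[str]:
    """Run every registered recognizer and collect the categories that match."""
    return [category for category, check in RECOGNIZERS.items() if check(text)]
```

`pattern_categories("555 555 5555")` returns `['USPHONENUMBER']`; supporting a new category means writing one more decorated function.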
Ideally I'd end up with something simple like:
queryCategory = magicCategoryFinder(query)
print(queryCategory)
# "SOMECATEGORY" or a list of categories
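A hypothetical `magicCategoryFinder` could combine pattern checks (for shapes like phone numbers) with dictionary lookups (for names like cities and tickers) and return a sorted list, where an empty list means "no category, show search results". The data tables are toy assumptions, and this version classifies the query as a single unit rather than token by token.

```python
import re
from typing import Dict, List, Set

# Hypothetical word lists and parent categories, taken from the question's examples.
GAZETTEER: Dict[str, Set[str]] = {"denver": {"USCITY"}, "aapl": {"NASDAQSYMBOL"}}
PARENT: Dict[str, str] = {"USCITY": "PLACENAME", "NASDAQSYMBOL": "STOCKTICKERSYMBOL"}
# Hypothetical pattern-based categories; the regex is a loose US phone format.
PATTERNS = {
    "USPHONENUMBER": re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}"),
}


def magicCategoryFinder(query: str) -> List[str]:
    """Return every category the whole query matches; [] means 'just search'."""
    text = query.strip()
    found: Set[str] = set()
    # 1. Pattern-shaped queries (phone numbers, etc.).
    for category, pattern in PATTERNS.items():
        if pattern.fullmatch(text):
            found.add(category)
    # 2. Dictionary lookups, expanded up the category hierarchy.
    for category in GAZETTEER.get(text.lower(), set()):
        while category is not None:
            found.add(category)
            category = PARENT.get(category)
    return sorted(found)


queryCategory = magicCategoryFinder("denver")
print(queryCategory)  # ['PLACENAME', 'USCITY']
```

Returning a list rather than a single category keeps ambiguous cases like "denver" (both USCITY and PLACENAME) intact, leaving the choice of what to show to the caller.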