I'm looking for a library which parses log files (or incoming requests) and extracts the search terms if/when the request came from a search engine.

Are there any good libraries which provide this function?

Any language will do.
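
To be concrete, the extraction I have in mind is reading the Referer header (or the referrer field of an access log line) and pulling out the engine's query parameter (Google and Bing use "q", Yahoo search uses "p"). A rough Java sketch of the idea -- the engine-to-parameter map below is only illustrative, not a complete list:

    import java.net.URI;
    import java.net.URLDecoder;
    import java.nio.charset.StandardCharsets;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class RefererTerms {
        // Which query parameter carries the search terms, keyed by a host fragment.
        // (Illustrative only -- a real list would cover many more engines.)
        private static final Map<String, String> ENGINE_PARAM = new LinkedHashMap<>();
        static {
            ENGINE_PARAM.put("google.", "q");
            ENGINE_PARAM.put("bing.", "q");
            ENGINE_PARAM.put("search.yahoo.", "p");
        }

        /** Returns the search terms from a referer URL, or null if it isn't a known engine. */
        public static String extractTerms(String referer) {
            try {
                URI uri = URI.create(referer);
                String host = uri.getHost();
                String query = uri.getRawQuery();
                if (host == null || query == null) return null;
                for (Map.Entry<String, String> engine : ENGINE_PARAM.entrySet()) {
                    if (!host.contains(engine.getKey())) continue;
                    for (String pair : query.split("&")) {
                        String[] kv = pair.split("=", 2);
                        if (kv.length == 2 && kv[0].equals(engine.getValue())) {
                            return URLDecoder.decode(kv[1], StandardCharsets.UTF_8);
                        }
                    }
                }
            } catch (IllegalArgumentException ignored) {
                // malformed referer -- treat as "no search terms"
            }
            return null;
        }

        public static void main(String[] args) {
            // prints "log file parser"
            System.out.println(extractTerms("http://www.google.com/search?q=log+file+parser"));
        }
    }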

A: 

Java has the Lucene framework, which is a high-performance text search engine. Log files could work with this, but incoming requests could be trickier. Do you need to parse them as they stream in?

ajayr
not necessarily... it would be nice, but not required
Dave Viner
A: 

There are many ways to get, parse and analyze the data you speak of.

Very simply, you could take the log file text and import it into a SQL database for analysis (which also lets you look at other requests, etc.).

You could use a software service such as Google Analytics.

Or my personal favorite:

Write a SQL INSERT into a tracking table. As you do so, you can parse the keyword string into clauses -- most simply by splitting on whitespace. The downside is that you'll miss multi-word phrases such as "New York".
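
A rough sketch of that insert -- the search_terms table and its columns are made up for illustration. Keeping the raw string alongside each word means phrases like "New York" are still recoverable later:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class TrackSearchTerms {
        /**
         * Inserts the raw keyword string plus each individual word into a
         * hypothetical search_terms table (columns: raw_query, word).
         */
        public static void track(Connection conn, String keywordString) throws SQLException {
            String sql = "INSERT INTO search_terms (raw_query, word) VALUES (?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (String word : keywordString.trim().toLowerCase().split("\\s+")) {
                    ps.setString(1, keywordString);
                    ps.setString(2, word);
                    ps.addBatch();
                }
                ps.executeBatch();
            }
        }

        public static void main(String[] args) throws SQLException {
            // Any JDBC driver works; the SQLite URL here is just an example
            // and needs the sqlite-jdbc driver on the classpath.
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite:search_log.db")) {
                track(conn, "New York pizza");
            }
        }
    }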

The Lucene answer offers a morsel of info that could lead you to a pretty neat analyzer, but it would take a lot of work to get a complete solution. The neat thing about Lucene and Solr is that they can tokenize the keyword string using their standard libraries (chunking out two- to three-word clauses where you have CompoundWords or CamelCaseKeywords).
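
For example, a minimal tokenization sketch against a reasonably recent Lucene (the no-argument StandardAnalyzer constructor assumes roughly Lucene 5 or later; older versions want a Version argument):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class KeywordTokens {
        /** Tokenizes a keyword string with Lucene's StandardAnalyzer. */
        public static List<String> tokens(String keywords) throws IOException {
            List<String> out = new ArrayList<>();
            try (StandardAnalyzer analyzer = new StandardAnalyzer();
                 TokenStream ts = analyzer.tokenStream("keywords", keywords)) {
                CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
                ts.reset();
                while (ts.incrementToken()) {
                    out.add(term.toString());
                }
                ts.end();
            }
            return out;
        }

        public static void main(String[] args) throws IOException {
            // e.g. [new, york, log, file, parser]
            System.out.println(tokens("New York log file parser"));
        }
    }

Splitting CamelCase or building two- and three-word phrases takes additional filters on top of this (a word-delimiter or shingle filter, for instance), which is where most of the extra work lives.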

From a practical standpoint, I think you're best served by something off the shelf, such as Google Analytics. If you have the time and skills, though, inserting records into your own database can grow into something very powerful as you add to it.

Chris Adragna
Interesting suggestion... I can't use Google Analytics, as I'm trying to do local analysis.
Dave Viner