Efficient methods for finding most common phrases in a body of text AKA trending topics | ansaurus

+3 Q:

Efficient methods for finding most common phrases in a body of text AKA trending topics

Hi,

I previously asked a similar question on this topic and ended up deriving several solutions that worked: one based on Bloom filters + n-grams, the other based on hash tables + n-grams. Both solutions perform fine with small data sets (<1000 texts, usually tweets), but the computation time grew exponentially, meaning that processing 10,000 texts could take hours.

I am currently working in Ruby, and perhaps that is the problem, but are there any other solutions or approaches I could try to solve this problem?

Thanks, Ben
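
For reference, here is a minimal sketch of what a hash table + n-gram count can look like in Ruby. It is not the code from the question (which was not posted); the method name `top_phrases`, its parameters, and the simple regex tokenizer are all assumptions made for illustration.

```ruby
# Hypothetical helper (not from the original post): counts word n-grams
# across all texts in one pass and returns the most frequent ones.
def top_phrases(texts, n: 2, limit: 10)
  counts = Hash.new(0) # phrase => number of occurrences

  texts.each do |text|
    # Assumed tokenizer: lowercase, keep runs of letters, digits, apostrophes.
    words = text.downcase.scan(/[a-z0-9']+/)

    # each_cons yields every run of n consecutive words.
    words.each_cons(n) { |gram| counts[gram.join(' ')] += 1 }
  end

  # Array of [phrase, count] pairs with the highest counts first.
  counts.max_by(limit) { |_phrase, count| count }
end

tweets = [
  'the quick brown fox',
  'the quick red fox',
  'a quick brown dog'
]
top_phrases(tweets, n: 2, limit: 2).each do |phrase, count|
  puts "#{phrase}: #{count}"
end
```

A single pass like this does work roughly proportional to the total number of words, so if run time grows much faster than the input, it may be worth checking whether the texts are being rescanned or the counts rebuilt for every new text.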

+1 A:

If you are looking to do text searching in large sets of data, you might have to look into something like Solr. There is a really easy-to-set-up Solr gem called Sunspot: http://outoftime.github.com/sunspot/

smnirven 2010-07-27 20:20:20
