Hi,

My structure: each category contains texts, and these texts are entries of their own, so there is a table 'category' and a table 'texts'. There are about 90 texts in every category, and each text is about 300 characters.

What I want to do is generate meta tags (keywords) for the categories. The idea: get all relevant 'texts' for a category, rank all the words, and take the top 10. These top 10 words will be the category's keyword meta tag.

Now, the trick: how to retrieve those top 10 words. Currently, my plan is to split each text (each one is a full text) into a per-word array. This array (in PHP) will be quite long. Afterwards, I count the frequency of each word and rank the words by frequency. Voilà, the top 10 words.

I haven't tested this procedure, but I guess it might take a while. The result will be cached, so it will only have to run once every week or so, but still, I wouldn't like to get a timeout.

Do you guys have any tips? Any help appreciated.

Thanks,

Maurice

A: 

OK, now that I've said my piece in the comment above, I'll get to your algorithm.

There are several ways to do this. I'll focus on a PHP-heavy approach and let other SO users cover the alternatives.

I'm going to assume you've already queried the database and stored all the words as a space-separated list in the variable $texts:

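// Sample data in $texts
$texts = "orange orange apple apple apple banana";

// Count how often each word occurs; the words become the array keys
$withCounts = array_count_values( explode( ' ', $texts ) );

// Sort by count, highest first, keeping the word keys
arsort( $withCounts );

// Keep the ten most frequent words
$topTen = array_slice( array_keys( $withCounts ), 0, 10 );

print_r( $topTen );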
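
The snippet above assumes one clean, space-separated string. Real texts from the 'texts' table will probably contain punctuation and mixed case, so here is a minimal sketch of the same counting idea wrapped in a function. The name topWords(), its parameters, and the sample rows are hypothetical and only for illustration; it also assumes that lowercasing ASCII letters is good enough for your content.

```php
<?php
// Hypothetical helper, not part of the original answer.
// $texts: array of text rows for one category (e.g. as fetched from the 'texts' table).
// $limit: how many of the most frequent words to return.
function topWords(array $texts, int $limit = 10): array
{
    // Join all rows and lowercase them so "Apple" and "apple" count as one word.
    $joined = strtolower(implode(' ', $texts));

    // Split on any run of characters that is neither a letter nor a digit,
    // dropping empty pieces. Fall back to an empty list if the split fails.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $joined, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // Word => number of occurrences.
    $counts = array_count_values($words);

    // Highest count first; the word keys are preserved.
    arsort($counts);

    // Purely numeric words come back as integer keys, so cast everything to string.
    return array_map('strval', array_slice(array_keys($counts), 0, $limit));
}

// Sample rows standing in for the database result.
$rows = ['Orange, orange and apple.', 'Apple apple banana'];
print_r(topWords($rows, 2));
```

With roughly 90 texts of about 300 characters per category, the word array stays small, so this should run well within a normal request. Words with equal counts are not ordered in any particular way here, and common words such as "and" are not filtered out.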
Peter Bailey
Ah, even nicer than I thought, thanks!
Maurice Kroon