I have created a cron job for my website that runs every 2 hours. It counts the words in the feeds and then displays the 10 most frequent words as the hot topics.
This is something like what Twitter does on its homepage to show the most popular topics being discussed.
Right now my cron job counts every word except the ones in an exclusion list that I maintain, words like:
array('of', 'a', 'an', 'also', 'besides', 'equally', 'further', 'furthermore', 'in', 'addition', 'moreover', 'too',
'after', 'before', 'when', 'while', 'as', 'by', 'the', 'that', 'since', 'until', 'soon', 'once', 'so', 'whenever', 'every', 'first', 'last',
'because', 'even', 'though', 'although', 'whereas', 'while', 'if', 'unless', 'only', 'whether', 'or', 'not', 'even',
'also', 'besides', 'equally', 'further', 'furthermore', 'addition', 'moreover', 'next', 'too',
'likewise', 'moreover', 'however', 'contrary', 'other', 'hand', 'contrast', 'nevertheless', 'brief', 'summary', 'short',
'for', 'example', 'for instance', 'fact', 'finally', 'in brief', 'in conclusion', 'in other words', 'in short', 'in summary', 'therefore',
'accordingly', 'as a result', 'consequently', 'for this reason', 'afterward', 'in the meantime', 'later', 'meanwhile', 'second', 'earlier', 'finally', 'soon', 'still', 'then', 'third'); //words that are negligible
But this does not completely solve the problem of eliminating all the unneeded words and returning only the useful ones.
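For reference, the approach described above can be sketched roughly as follows. This is a minimal sketch, not the actual cron job code: the function name `hotTopics`, its parameters, and the way the text is split into words are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
// $feedTexts: array of feed strings, $stopWords: the exclusion list, $limit: how many topics to return.
function hotTopics(array $feedTexts, array $stopWords, $limit = 10)
{
    // Flip the exclusion list so each word becomes a key; duplicates collapse automatically.
    $stop = array_flip(array_map('strtolower', $stopWords));
    $counts = array();

    foreach ($feedTexts as $text) {
        // Lowercase, then split on anything that is not a letter, digit or apostrophe (assumed tokenization).
        $words = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            if (isset($stop[$word])) {
                continue; // negligible word, skip it
            }
            $counts[$word] = isset($counts[$word]) ? $counts[$word] + 1 : 1;
        }
    }

    arsort($counts); // highest count first, keys (the words) preserved
    return array_slice($counts, 0, $limit, true);
}

// Example call with made-up feed text:
$top = hotTopics(
    array('PHP cron tips: the PHP cron job', 'A cron job for PHP'),
    array('of', 'a', 'the', 'for'),
    2
);
```

With a word-by-word split like this one, multi-word entries in the exclusion list such as 'for instance' or 'in other words' would never match a single token, so they would have no effect.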
Can someone please guide me on this and tell me how I can improve my algorithm?
Regards, Zeeshan