Hi all

I have a particular problem and need to know the best way to go about solving it.

I have a PHP string that can contain a number of keywords (tags, actually). For example:

"seo, adwords, google"

or

"web development, community building, web design"

I want to create pools of related keywords: for example, one pool for all SEO and online marketing keywords and another for all web development keywords.

I want to check the keyword/tag string against these pools. If, for example, "seo" or "adwords" appears in the string, it should match the online marketing pool, and a particular piece of content should be served.

I'd like to know the best way of coding this. I'm guessing some kind of hash table or array, but I'm not sure of the best way to approach it.

Any ideas?

Thanks

Jonathan

A:

Three approaches come to mind, although I'm sure there are more. In any case, I would store the values in a database table (or config file, or whatever suits your application) so they can be edited easily.

1) Easiest: Convert the list into a regular expression of the form "keyword1|keyword2|keyword3" and see if the input matches.

2) Medium: Add the words to a hash table, then split the input into words (you may need a regular-expression replace to strip punctuation) and look up each input word in the hash table.

3) Hardest: This may not work depending on your exact situation, but if all the possible content can be indexed by a search solution (like Apache Solr, for example), then your list of keywords could be used as a search string and you could return results above a particular level of relevance.
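As a rough illustration of approach 2, here is a minimal PHP sketch. The pool names, the keywords in `$pools`, and the helper names `buildKeywordIndex` and `matchPools` are all made up for the example. It also assumes the input is a comma-separated tag string, as in the question, so it splits on commas rather than on single words; that way multi-word tags such as "web development" still match.

```php
<?php
// Hypothetical pools: pool name => list of keywords (illustrative data only).
$pools = [
    'online marketing' => ['seo', 'adwords', 'adsense'],
    'web development'  => ['web development', 'web design', 'community building'],
];

// Invert the pools into one lookup table: keyword => pool name.
function buildKeywordIndex(array $pools): array
{
    $index = [];
    foreach ($pools as $poolName => $keywords) {
        foreach ($keywords as $keyword) {
            // Normalise so that lookups are case-insensitive.
            $index[strtolower(trim($keyword))] = $poolName;
        }
    }
    return $index;
}

// Return the names of every pool that at least one tag in the string belongs to.
function matchPools(string $tagString, array $index): array
{
    $matched = [];
    foreach (explode(',', $tagString) as $tag) {
        $tag = strtolower(trim($tag));
        if (isset($index[$tag])) {
            $matched[$index[$tag]] = true; // array keys de-duplicate pool names
        }
    }
    return array_keys($matched);
}

$index = buildKeywordIndex($pools);
print_r(matchPools('seo, adwords, google', $index));
```

Each tag costs one array lookup, so the work grows with the number of input tags rather than with the size of the pools. In this sketch a keyword can belong to only one pool; if pools overlap, the index would need to map each keyword to a list of pool names instead.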

It's hard to know exactly which solution would work best without knowing more about your source data. A large number of keywords may slow a regular expression down, but if it's a short list then it might work great. If your inputs are long, then approach 2 won't work as well, because you have to test each and every input word. As always, your mileage may vary, so I would start with the easiest solution I thought would work and see if the performance is acceptable.

David
Thanks David. Would the first (easiest) approach handle a situation where "seo" OR "adwords" would be matched against the online marketing pool?
Jonathan Lyon
I don't see why not. If your online marketing keyword pool was "seo, adwords, adsense", you'd construct the regex "seo|adwords|adsense". If your input keywords were then "New SEO Thingamajig", that would match the regular expression (provided the match is case-insensitive), as would "Google AdSense is neat".
David
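For what it's worth, the construction described in the comment above could look roughly like this in PHP. This is only a sketch: `buildPoolPattern` is a made-up helper name, and the `\b` word boundaries are an extra assumption beyond the plain "seo|adwords|adsense" alternation, added so that "seo" does not match inside a longer word. The `i` modifier makes the match case-insensitive, and `preg_quote` escapes any regex metacharacters in the keywords.

```php
<?php
// Build a case-insensitive pattern such as /\b(?:seo|adwords|adsense)\b/i
// from a list of keywords. Assumes the list is not empty and that every
// keyword starts and ends with a letter or digit, so \b behaves as expected.
function buildPoolPattern(array $keywords): string
{
    $escaped = array_map(function ($keyword) {
        // '/' is passed so the pattern delimiter gets escaped as well.
        return preg_quote(trim($keyword), '/');
    }, $keywords);

    return '/\b(?:' . implode('|', $escaped) . ')\b/i';
}

// The pool from the comment above, stored as a comma-separated string.
$onlineMarketing = explode(',', 'seo, adwords, adsense');
$pattern = buildPoolPattern($onlineMarketing);

// preg_match returns 1 when the pattern is found and 0 when it is not.
var_dump(preg_match($pattern, 'New SEO Thingamajig') === 1);
var_dump(preg_match($pattern, 'Google AdSense is neat') === 1);
var_dump(preg_match($pattern, 'web design, community building') === 1);
```

The first two calls report a match and the third does not. One pattern per pool is needed, so with many pools you would loop over them and serve the content for the first (or every) pool whose pattern matches.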
Hi David, thanks again. Regex isn't my strong point. Can you give me some idea of the syntax for constructing this, please?
Jonathan Lyon
Sorry, PHP isn't *my* strong point. Check http://stackoverflow.com/questions/2368608/what-are-the-best-places-to-learn-regular-expression-php
David