Hi all,

I need to detect which search engine referred a visitor to my website. Since each search engine uses a different query-string parameter for the search words (e.g. Google uses 'q=', Yahoo uses 'p='), I created a database of search engines with their URL regex patterns.

As an example: http://www.google.com/search?q=blabla&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu%3Aen-GB%3Aofficial&client=firefox-a

The regex I created for Google is:

(http:)(\\/)(\\/)(www)(\\.)(google)(\\.).*(\\/)(search).*(&q=|\\?q=).*

(I am new to regex, but so far it works.)

This detects that the URL belongs to Google. My problem is that I also need to extract the search words from this URL, or from the URLs of other search engines, and I don't know how to do that with the regular expression. I have tried extracting the query string from the URL with PHP functions and matching it against the pattern, but that returned nothing.

I hope I have explained this clearly enough.

Any suggestions?
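If you want to stay with a regex, the missing piece is a capturing group around the value of q. Below is a minimal sketch, assuming PHP's preg_match() and a hypothetical helper googleSearchTerms(); the pattern is a loosened variant of the one above (http or https, any google.* domain) and only handles Google's q parameter:

```php
<?php
// Hypothetical helper: capture the value of Google's q parameter.
// The q= must come right after "?" or after "&", so parameters such as
// "aq=" are not mistaken for it. "~" is used as the delimiter so that
// "/" in the URL does not need escaping.
function googleSearchTerms(string $url): ?string
{
    $pattern = '~^https?://www\.google\.[a-z.]+/search\?(?:.*&)?q=([^&]*)~i';
    if (preg_match($pattern, $url, $matches) !== 1) {
        return null; // not a Google search URL, or no q parameter
    }
    // Query values are URL-encoded ("+" for spaces, %XX escapes).
    return urldecode($matches[1]);
}

$url = 'http://www.google.com/search?q=blabla&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu%3Aen-GB%3Aofficial&client=firefox-a';
echo googleSearchTerms($url), "\n"; // prints "blabla"
```

One regex per engine quickly becomes hard to maintain, which is why the answers below parse the URL instead.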

+3  A: 

I would use parse_url to split the URL into its components and parse_str to parse the query string.

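$url = 'http://www.google.com/search?q=blabla&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu%3Aen-GB%3Aofficial&client=firefox-a';
// parse_url() splits the URL into scheme, host, path, query, ...
$parts = parse_url($url);
if (isset($parts['query'])) {
    // Replace the raw query string with an array of its (decoded)
    // parameters, so $parts['query']['q'] holds the search words.
    parse_str($parts['query'], $parts['query']);
}
var_dump($parts);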
Gumbo
+1 That is neat.
codaddict
Yes, but there are many parameters in the query string. I have to find the search words, but which parameter contains them, q= or p=? My plan was to extract the query with the functions you mentioned and match it against the pattern to find the parameter that holds the search words.
Ahmet Keskin
@Ahmet Keskin: Detect which search engine it is (examine `$parts['host']`) and then read the query parameter associated with that engine.
Gumbo
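Gumbo's suggestion can be sketched as follows, assuming a hypothetical lookup table SEARCH_PARAMS and helper extractSearchTerms(): choose the parameter name from the host, then read it from the array that parse_str() produces. Only the Google (q) and Yahoo (p) entries come from the question; any other engines would have to be added.

```php
<?php
// Hypothetical map from a fragment of the referrer's host name to the
// query parameter that engine uses for the search words.
const SEARCH_PARAMS = [
    'google.' => 'q',
    'yahoo.'  => 'p',
];

function extractSearchTerms(string $referrer): ?string
{
    $parts = parse_url($referrer);
    if ($parts === false || !isset($parts['host'], $parts['query'])) {
        return null; // malformed URL or no query string at all
    }

    // parse_str() decodes the query string into an associative array.
    parse_str($parts['query'], $query);

    $host = strtolower($parts['host']);
    foreach (SEARCH_PARAMS as $hostFragment => $param) {
        // Skip non-string values such as q[]=... arrays.
        if (strpos($host, $hostFragment) !== false && is_string($query[$param] ?? null)) {
            return $query[$param];
        }
    }
    return null; // not a known search engine, or its parameter is missing
}

$url = 'http://www.google.com/search?q=blabla&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu%3Aen-GB%3Aofficial&client=firefox-a';
var_dump(extractSearchTerms($url)); // string(6) "blabla"
```

Matching on a host fragment is deliberately loose; a stricter check (e.g. comparing the registered domain) may be needed to avoid false positives.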
+1  A: 

This blog entry about extracting keywords from the referrer looks like a good match for your problem.

I found it by searching for 'extract query string from google referer url'. The search returns a number of helpful hits; I only looked through the first few.

vkraemer
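Whichever parsing approach is used, the input is the referring URL of the current request. PHP exposes the browser's Referer header as `$_SERVER['HTTP_REFERER']`, but browsers do not always send it, so it has to be treated as optional. A minimal sketch, with a hypothetical helper referrerFrom() that takes the server array as a parameter so it can be tried outside a web request:

```php
<?php
// Hypothetical helper: return the referring URL, or null when the
// Referer header is absent or empty (direct visits, privacy settings).
function referrerFrom(array $server): ?string
{
    $referrer = $server['HTTP_REFERER'] ?? '';
    return $referrer !== '' ? $referrer : null;
}

// In a real request you would call referrerFrom($_SERVER) and pass the
// result to whatever search-term extraction you use.
$fake = ['HTTP_REFERER' => 'http://www.google.com/search?q=blabla'];
var_dump(referrerFrom($fake));
```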