views: 90

answers: 1

I have a list of keywords, about 25,000 of them. I would like people who add a certain `<script>` tag to their web page to have these keywords transformed into links. What would be the best way to achieve this?

I have tried the simple JavaScript approach (an array with lots of elements and a regex replace for each one) and it obviously slows down the browser.

I could always process the content server-side if there were a way, from the client, to send the page's content to a cross-domain server script (I'm partial to PHP, but it could be anything), but I don't know of any way to do this.

Any other working solution is also welcome.

A: 

I would have the remote site add a JavaScript file that uses Ajax to connect to your site and fetch a list of only specific terms. Which terms?

  • Categories: If this is for advertising (where this concept has been done a lot), let them specify which category their site falls into and group your terms into those categories, then only send those groups of terms. It would be in their best interest to choose the right categories, because the more links they have, the more income they can generate.

  • Indexing: If that wouldn't work, then the first time someone loads the page, fetch a copy of it on your server and match all the words on their page against the terms you have; for any subsequent loads you already have a list of terms to send them based on what their page contains (a rough sketch of this matching step follows this list). Ideally you would then have some background process that re-indexes their pages, say once a day or every few days, to catch any updates. You could also have the script take a hash of the page contents and, if it has changed at all, update your indexed copy.
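
A rough sketch of that matching step in PHP; the keyword list and page text below are stand-ins, since in practice the ~25,000 keywords would come from your database and the text from the fetched page:

```php
<?php
// Stand-in data: in practice load the keyword list from your DB/file and
// $pageText from the page fetched on your server.
$allKeywords = array('widget', 'gadget', 'sprocket');
$pageText    = 'Our catalogue of sprocket parts and widget accessories.';

// Return only the keywords that actually appear in the page text.
function keywordsOnPage($pageText, array $allKeywords)
{
    $found = array();
    foreach ($allKeywords as $keyword) {
        if (stripos($pageText, $keyword) !== false) {
            $found[] = $keyword;
        }
    }
    return $found;
}

print_r(keywordsOnPage($pageText, $allKeywords)); // Array ( [0] => widget [1] => sprocket )
```

Only this much smaller subset of terms ever gets sent to the browser, which avoids scanning all 25,000 keywords on the client.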

I'm sure there are other methods; which is best is really just preference. Try looking at a few other advertising-link sites/scripts and see how they do it.

Jonathan Kuhn
Thanks for your answer. This is for a gaming site and the keywords are game items, race names, etc., so no categories are involved, I'm afraid. I like the idea of indexing, but how would I go about reading the page's content from my server-side script? For example, if someone at example.com were to add `<script src="mysite.com/script.php">`, I don't see how I could read example.com's page, index and hash it, and create a list of keywords tailored to it.
Technoh
There are many ways you can get a remote page. The easiest would be to make a PHP script that uses something like file_get_contents() on the remote page. However, you would need allow_url_fopen turned on in php.ini. If it is on and you did `$page = file_get_contents("http://www.google.com");`, then `$page` would hold a string containing the HTML source of the page (this wouldn't include any Ajax-loaded data, because file_get_contents() doesn't execute JavaScript). Then break that down to maybe just the body, parse the page further to find your keywords, and store that in a DB.
Jonathan Kuhn
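
As a minimal sketch of the fetch-and-parse step described in the comment above (the URL is just a placeholder, and allow_url_fopen must be enabled):

```php
<?php
// Fetch the remote page; requires allow_url_fopen = On in php.ini.
$html = file_get_contents('http://www.example.com/some-page');
if ($html === false) {
    die('Could not fetch the page');
}

// Pull out just the <body> text with DOMDocument. As noted above, anything
// the page loads later via Ajax will not be present in $html.
$doc = new DOMDocument();
@$doc->loadHTML($html);   // suppress warnings caused by sloppy real-world HTML
$body = $doc->getElementsByTagName('body')->item(0);
$pageText = ($body !== null) ? $body->textContent : strip_tags($html);

// $pageText can now be matched against the keyword list and the result stored in a DB.
```
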
I think there is a misunderstanding. What I want is to be able, with a simple `<script src="..."></script>`, to parse the web page where the `<script>` tag was added and add links based on certain keywords. I know how to open a remote page from the server side, but how can I contact a remote server from the client side, pass it a (quite possibly very) long argument, and then change the client-side page according to the remote server's answer?
Technoh
And what I would suggest is to have your script, included in their page, use Ajax to contact your server. Your server does a DB lookup to see if that page has been indexed yet. If it hasn't, then using PHP on your server, get the page, parse it, index it, and generate a list of terms to send back. Then on each subsequent request for the same page, when the script does its Ajax call you can send back only those terms that actually appear on their page.
Jonathan Kuhn
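
In rough terms, the server side of that Ajax call might look like the sketch below; getCachedTerms(), indexPage() and cacheTerms() are hypothetical wrappers around your own DB layer, shown only to illustrate the flow:

```php
<?php
// script.php: the endpoint the included script contacts via Ajax.
$url = isset($_GET['url']) ? $_GET['url'] : '';
if ($url === '') {
    header('HTTP/1.1 400 Bad Request');
    exit;
}

$terms = getCachedTerms($url);           // terms from a previous index, or null
if ($terms === null) {
    $html  = file_get_contents($url);    // first request for this page: fetch it now
    $terms = indexPage($html);           // match the page text against your keyword list
    cacheTerms($url, $terms);            // store the result for subsequent requests
}

header('Content-Type: application/json');
echo json_encode($terms);                // the client-side script turns these into links
```
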
Excellent idea! I will add hashing of the web page to make sure the cached data is the latest version. This solution will of course exclude pages that are not reachable from outside, but that's all right, since those pages would not be available to a search engine anyway. Thanks for the help and the great solution!
Technoh
Although the hashing would be nice (and I believe I suggested it), be careful with sites that have comments: you don't want to re-index/re-hash every time someone leaves a comment. One of the hardest parts will probably be accurately getting just the main body of a page without the comments. You might look at the source of popular CMS packages (WordPress, Mambo, Joomla, etc.) and see whether they put any markers around the main body (HTML comments, the id/class of a div), then hash only that. A simpler idea would be to store the time of the last index and only re-index if more than a day has passed.
Jonathan Kuhn
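
A sketch of that re-indexing policy, assuming hypothetical getIndexRecord() and extractMainBody() helpers around your DB and HTML parsing:

```php
<?php
// Decide whether a page should be re-indexed: at most once a day, and only
// if the hash of the main body (excluding comments and the like) has changed.
function needsReindex($url, $html)
{
    $record = getIndexRecord($url);   // e.g. array('hash' => ..., 'indexed_at' => ...) or null
    if ($record === null) {
        return true;                  // never indexed before
    }
    if (time() - $record['indexed_at'] < 86400) {
        return false;                 // indexed less than a day ago
    }
    // Hash only the main body so a new visitor comment does not force a re-index.
    $hash = md5(extractMainBody($html));
    return $hash !== $record['hash'];
}
```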