views:

86

answers:

2

I have a MySQL table of "people" as part of a web site, for example:

| people_id  | firstname  | lastname    |
-----------------------------------------
| 1          | John       | Lennon      |
| 2          | Paul       | McCartney   |
| 3          | George     | Harrison    |
| 4          | Ringo      | Starr       |
| .          | .          | .           |

My table has about 2000 rows.

I also have a "news" section on the web site. Frequently, these news items contain references to "people", e.g.

John Lennon and Paul McCartney wrote some of the most popular songs in the history of rock music.

Is it possible (or reasonable/advisable) to automatically parse each item of news to look for "people" who are in the database, and then turn them into links. So, for example, the above text would be turned into this (or something functionally equivalent):

<a href="/people/1>John Lennon</a> and <a href="/people/2">Paul McCartney</a> wrote some of the most popular songs in the history of rock music.

What would be the best way to do this? I've made a few abortive attempts to do it using regular expressions in php, but I guess this is not the best approach. I don't known much about javascript (and its frameworks) but I would be happy to use this if it makes sense to do so.

This is not an essential feature of the web site (but I feel it would be a nice addition) so I would prefer to omit such a feature rather than increasing the page load time dramatically.

EDIT

I left out some details in the initial question, to keep the length down.

In fact it is the web site for a football club - all of the "people" are members of the web site, and can log in, and add and edit news stories (for example, match reports), in which they frequently refer to other "people". So it's not just me who adds the news stories - they can be added by the (around) 2000 other users.

Although the membership is restricted in that people have to be approved before joining, the system has to be able to cope with complexities such as people with unusual names, and there are a few instances of more than one people with the same name.

I have implemented a kind of solution, in which I use a type of proprietary code to mark up/down people's names (e.g. [p=1]John Lennon[/p]) but I have found that, of the 2000 users of the site, only a handful make use of this.

For what it's worth, the web site is www.ouafc.com, and an example of a news story is at www.ouafc.com/news/312.

+1  A: 

Your best bet is to manually mark up the news posts in some way, to indicate when a name appears. It's the only way to prevent missed names, or improperly parsed names, and to avoid the huge processing requirement of scanning every news story for every possible name from the database.

Maybe something with a twittery-syntax like:

@[John Lennon] and @[Paul McCartney] wrote some of the most popular songs in the history of rock music.

Then run it through a custom sort of markdown-style function when you want to display the news story. It could parse those markers out, find the matching DB record, and generate the link.

It'd be more efficient to convert @[] markers to links before inserting the news story in the database, but that couples things more tightly - if a user is deleted or his ID changes, you have a broken link. Storing the @[] also makes editing stories easier.

Update

If you must have the names automagically detected and converted to links, this can be achieved at a pretty severe performance penalty that only grows as you add more names:

function linkify_names($news) {
  $people = query('select people_id, firstname, lastname from people');

  $from = $to = array();
  foreach ($people as $person) {
    $name = "$person->firstname $person->lastname";
    // TODO - escape regex chars in $name?

    // match [boundary]$name[boundary], case insensitive
    $from[] = "/(\b)($name)(\b)/i";

    // include boundaries in replacement; maintain case of found name
    $to[] = '$1<a href="/people/' . $person->people_id .'">$2</a>$3';
  }

  return preg_replace($from, $to, $news);
}

The difference is, instead of looking up only the names marked with @[] tags, you must look up all names, and exhaustively search for each one. You cannot rely on simple regular expressions to find names in the body of the news document.

meagar
I have in fact implemented this sort of solution already (perhaps I should have mentioned it in the question, but I wanted to keep it simple). However, I consider this to be a less than ideal solution, simply because the "news" items can be added by any one of a number of users of the web site. While I have provided the functionality to add links in a similar way to that in which you suggested - most users of the site do not bother using the functionality. So I would prefer to have an "automatic" way of doing it.
Tomba
@Tomba no worries
meagar
+2  A: 

I don't know much about php, but here's a quick JavaScript go at it using jQuery 1.4:

<div id="maindiv">
   John Lennon and Paul McCartney wrote some of the most popular songs in the history of rock music.
</div>


<script>
   $(document).ready(function(){
       myPage.linkify($("#maindiv"));
    })

var myPage = {
    map: {
            "John Lennon": 1,
            "Paul McCartney": 2,
            "Rock Music": 3
         },

    linkify: function(domEl){
        var htmlcopy = domEl.html();

        function buildLink(txt, loc){
            return '<a "href = /blah/'+loc+'>'+txt+'</a>';
        }

        for(i in myPage.map){
           var tmpStr = new RegExp(i,"gi");
           htmlcopy = htmlcopy.replace( tmpStr, buildLink(i, myPage.map[i]) );
       }

       domEl.html(htmlcopy);
    }
 }
</script>

myPage.map would be built server-side from the database. This could also be a callback to an Ajax function (that would grab the map) so that it wouldn't stop the rest of the page from doing its thing.

jkyle
Thanks for taking the time to type that out. I have not used any javascript on the site, but this looks like a perfect use for it. It'll take me a while to get to grips with jquery but I've been meaning to do that for some time now. I don't think I would want myPage.map to be visible to users of the site so I'd have to look into doing it via Ajax.
Tomba
If you don't want your users to be able to see the map, then a client-side approach is not going to be a good option. Anything that goes across the wire is available to the viewer, including Ajax responses. There are some server-side JS implementations, but you're probably better off recreating this in php in that case.
jkyle
maybe I'll have to rethink whether I need myPage.map to be invisible
Tomba
It works great - thanks. I'm very new to jquery and not experienced in javascript: how would I go about obtaining myPage.map through an Ajax function?
Tomba
Check out the documentation here: http://api.jquery.com/jQuery.getJSON/You'll do something like:$.getJSON('someurl/getMap.php', function(data) { myPage.map = data;});Where 'someurl/getMap.php' is a function on the server that builds the JSON version the data you need.
jkyle
@jkyle thanks for your help
Tomba