ansaurus

Question

Searching text for (potentially) tens of thousands of tokens

Answer 1

A:

We had a similar situation. We ended up using regular expressions to parse and replace the tokens. Because the original article was a template from which we generated new articles with the tokens replaced, we cached the generated output, so an unchanged template never had to be parsed again.

Joshua Belden 2009-04-30 15:28:24

Answer 2

+1 A:

This is always going to be difficult (computationally, anyway) unless you can get some guarantee of the token format. Without markup or some format it can be taught to recognize, the computer has no way of knowing that a particular string of characters has any special meaning.

The "simple" answer is to loop through the text for each token, see if it's there, and handle it. But you'll have two issues: computation time, and collisions (as Chad pointed out in his comment).
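To make both issues concrete, here is a rough Python sketch of the "simple" loop. The function name `link_tokens_naive` and the square-bracket output are only for illustration, and I'm assuming "collisions" means one token being contained in another.

```python
def link_tokens_naive(text, tokens):
    """Mark each token in turn: one full scan of the text per token."""
    for token in tokens:
        if token in text:
            text = text.replace(token, "[" + token + "]")
    return text


sentence = "We ate at The Stinking Bean."

# Shorter token first: it is marked inside the longer phrase, after which
# the longer token no longer appears verbatim and is never matched.
short_first = link_tokens_naive(sentence, ["Bean", "The Stinking Bean"])
# short_first == "We ate at The Stinking [Bean]."

# Longer token first: the shorter token then matches inside the text that
# was already marked, giving nested markers.
long_first = link_tokens_naive(sentence, ["The Stinking Bean", "Bean"])
# long_first == "We ate at [The Stinking [Bean]]."
```

With tens of thousands of tokens, that is tens of thousands of scans per article, and the result depends on the order in which the tokens happen to be processed.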

Is there a very simple markup you can enforce? MediaWiki only creates internal links if a phrase is surrounded by [[brackets]]. Lots of wiki software will only make links if you CamelCaseThePhrase.

I can't think of a way for the application to automagically know certain character groups have meaning without checking every defined token or enforcing some kind of format.

Are you sure your audience can't handle something like

SteveMcMuffin ate seventeen FabulousFurryFajitas at
TheStinkingBean, while JohnsonFatlumps ate thirty-two.

or

[[Steve McMuffin]] ate seventeen [[Fabulous Furry Fajitas]] at
[[The Stinking Bean]], while [[Johnson Fatlumps]] ate thirty-two.
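With either convention, finding the candidate phrases is a single pass over the text, no matter how many tokens are defined. A rough Python sketch follows; the patterns are my own simplifications, not what MediaWiki or any particular wiki engine actually uses, and the names are hypothetical.

```python
import re

# [[Some Phrase]]: anything between double square brackets (simplified).
BRACKET_LINK = re.compile(r"\[\[([^\[\]]+)\]\]")

# CamelCase: two or more capitalized word parts run together (simplified).
CAMEL_CASE = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")


def find_bracket_links(text):
    """Return the phrases wrapped in [[double brackets]]."""
    return BRACKET_LINK.findall(text)


def find_camel_case_links(text):
    """Return the CamelCase words in the text."""
    return CAMEL_CASE.findall(text)


camel = find_camel_case_links(
    "SteveMcMuffin ate seventeen FabulousFurryFajitas at\n"
    "TheStinkingBean, while JohnsonFatlumps ate thirty-two."
)
# camel == ['SteveMcMuffin', 'FabulousFurryFajitas',
#           'TheStinkingBean', 'JohnsonFatlumps']

bracketed = find_bracket_links(
    "[[Steve McMuffin]] ate seventeen [[Fabulous Furry Fajitas]] at\n"
    "[[The Stinking Bean]], while [[Johnson Fatlumps]] ate thirty-two."
)
# bracketed == ['Steve McMuffin', 'Fabulous Furry Fajitas',
#               'The Stinking Bean', 'Johnson Fatlumps']
```

Each phrase found can then be looked up in a set or dictionary of known tokens, so the cost grows with the length of the text rather than with the size of the token list.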

James Socol 2009-04-30 15:56:30

Almost certain, unfortunately. If they could handle stuff like that though, I'd have a lot less work to do :)

Shabbyrobe 2009-05-01 19:04:30
