views:

882

answers:

12

Given a string with replacement keys in it, how can I most efficiently replace these keys with runtime values, using Java? I need to do this often, fast, and on reasonably long strings (say, on average, 1-2kb). The form of the keys is my choice, since I'm providing the templates here too.

Here's an example (please don't get hung up on it being XML; I want to do this, if possible, cheaper than using XSL or DOM operations). I'd want to replace all @[^@]*?@ patterns in this with property values from bean properties, true Property properties, and some other sources. The key here is fast. Any ideas?

<?xml version="1.0" encoding="utf-8"?>

<envelope version="2.3">

  <delivery_instructions>

    <delivery_channel>
      <channel_type>@CHANNEL_TYPE@</channel_type>
    </delivery_channel>

    <delivery_envelope>
      <chan_delivery_envelope>
    <queue_name>@ADDRESS@</queue_name>
      </chan_delivery_envelope>
    </delivery_envelope>

  </delivery_instructions>

  <composition_instructions>
    <mime_part content_type="application/xml">
      <content><external_uri>@URI@</external_uri></content>
    </mime_part>
  </composition_instructions>

</envelope>

The naive implementation is to use String.replaceAll() but I can't help but think that's less than ideal. If I can avoid adding new third-party dependencies, so much the better.

+6  A: 

The appendReplacement method in Matcher looks like it might be useful, although I can't vouch for its speed.

Here's the sample code from the Javadoc:

Pattern p = Pattern.compile("cat");
Matcher m = p.matcher("one cat two cats in the yard");
StringBuffer sb = new StringBuffer();
while (m.find()) {
    m.appendReplacement(sb, "dog");
}
m.appendTail(sb);
System.out.println(sb.toString());

EDIT: If this is as complicated as it gets, you could probably implement your own state machine fairly easily. You'd pretty much be doing what appendReplacement is already doing, although a specialized implementation might be faster.

Michael Myers
sweet! simple and neat.
Jason S
The one I didn't know about was appendReplacement and appendTail; those are fantastic tools!
Chris R
A: 

I also have a non-regexp based substitution library, available here. I have not tested its speed, and it doesn't directly support the syntax in your example. But it would be easy to extend to support that syntax; see, for instance, this class.

Brian Clapper
+1  A: 

You really want to write something custom so you can avoid processing the string more than once. I can't stress this enough - as most of the other solutions I see look like they are ignoring that problem.

Optionally turn the text into a stream. Read it char by char forwarding each char to an output string/stream until you see the @ then read to the next @ slurping out the key, substituting the key into the output: repeat until end of stream.

I know it's plain old brute for - but it's probably the best.

I'm assuming you have some reasonable assumption around '@' not just 'showing up' independant of your token keys in the input. :)

Joe
mmyers' Matcher is able to report what the matched text was, when used with Chris R's example regex, the String is only processed once, each match examined and replaced with varying strings based on what the match was.
Stephen Denne
+3  A: 
Jason S
Thanks, Jason; what you have there is almost word-for-word the version I wrote. I'm going to experiment with straight cyclic replace, too, but I think I'll be fine with this approach.
Chris R
@([^@]*)@ or even better (if supported) @([^@]*+)@
Brad Gilbert
+4  A: 

It's premature to leap to writing your own. I would start with the naive replace solution, and actually benchmark that. Then I would try a third-party templating solution. THEN I would take a stab at the custom stream version.

Until you get some hard numbers, how can you be sure it's worth the effort to optimize it?

Chase Seibert
And bear in mind that the third-party solutions may well be close to optimal.
Stephen C
+1  A: 
Pete Kirkham
Primarily the advantage is in not having to pay the cost of parsing the XML;moreover, the entities in question would be resolved at parse-time, notrender-time. "@[^@]+?@" is advantageous because in our data model it does notmatch any valid string.
Chris R
You are having to parse the XML, just not as XML but using regex. I was expecting you to stream the XML through a parser and a writer to a buffer, so render time doesn't come into it. It would be a question of whether doing more work with what should be an optimised parser is faster or not.
Pete Kirkham
If you have in your data model, you need to escape it anyway if you're using regex, so you have to do that part of the processing anyway.
Pete Kirkham
+1  A: 

Text processing is going to always be bounded if you dont shift your paradigm. I dont know how flexible your domain is, so not sure if this is applicable, but here goes:

try creating an index into where your text substitution is - this is especially good if the template doesnt change often, because it becomes part of the "compile" of the template, into a binary object that can take in the value required for the substitutions, and blit out the entire string as a byte array. This object can be cached/saved, and next time, resubstitute in the new value to use again. I.e., you save on parsing the document every time. (implementation is left as an exercise to the reader =D )

But please use a profiler to check whether this is actually the bottleneck that you say it is before embarking on writing a custom templating engine. The problem may actually be else where.

Chii
A: 

Take a look at a library that specializes in this, e.g., Apache Velocity. If nothing else, you can bet their implementation for this part of the logic is fast.

Hank Gay
A: 

I wouldn't be so sure the accepted answer is faster than String.replaceAll(String,String). Here for your comparison is the implementation of String.replaceAll and the Matcher.replaceAll that is used under the covers. looks very similar to what the OP is looking for, and I'm guessing its probably more optomized than this simplistic solution.

public String replaceAll(String s, String s1)
    {
        return Pattern.compile(s).matcher(this).replaceAll(s1);
    }

public String replaceAll(String s)
    {
        reset();
        boolean flag = find();
        if(flag)
        {
            StringBuffer stringbuffer = new StringBuffer();
            boolean flag1;
            do
            {
                appendReplacement(stringbuffer, s);
                flag1 = find();
            } while(flag1);
            appendTail(stringbuffer);
            return stringbuffer.toString();
        } else
        {
            return text.toString();
        }
    }
shsteimer
A: 

... Chii is right. If this is a template that has to be run so many times that speed matters, find the index of your substitution tokens to be able to get to them directly without having to start at the beginning each time. Abstract the 'compilation' into an object with the nice properties, they should only need updating after a change to the template.

+1  A: 

As others have said, appendReplacement() and appendTail() are the tools you need, but there's something you have watch out for. If the replacement string contains any dollar signs, the method will try to interpret them as capture-group references. If there are any backslashes (which are used to escape the dollars sing), it will either eat them or throw an exception.

If your replacement string is dynamically generated, you may not know in advance whether it will contain any dollar signs or backslashes. To prevent problems, you can append the replacement directly to the StringBuffer, like so:

Pattern p = Pattern.compile("@([^@]*?)@");
Matcher m = p.matcher(s);
StringBuffer sb = new StringBuffer();
while (m.find()) 
{
    m.appendReplacement("");
    sb.append(substitutionTable.lookupKey(m.group(1)));
}
m.appendTail(sb);

You still have to call appendReplacement() each time, because that's what keeps you in sync with the match position. But this trick avoids a lot of pointless processing, which could give you a noticeable performance boost as a bonus.

Alan Moore
A: 

this is what I use, from the apache commons project http://commons.apache.org/lang/api/org/apache/commons/lang/text/StrSubstitutor.html

MitchH