
I am a big fan of using Apache Commons Digester to load XML files into my object model.

I am dealing with large event-log files that contain many duplicate values, so I would like to String.intern() the strings for specific attributes (the ones that frequently repeat).
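For context, a minimal sketch of what interning buys here, using only the JDK (the class name InternDemo is just for illustration): equal strings allocated separately are distinct heap objects, while intern() maps every one of them to a single canonical instance.

```java
public class InternDemo {
    /** True if two equal but separately allocated strings collapse to one instance after intern(). */
    static boolean internCollapses(String value) {
        String a = new String(value); // fresh copy, like a parser would allocate
        String b = new String(value); // another fresh copy with the same contents
        return a != b && a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String a = new String("ERROR");
        String b = new String("ERROR");
        System.out.println(a == b);                   // false: two separate copies
        System.out.println(a.intern() == b.intern()); // true: one shared pooled instance
        System.out.println(internCollapses("WARN"));  // true
    }
}
```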

Since Digester parses the whole file before returning control to me, it first creates many duplicate strings that use a lot of memory. I can iterate over all my objects afterwards and intern the strings, but I still pay for the peak memory usage during the load.
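The post-load pass described above might look roughly like this. LogEvent and its level/source fields are hypothetical stand-ins for the real model; the point is that every duplicate already exists in memory before the pass runs, and only becomes garbage afterwards.

```java
import java.util.ArrayList;
import java.util.List;

public class PostLoadIntern {
    // Hypothetical model class standing in for an event-log entry.
    static class LogEvent {
        String level;
        String source;

        LogEvent(String level, String source) {
            this.level = level;
            this.source = source;
        }
    }

    // Second pass over already-loaded objects: replace each repeated attribute
    // with its canonical interned instance so the per-object copies can be collected.
    static void internAll(List<LogEvent> events) {
        for (LogEvent e : events) {
            if (e.level != null) e.level = e.level.intern();
            if (e.source != null) e.source = e.source.intern();
        }
    }

    public static void main(String[] args) {
        List<LogEvent> events = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            // new String(...) mimics a parser allocating a fresh copy per element
            events.add(new LogEvent(new String("INFO"), new String("auth")));
        }
        // At this point peak memory already includes every duplicate.
        internAll(events);
        System.out.println(events.get(0).level == events.get(2).level); // true
    }
}
```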

Another alternative is to have the corresponding property setter in my object model always intern its argument, but I also call that setter from my own code with strings that are already interned, so that would be wasteful. Besides, I don't want to introduce Digester-specific code into my model.

Is there a way to get Digester to intern or execute custom code before/after setting properties?

A:

You can create your own Digester Rule that interns the element's body text before handing it to the property setter:

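// Digester 1.x/2.x package; in Digester 3 use org.apache.commons.digester3
import org.apache.commons.digester.BeanPropertySetterRule;

// Sets a bean property from the element body, but interns the value first
// so that repeated values share a single String instance.
public class InternRule extends BeanPropertySetterRule
{
    public InternRule( String propertyName )
    {
        super( propertyName );
    }

    @Override
    public void body( String namespace, String name, String text )
        throws Exception
    {
        // BeanPropertySetterRule trims the body text itself. Trim before interning
        // so the interned instance is what reaches the setter: trim() returns the
        // same object when there is nothing to strip.
        super.body( namespace, name, text.trim().intern() );
    }
}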
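Because BeanPropertySetterRule trims the body text before using it, the intern() call should come after trimming; otherwise a value with surrounding whitespace reaches the setter as a fresh, non-interned copy. The dependency-free sketch below shows that override pattern. SetterRuleStandIn is a hypothetical, simplified stand-in for BeanPropertySetterRule (it trims in body() and calls a setter in end()), not the real Digester class.

```java
import java.util.function.Consumer;

public class InternRuleSketch {
    // Hypothetical stand-in for BeanPropertySetterRule: trims the body in body(),
    // then passes the stored text to a setter in end().
    static class SetterRuleStandIn {
        private final Consumer<String> setter;
        private String bodyText;

        SetterRuleStandIn(Consumer<String> setter) {
            this.setter = setter;
        }

        public void body(String namespace, String name, String text) {
            bodyText = text.trim();
        }

        public void end(String namespace, String name) {
            setter.accept(bodyText);
        }
    }

    // Same override pattern as InternRule: trim first, then intern, so the
    // superclass's own trim() is a no-op that keeps the pooled instance.
    static class InternRuleStandIn extends SetterRuleStandIn {
        InternRuleStandIn(Consumer<String> setter) {
            super(setter);
        }

        @Override
        public void body(String namespace, String name, String text) {
            super.body(namespace, name, text.trim().intern());
        }
    }

    public static void main(String[] args) {
        String[] author = new String[1];
        SetterRuleStandIn rule = new InternRuleStandIn(v -> author[0] = v);
        rule.body("", "author", new String("  Tolkien \n")); // padded, non-pooled input
        rule.end("", "author");
        System.out.println(author[0] == "Tolkien"); // true: the pooled literal instance
    }
}
```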

Instead of doing:

digester.addBeanPropertySetter( "book/author", "author" );

You would do this:

digester.addRule( "book/author", new InternRule( "author" ) );

Depending on which Digester method you're using, there are different Rule classes you can subclass in the same way (for example SetPropertyRule or CallMethodRule).

mtpettyp