views:

263

answers:

11

I have 9 different grammars. One of these will be loaded depending on what the first line of txt is on the file it is parsing.

I was thinking about deriving the lexer/parser spawning into sep. classes and then instantiating them as soon as I get a match -- not sure whether that would slow me down or not though. I guess some benchmarking is in order.

Really, speed is definitely my goal here but I know this is ugly code.

Right now the code looks something like this:

sin.mark(0)
site = findsite(txt)
sin.reset()

if ( site == "site1") {
   loadlexer1;
   loadparser1;
} else if (site == "site2") {
   loadlexer2;
   loadparser2;
}
.................
} else if (site == "site8") {
   loadparser8;
   loadparser8;
}

findsite(txt) {
  ...................
  if line.indexOf("site1-identifier") {
    site = site1;
  } else if(line.indexOf("site2-identifier") {
    site = site2;
  } else if(line.indexOf("site3-identifier") {
    site = site3;
  }
  .........................
  } else if(line.indexOf("site8-identifier") {
    site = site8;
  }
}

some clarifications

1) yes, I truly have 9 different grammars I built with antlr so they will ALL have their own lexer/parser objs.

2) yes, as of right now we are comparing strings and obivously that'll be replaced with some sort of integer map. I've also considered sticking the site identifiers into one regex, however I don't believe that will speed anything up.

3) yes, this is pseudocode so I wouldn't get too picky on the semantics here..

4) kdgregory is correct in noting that I am unable to create one instance of the lexer/parser pair

I like the hash idea to make the code a little bit better looking, however I don't think it's going to speed me up any.

+1  A: 

Using polymorphism is almost guaranteed to be faster than string manipulation, and will be checked for correctness at compile time. Is site really a String? If so, FindSite should be called GetSiteName. I would expect FindSite to return a Site object that knows the appropriate lexer and parser.

Another speed issue is speed of coding. It would definitely be better to have your different lexers and parsers in individual classes (perhaps with shared functionality in another). It'll make your code slightly smaller, and it will be significantly easier for someone to understand.

Mark
Plus, it would help the code not violate Open/Closed - If you needed to add Site 9, you wouldn't be editing the file containing code about what Site 1 does. A setup like this is a copy-paste error waiting to happen.
Matt Poush
+1  A: 

Something like:

Map<String,LexerParserTuple> lptmap = new HashMap<String,LexerParserTuple>();
lpt=lptmap.get(site)
lpt.loadlexer()
lpt.loadparser()

combined with some regex magic rather than string.indexOf() to grab the names of the sites should dramatically clean up your code.

lostlogic
You should explain how lptmap gets populated. As a bonus, show how java.util.ServiceLoader could be used to add new lexer/parsers dynamically simply by dropping a JAR onto the classpath.
erickson
Thanks erickson, sadly that goes a bit beyond my immediate Java skills :)
lostlogic
+7  A: 

The standard approach is to use a Map to connect the key strings to the lexers that will handle them:

Map<String,Lexer> lexerMap = new HashMap<String,Lexer>();
lexerMap.put("source1", new Lexer01());
lexerMap.put("source2", new Lexer02());
// and so on

Once you've retrieve the string that identifies the lexer to use, you'd retrieve it from the Map like so:

String grammarId = // read it from a file, whatever
Lexer myLexer = lexerMap.get(grammarId);

Your example code has a few quirks, however. First, the indexOf() calls indicate that you don't have a stand-alone string, and Map won't look inside the string. So you need to have some way to extract the actual key from whatever string you read.

Second, lexers and parsers usually maintain state, so you won't be able to create a single instance and reuse it. That indicates that you need to create a factory class, and store it in the map (this is the Abstract Factory pattern).

If you expect to have lots of different lexers/parsers, then it makes sense to use a map-driven approach. For a small number, an if-else chain is probably your best bet, properly encapsulated (this is the Factory Method pattern).

kdgregory
With references, this is a great answer.
Mark
A: 

hi...

i would change the type of findsite to return a site type (super class) and then leverage the polymorphism... That should be faster than string manipulation...

Do you need separate lexers ?

LB
A: 

Use a Map to configure a site to loadstrategy structure. Then a simple lookup is required based on 'site' and you execute the appropriate strategy. Same can be done for findSite().

Robin
+1  A: 

Replace Conditional With Polymorphism

For a half-measure, for findsite(), you could simply set up a HashMap to get you from site identifier to site. An alternative cleanup would be simply to return the site string, thus:

String findsite(txt) {
  ...................
  if line.indexOf("site1-identifier") 
    return site1;
  if(line.indexOf("site2-identifier")
    return  site2;
  if(line.indexOf("site3-identifier")
    return  site3;
...
}

Using indexOf() in this way isn't really expressive; I'd use equals() or contains().

Carl Manaster
A: 

Could have a map of idenifiers vs sites, then just iterate over the map entries.

// define this as a static somewhere ... build from a properties file
Map<String,String> m = new HashMap<String,String>(){{
    put("site1-identifier","site2");
    put("site2-identifier","site2");
}}

// in your method
for(Map.Entry<String,String> entry : m.entries()){
    if( line.contains(entry.getKey())){
        return line.getValue();
    }
}

cleaner: yes faster: dunno...should be fast enough

Gareth Davis
A: 

You could use reflection possibly

char site = line.charAt(4);
Method lexerMethod = this.getClass().getMethod( "loadLexer" + site, *parameters types here*)
Method parserMethod = this.getClass().getMethod( "loadparser" + site, *parameters types here*)

lexerMethod.invoke(this, *parameters here*);
parserMethod.invoke(this, *parameters here*);
Doomspork
A: 

I was thinking about deriving the lexer/parser spawning into sep. classes and then instantiating them as soon as I get a match

It looks like you have the answer already. That would create code that is more flexible, but not necessary faster.

I guess some benchmarking is in order

Yes, measure with both approaches and take an informed decision. My guess is the way you have it already would be enough.

Perhaps, if what's bothers you is to have a "kilometric" method you could refactor it in different functions with extract method.

The most important thing is to have first a solution that does the job even though it is slow, and once you have it working, profile it and detect points where the performance could be improved. Remember the "Rules of optimization"

OscarRyz
A: 

Suppose your code is inefficient.

Will it take more time than (say) 1% of the time to actually parse the input?

If not, you've got bigger "fish to fry".

Mike Dunlavey
A: 

I don't know about Java but some language allow switch to take strings.

switch(site)
{
    case "site1": loadlexer1; loadparser1; break;
    case "site2": loadlexer2; loadparser2; break;
    ...
}

As for the seconds bit, use a regex to extract the identifier and switch on that. You might be better off using an enum.

BCS