I am using the Stanford Natural Language processing toolkit. I've been trying to find spelling errors with Lexicon
's isKnown
method, but it produces quite a few false positives. So I thought I'd load a second lexicon, and check that too. However, that causes a problem.
private static LexicalizedParser lp = new LexicalizedParser(Constants.stdLexFile);
private static LexicalizedParser wsjLexParse = new LexicalizedParser(Constants.wsjLexFile);
static {
lp.setOptionFlags(Constants.lexOptionFlags);
wsjLexParse.setOptionFlags(Constants.lexOptionFlags);
}
public ParseTree(String input) throws IllegalArgumentException, IllegalAccessException, InvocationTargetException {
initialInput = input;
DocumentPreprocessor process = new DocumentPreprocessor();
sentences = process.getSentencesFromText(new StringReader(input));
for (List<? extends HasWord> sent : sentences) {
if(lp.parse(sent)) { // line 65
forest.add(lp.getBestParse()); //non determinism?
}
}
partsOfSpeech = pos();
runAnalysis();
}
The following fail trace is produced:
java.lang.ArrayIndexOutOfBoundsException: 45547
at edu.stanford.nlp.parser.lexparser.BaseLexicon.initRulesWithWord(BaseLexicon.java:300)
at edu.stanford.nlp.parser.lexparser.BaseLexicon.isKnown(BaseLexicon.java:160)
at edu.stanford.nlp.parser.lexparser.BaseLexicon.ruleIteratorByWord(BaseLexicon.java:212)
at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.initializeChart(ExhaustivePCFGParser.java:1299)
at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.parse(ExhaustivePCFGParser.java:388)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.parse(LexicalizedParser.java:234)
at nth.compling.ParseTree.<init>(ParseTree.java:65)
at nth.compling.ParseTreeTest.constructor(ParseTreeTest.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.junit.internal.runners.BeforeAndAfterRunner.invokeMethod(BeforeAndAfterRunner.java:74)
at org.junit.internal.runners.BeforeAndAfterRunner.runBefores(BeforeAndAfterRunner.java:50)
at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:33)
at org.junit.internal.runners.TestClassRunner.run(TestClassRunner.java:52)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
If I comment out this line: (and other references to wsjLexParse)
private static LexicalizedParser wsjLexParse = new LexicalizedParser(Constants.wsjLexFile);
then everything works fine. What am I doing wrong here?