My Stanford tagger ran out of memory. Is it because the text has to be properly formatted? I ask because I use it to tag HTML content with the tags stripped, so the input may contain an excessive number of newlines.

Here is the error:

WARNING: Untokenizable: ? (char in decimal: 9829)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequenceNew(ExactBestSequenceFinder.java:175)
    at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequence(ExactBestSequenceFinder.java:98)
    at edu.stanford.nlp.tagger.maxent.TestSentence.runTagInference(TestSentence.java:277)
    at edu.stanford.nlp.tagger.maxent.TestSentence.testTagInference(TestSentence.java:258)
    at edu.stanford.nlp.tagger.maxent.TestSentence.tagSentence(TestSentence.java:110)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagSentence(MaxentTagger.java:825)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1319)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1225)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1183)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.main(MaxentTagger.java:1358)

A: 

Sorry guys, this is solved. I've changed the memory setting to 500m.
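
For anyone landing here later: the "500m" above presumably refers to the JVM maximum heap size, which is set with the standard `-Xmx` option (for example `-Xmx500m`). The thread doesn't show the exact command used, so treat that as an assumption. Below is a minimal sketch for checking what heap limit your JVM actually received; `HeapCheck` and `toMegabytes` are made-up names for illustration and are not part of the Stanford tagger.

```java
public class HeapCheck {
    // Convert a byte count to whole megabytes (1 MB = 1024 * 1024 bytes).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reports the most heap the JVM will attempt to use,
        // which reflects the -Xmx value when one is given on the command line.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMegabytes(maxBytes) + " MB");
    }
}
```

Running it as `java -Xmx500m HeapCheck` should report a figure close to 500 MB (the exact number can be a little lower depending on the JVM and garbage collector). If the reported limit is much smaller than expected, the option is probably not reaching the JVM that runs the tagger.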

goh