I've been working through part of the Lemur indexing tutorial here:
http://www.lemurproject.org/tutorials/begin_indexing-1.php
I've created a "corpus" folder containing a single file named "1", which looks properly formatted to me:
<DOC>
<DOCNO>1</DOCNO>
<TEXT>
Here is some text
</TEXT>
</DOC>
and created the following configuration file:
<parameters>
<corpus>
<path>C:\Users\Tristan\Documents\lemur\corpus</path>
<class>trectext</class>
</corpus>
<memory>256m</memory>
<index>C:\Users\Tristan\Documents\lemur\index</index>
</parameters>
However, when I run:
IndriBuildIndex.exe C:\Users\Tristan\Documents\lemur\config\parameter.xml
I get this cryptic error:
0:00: Opened repository C:\Users\Tristan\Documents\lemur\index
0:00: Opened C:\Users\Tristan\Documents\lemur\corpus\1
0:00: Error in C:\Users\Tristan\Documents\lemur\corpus\1 : .\src\TaggedDocumentIterator.cpp(213): Malformed document: C:\Users\Tristan\Documents\lemur\corpus\1
0:00: Closing index
0:00: Finished
I looked at the relevant functions in the source, but nothing in particular jumps out at me. Any ideas?
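In case it helps anyone reproduce this, here is a minimal Python sketch that writes the same document and parameter file into a scratch directory. The helper names (`build_params`, `write_setup`) are made up for illustration, and the plain ASCII encoding, LF line endings, and trailing newline after `</DOC>` are assumptions rather than confirmed details of the original files.

```python
import tempfile
from pathlib import Path

# The single test document from the question.
# Assumption: LF line endings and a trailing newline after </DOC>.
DOC = "<DOC>\n<DOCNO>1</DOCNO>\n<TEXT>\nHere is some text\n</TEXT>\n</DOC>\n"


def build_params(corpus_dir, index_dir):
    """Return the parameter file text from the question for the given paths."""
    return (
        "<parameters>\n"
        "<corpus>\n"
        f"<path>{corpus_dir}</path>\n"
        "<class>trectext</class>\n"
        "</corpus>\n"
        "<memory>256m</memory>\n"
        f"<index>{index_dir}</index>\n"
        "</parameters>\n"
    )


def write_setup(root):
    """Create corpus/1 and config/parameter.xml under root; return both paths."""
    root = Path(root)
    corpus_dir = root / "corpus"
    config_dir = root / "config"
    corpus_dir.mkdir(parents=True, exist_ok=True)
    config_dir.mkdir(parents=True, exist_ok=True)

    doc_path = corpus_dir / "1"
    # Write bytes directly so no newline translation or BOM is added.
    doc_path.write_bytes(DOC.encode("ascii"))

    param_path = config_dir / "parameter.xml"
    param_path.write_text(
        build_params(corpus_dir, root / "index"), encoding="utf-8"
    )
    return doc_path, param_path


if __name__ == "__main__":
    # Hypothetical scratch location; substitute the real lemur folder as needed.
    doc_path, param_path = write_setup(tempfile.mkdtemp())
    print("document:  ", doc_path)
    print("parameters:", param_path)
```

The resulting `parameter.xml` can then be passed to `IndriBuildIndex.exe` the same way as in the command above.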