
I'm currently using POI to extract text from a batch of Word documents, and I need to determine which entries each document contains. I've gotten as far as retrieving the document root and its first entry, but I want to be able to view all entries. The getEntries() method seems to provide this, but I'm at a loss as to how to use getViewableIterator() to step through them. Here is my code so far:

<cfset myFile = createObject("java", "java.io.FileInputStream").init(fileInputPath)>
<cfset fileSystem = CreateObject( "java", "org.apache.poi.poifs.filesystem.POIFSFileSystem" ).Init(myFile)>

<cfloop from="1" to="#fileSystem.getRoot().getEntryCount()#" index="i">
     <cfset viewableIterator = fileSystem.getRoot().getEntries().next().getViewableIterator()>
     <cfset nextEntry = fileSystem.getRoot().getEntries().next()>
     <cfif viewableIterator.hasNext()>
         <cfdump var="#nextEntry.getShortDescription()#">
         <cfset viewableIterator.remove()>
     </cfif>
</cfloop>

On the first pass through the loop, I get the first entry just fine. However, I get a java.lang.IllegalStateException as soon as remove() executes. Obviously I'm not using remove() correctly, but I haven't been able to find any examples of how it should be used. Any help would be greatly appreciated.

Thanks, --Anne

A: 

I don't really understand your XML tags (I usually write Java in its normal form, with curly braces and so on), but in general a Java iterator works like this:

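while (iterator.hasNext()) {
    Object x = iterator.next(); // get the next element and advance the iterator
    // do with x what you want
    if (shouldRemove(x)) {      // hypothetical check: should x leave the underlying collection?
        iterator.remove();      // removes the element most recently returned by next()
    }
}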
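The loop above works because it asks for the iterator once and then lets next() advance it. A minimal sketch of the difference, using a plain java.util.List as a stand-in for the document's entries (the class, method names, and sample strings are hypothetical, not POI API). It assumes that getEntries() behaves like List.iterator() here, returning a new iterator positioned at the first element on every call; in that case, calling next() on a fresh iterator each pass keeps returning the same entry.

```java
import java.util.Iterator;
import java.util.List;

class FreshIteratorDemo {

    // Assumption: mirrors calling getRoot().getEntries().next() on every pass.
    // Each items.iterator() call starts over, so next() always yields the first element.
    static String repeatedFreshIterators(List<String> items) {
        StringBuilder seen = new StringBuilder();
        for (int i = 0; i < items.size(); i++) {
            seen.append(items.iterator().next()).append(' ');
        }
        return seen.toString().trim();
    }

    // Obtain the iterator once, then let each next() call advance it.
    static String singleIterator(List<String> items) {
        StringBuilder seen = new StringBuilder();
        Iterator<String> it = items.iterator();
        while (it.hasNext()) {
            seen.append(it.next()).append(' ');
        }
        return seen.toString().trim();
    }

    public static void main(String[] args) {
        List<String> entries = List.of("entry1", "entry2", "entry3");
        System.out.println(repeatedFreshIterators(entries)); // entry1 entry1 entry1
        System.out.println(singleIterator(entries));         // entry1 entry2 entry3
    }
}
```

In ColdFusion terms, that would mean calling getEntries() once before the loop, storing the result in a variable, and calling hasNext()/next() on that variable inside the loop.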

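In practice, remove() is rarely needed: it is only for cases where you walk through a collection and drop the elements you no longer need. It removes the element most recently returned by next(), so calling it before any call to next() (as your loop does right after getViewableIterator()) throws an IllegalStateException, and so does calling it twice for the same element. It can also fail if the collection is read-only (UnsupportedOperationException), and with the standard java.util collections, modifying a collection through one iterator while a second iterator is walking over it makes the second one fail with a ConcurrentModificationException. Just stick with hasNext() and next().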
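A small sketch of the remove() contract described above, using java.util classes rather than POI (the class, helper names, and sample strings are hypothetical; a java.util.List stands in for the document's entries). It shows three cases: remove() before next() throws IllegalStateException, remove() after next() deletes the element next() just returned, and a read-only view rejects remove() with UnsupportedOperationException.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class IteratorRemoveDemo {

    // Calls remove() on a fresh iterator before any next(): no element
    // has been returned yet, so there is nothing for remove() to delete.
    static String removeBeforeNext(List<String> items) {
        Iterator<String> it = items.iterator();
        try {
            it.remove();
            return "removed";
        } catch (IllegalStateException e) {
            return "IllegalStateException";
        }
    }

    // Correct usage: next() first, then remove() drops the element next() returned.
    // Works on a mutable copy so the caller's list is left untouched.
    static List<String> removeMatching(List<String> items, String unwanted) {
        List<String> copy = new ArrayList<>(items);
        Iterator<String> it = copy.iterator();
        while (it.hasNext()) {
            String x = it.next();
            if (x.equals(unwanted)) {
                it.remove();
            }
        }
        return copy;
    }

    // A read-only view refuses remove() even after a valid next().
    // Expects a non-empty list, otherwise next() would throw NoSuchElementException.
    static String removeFromReadOnly(List<String> items) {
        Iterator<String> it = Collections.unmodifiableList(items).iterator();
        it.next();
        try {
            it.remove();
            return "removed";
        } catch (UnsupportedOperationException e) {
            return "UnsupportedOperationException";
        }
    }

    public static void main(String[] args) {
        List<String> entries = List.of("entry1", "entry2", "entry3");
        System.out.println(removeBeforeNext(new ArrayList<>(entries)));   // IllegalStateException
        System.out.println(removeMatching(entries, "entry2"));            // [entry1, entry3]
        System.out.println(removeFromReadOnly(new ArrayList<>(entries))); // UnsupportedOperationException
    }
}
```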

mihi
OK, to make sure I understand this correctly: each time x is assigned inside the loop, the iterator automatically moves to the next element in the hash? (By the way, the XML you're seeing is not actually XML; it's ColdFusion markup.)
Anne Porosoff
Yes, the iterator moves to the next element whenever you call next().
mihi
Off-topic question: can I see who downvoted this answer, even though it was accepted, so that I can ask them why?
mihi
A: 

Ben Nadel of Kinky Solutions fame wrote a component that might handle your situation. Take a look and report back on whether his project helped you.

POI Utility ColdFusion Component

rip747
I briefly looked at Ben Nadel's component but had largely written it off because it was written for Excel files whereas my issue was specific to Word. Nevertheless, I was eventually able to figure out a workaround on my own.
Anne Porosoff