Hi all,
I'm using Apache POI to extract some data from an Excel file.
I need an InputStream to instantiate the POI HSSFWorkbook class:

    HSSFWorkbook wb = new HSSFWorkbook(inputStreamX);

I'm seeing different behavior depending on how I construct the InputStream object:

    InputStream inputStream = new FileInputStream(new File("/home/xxx/workspace/myproject/test/resources/importTest.xls"));        
    InputStream inputStream2 = new FileInputStream(getClass().getResource("/importTest.xls").getFile());
    InputStream inputStream3 = new ClassPathResource("importTest.xls").getInputStream();

If I construct the POI object with inputStream, it works fine.
But inputStream2 and inputStream3 both throw this exception:

java.io.IOException: Invalid header signature; read -2300849302551019537, expected -2226271756974174256
    at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:100)
    at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:84)

It seems that the header of the binary file is different and the library can't recognize it as an Excel file. I can't understand why.
The only difference I see is that inputStream2 & 3 are using the classloader to locate the file. (ClassPathResource is a Spring class).

I'd like to keep the file path independent of the local file system, so I would prefer something like inputStream2 or inputStream3.

Do you have any idea on why this is happening?

Thank you

Update:
I tried writing inputStream and inputStream2 to disk.
The Excel file that comes from inputStream is OK. The one from inputStream2 is an Excel file with some strange characters wrapped around the real content.
It seems that Maven corrupts the Excel file in some way during the build.
So it's basically the file I retrieve through the class loader (under /home/xxx/workspace/myproject/target/test-classes/importTest.xls) that is broken.
Any ideas?

A: 

Have you tried ClassLoader#getResourceAsStream(String)? It will probably behave similarly to your second attempt using Class#getResource(String), as alluded to in the latter's documentation.

My first thought here was that no such file was found, but if it's consistently reading the same value (-2300849302551019537) each time you run the program, that suggests there really is a file there that's being read. Set a breakpoint on the statement after you initialize your InputStream and inspect the stream instance in the debugger. You should be able to find a reference to the underlying file name. To make this easier at first, try using ClassLoader#getResources(String) and inspect the sequence of URLs returned.
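
The two numbers in the exception message can also be inspected directly. The sketch below is not from the original thread: the class and method names are hypothetical, and it assumes that the signature is the first 8 bytes of the file read as a little-endian long and that something in the build read the file as UTF-8 text and wrote it back. Under those assumptions, the well-known OLE2 magic bytes yield the "expected" value, and the same bytes after a text round trip yield the "read" value.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class HeaderSignatureDemo {

    // The 8 magic bytes that open an OLE2 compound file such as a .xls workbook.
    public static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Interpret the first 8 bytes as a little-endian long (assumed to match
    // how the signature in the exception message is computed).
    public static long signatureOf(byte[] data) {
        return ByteBuffer.wrap(data, 0, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    // Simulate a text-based copy: decode the bytes as UTF-8 and encode them again.
    // Byte sequences that are not valid UTF-8 are replaced by U+FFFD (EF BF BD).
    public static byte[] roundTripAsUtf8Text(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints -2226271756974174256 (the "expected" value)
        System.out.println(signatureOf(OLE2_MAGIC));
        // prints -2300849302551019537 (the "read" value)
        System.out.println(signatureOf(roundTripAsUtf8Text(OLE2_MAGIC)));
    }
}
```

The second value matches the one in the stack trace, which is consistent with the file having been treated as text somewhere between src/test/resources and target/test-classes.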

seh
+1  A: 

The problem seems to be Maven's resource filtering option.
Suppose the POM looks like this:

    <testResource>
        <directory>${basedir}/src/test/resources</directory>
        <includes>
            <include>**/*.xml</include>
            <include>**/*.properties</include>
            <include>**/*.sql</include>
            <include>**/*.xls</include>
        </includes>
        <filtering>true</filtering>
    </testResource>

With filtering set to true, Maven treats the .xls files as text while copying them, which corrupts them.
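
One possible way around this, sketched here as an assumption rather than the configuration actually used in the project, is to declare the same directory twice: once with filtering for the text resources and once without filtering for the .xls files. The enclosing `<testResources>` element is assumed from the standard POM layout, since the snippet above shows only a single `<testResource>`.

```xml
<testResources>
    <!-- Text resources: filtering stays on so ${...} placeholders are replaced -->
    <testResource>
        <directory>${basedir}/src/test/resources</directory>
        <includes>
            <include>**/*.xml</include>
            <include>**/*.properties</include>
            <include>**/*.sql</include>
        </includes>
        <filtering>true</filtering>
    </testResource>
    <!-- Binary resources: copied as-is, no filtering -->
    <testResource>
        <directory>${basedir}/src/test/resources</directory>
        <includes>
            <include>**/*.xls</include>
        </includes>
        <filtering>false</filtering>
    </testResource>
</testResources>
```

The maven-resources-plugin also offers a nonFilteredFileExtensions setting for excluding binary extensions from filtering; check the plugin documentation for the version in use before relying on it.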

al nik