views:

1513

answers:

7

Hi, I'm using HSSF-POI for reading excel data. The problem is I have values in a cell that look like a number but really are strings. If I look at the format cell in Excel, it says the type is "text". Still the HSSF Cell thinks it's numeric. How can I get the value as a string?

If I try to use cell.getRichStringValue, I get exception; if cell.toString, it's not the exact same value as in Excel sheet.

Thanks!

edit: until this gets resolved, I'll use new BigDecimal(cell.getNumericCellValue()).toString()

+2  A: 

You mean HSSF-POI says cell.getCellType() == Cell.CELL_TYPE_NUMERIC, not Cell.CELL_TYPE_STRING as it should be?

I would think it's a bug in POI, but every cell contains a Variant, and Variant has a type. It's kind of hard to make a bug there, so instead I think Excel uses some extra data or heuristic to report the field as text. Usual MS way, alas.

P.S. You cannot use any getString() on a Variant containing numeric, as the binary representation of the Variant data depends on it's type, and trying to get a string from what is actually a number would result in garbage -- hence the exception.

Vladimir Dyuzhev
Yes, POI says it's Cell.CELL_TYPE_NUMERIC. I have to investigate this further when I get back to work. I didn't realize it could be just some heuristic in _Excel_. (I just quickly glanced the data, I didn't create the spreadsheet myself.) Thanks for your comment.
egaga
A heuristic at work seems about right. Just today I had a little fight with excel to have it stop happily changing my data from numbers to dates in the "appropriate" locale.
NomeN
Thanks. I use return new BigDecimal(cell.getNumericCellValue()).toString() for now.
egaga
+1  A: 

Excel will convert anything that looks like a number or date or time from a string. See MS Knowledge base article, which basically suggests to enter the number with an extra character that makes it a string.

Mark
+1 for pointing out an official confirmation/explanation regarding the already suspected heuristic of Excel.
Steffen Opel
+1  A: 

You are probably dealing with an Excel problem. When you create the spreadsheet, the default cell type is Generic. With this type, Excel guesses the type based on the input and this type is saved with each cell.

When you later change the cell format to Text, you are just changing the default. Excel doesn't change every cell's type automatically. I haven't found a way to do this automatically.

To confirm this, you can go to Excel and retype one of the numbers and see if it's text in HSSF.

You can also look at the real cell type by using this function,

  @Cell("type", A1)

A1 is the cell for the number. It shows "l" for text, "v" for numbers.

ZZ Coder
A: 

If the documents you are parsing are always in a specific layout, you can change the cell type to "string" on the fly and then retrieve the value. For example, if column 2 should always be string data, set its cell type to string and then read it with the string-type get methods.

cell.setCellType(Cell.CELL_TYPE_STRING);

In my testing, changing the cell type did not modify the contents of the cell, but did allow it to be retrieved with either of the following approaches:

cell.getStringCellValue();

cell.getRichStringCellValue().getString();

Without an example of a value that is not converting properly, it is difficult to know if this will behave any differently than the cell.toString() approach you described in the description.

jt
+1  A: 

The problem with Excel is that the default format is generic. With this format Excel stores numbers entered in the cell as numeric. You have to change the format to text before entering the values. Reentering the values after changing the format will also work.
That will lead to little green triangles in the left upper corner of the cells if the content looks like a number to Excel. If this is the case the value is really stored as text.

With new BigDecimal(cell.getNumericCellValue()).toString() you will still have a lot of problems. For example if you have identifying numbers (e.g. part numbers or classification numbers) you probably have cases that have leading zeros which will be a problem with the getNumericCellValue() approach.

I try to thoroughly explain how to correctly create the Excel to the party creating the files I have to handle with POI. If the files are uploaded by end users I even have created a validation program to check for expected cell types if I know the columns in advance. As a by-product you can also check various other things of the supplied files (e.g. are the right columns provided or mandatory values).

Turismo
This seems to address my problem. Thank you.
egaga
A: 

"The problem is I have values in a cell that look like a number" => look like number when viewed in Excel?

"but really are strings" => what does that mean? How do you KNOW that they really are strings?

"If I look at the format cell" => what's "the format cell"???

'... in Excel, it says the type is "text"' => Please explain.

"Still the HSSF Cell thinks it's numeric." => do you mean that the_cell.getCellType() returns Cell.CELL_TYPE_NUMERIC?

"How can I get the value as a string?" => if it's NUMERIC, get the numeric value using the_cell.getNumericCellValue(), then format it as a string any way you want to.

"If I try to use cell.getRichStringValue, I get exception;" => so it's not a string.

"if cell.toString, it's not the exact same value as in Excel sheet." => so cell.toString() doesn't format it the way that Excel formats it.

Whatever heuristic Excel uses to determine type is irrelevant to you. It's the RESULT of that decision as stored in the file and revealed by getCellType() that matters.

John Machin
I mean that they are semantically strings; their representation matters (01 differs from 1), unlike in numbers (01 is the same as 1)I mean the "format cells" function when I right click on a cell, "text" is selected as the type.
egaga
A: 

I have a similar problem as mention. The string value "01" is returned as "1" by POI. Anyone has the solution.

Thanks.

kkangsar