Hi guys,

I am trying to read an Excel file using xlrd, and I am wondering whether there is a way to ignore the cell formatting used in the Excel file and just import all data as text.

Here is the code I am using so far:

import xlrd

xls_file = 'xltest.xls'
xls_workbook = xlrd.open_workbook(xls_file)
xls_sheet = xls_workbook.sheet_by_index(0)

raw_data = [['']*xls_sheet.ncols for _ in range(xls_sheet.nrows)]
raw_str = ''
field_delim = ','
text_delim = '"'

for rnum in range(xls_sheet.nrows):
    for cnum in range(xls_sheet.ncols):
        raw_data[rnum][cnum] = str(xls_sheet.cell(rnum,cnum).value)

for rnum in range(len(raw_data)):
    for cnum in range(len(raw_data[rnum])):
        if (cnum == len(raw_data[rnum]) - 1):
            field_delim = '\n'
        else:
            field_delim = ','
        raw_str += text_delim + raw_data[rnum][cnum] + text_delim + field_delim

final_csv = open('FINAL.csv', 'w')
final_csv.write(raw_str)
final_csv.close()

This code is functional, but certain fields, such as zip codes, are imported as numbers, so they end up with a trailing '.0'. For example, if there is a zip code of '79854' in the Excel file, it will be imported as '79854.0'.

I have tried finding a solution in this xlrd spec, but was unsuccessful.

+2  A: 

That's because integer values in Excel are imported as floats in Python, so sheet.cell(r,c).value returns a float. Try converting the values to integers, but first make sure those values were integers in Excel to begin with:

cell = sheet.cell(r,c)
cell_value = cell.value
# ctype 2 is xlrd.XL_CELL_NUMBER and 3 is xlrd.XL_CELL_DATE; both carry float values
if cell.ctype in (2,3) and int(cell_value) == cell_value:
    cell_value = int(cell_value)

It is all in the xlrd spec.
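As a rough sketch of how that check could slot into the asker's loop, the conversion can be wrapped in a small helper that always returns text. The helper name `cell_to_text` is made up for this illustration, and the two constants are defined locally (mirroring xlrd's `XL_CELL_NUMBER` and `XL_CELL_DATE`) so the snippet runs without an Excel file:

```python
# Local copies of xlrd's cell-type codes (xlrd.XL_CELL_NUMBER, xlrd.XL_CELL_DATE),
# defined here so the example runs without xlrd installed.
XL_CELL_NUMBER = 2
XL_CELL_DATE = 3


def cell_to_text(value, ctype):
    """Return a cell value as text, without the '.0' suffix on whole numbers.

    Hypothetical helper for illustration; not part of xlrd.
    """
    if ctype in (XL_CELL_NUMBER, XL_CELL_DATE) and int(value) == value:
        return str(int(value))
    return str(value)


# Made-up sample values standing in for cell.value / cell.ctype.
zip_text = cell_to_text(79854.0, XL_CELL_NUMBER)   # '79854'
price_text = cell_to_text(19.99, XL_CELL_NUMBER)   # '19.99'
```

In the original loop this would presumably replace the `str(xls_sheet.cell(rnum,cnum).value)` call, passing the cell's `value` and `ctype`.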

kaloyan
xlrd reports what it finds. The only "integer values" in Excel are floats with a zero fraction part. Excel and its users just don't have the concept of an integer as a separate type. The integers that are contained in some RK cell records in an XLS file are merely artifacts of the serialisation and xlrd correctly converts them to floats.
John Machin
A: 

I know this isn't part of the question, but I would get rid of 'raw_str' and write directly to your csv. For a large file (10,000 rows) this will save loads of time.

You can also get rid of 'raw_data' and just use one for loop.
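Something along these lines might work, as a minimal sketch assuming Python 3 and the standard `csv` module. The function name `write_rows` and the sample rows are invented for the example, and an in-memory buffer stands in for the output file so it runs on its own:

```python
import csv
import io


def write_rows(rows, out):
    """Write each row straight to `out`, quoting every field.

    Hypothetical helper: nothing is accumulated in memory beyond one row.
    """
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in rows:
        writer.writerow([str(value) for value in row])


# Made-up rows standing in for what the worksheet would yield.
sample_rows = [["zip", "city"], ["79854", "example"]]

buffer = io.StringIO()  # stand-in for open('FINAL.csv', 'w', newline='')
write_rows(sample_rows, buffer)
csv_text = buffer.getvalue()
```

With xlrd, the rows could be supplied by a generator such as `(xls_sheet.row_values(r) for r in range(xls_sheet.nrows))`, so both 'raw_data' and 'raw_str' go away.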

Josh