
How do I open an Excel file for reading in Python? I've opened text files (e.g. sometextfile.txt) with the read command. How do I do that for an Excel file?

+3  A: 

This isn't as straightforward as opening a plain text file. Python has nothing built in for reading Excel files, so you will need an external module. Here are some options:

http://www.python-excel.org/

If possible, you may want to consider exporting the Excel spreadsheet as a CSV file and then reading it with Python's built-in csv module:

http://docs.python.org/library/csv.html
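As a rough sketch of the CSV route, assuming Python 3 and a sheet already saved from Excel as CSV: reading it looks much like reading a .txt file, except that csv.reader splits each line into a list of cell values and correctly handles quoted cells that contain commas. The function names and the file name export.csv are hypothetical.

```python
import csv


def parse_csv_lines(lines):
    """Yield each CSV row as a list of cell strings.

    `lines` may be an open file or any iterable of strings; rows are
    produced one at a time rather than all at once.
    """
    for row in csv.reader(lines):
        yield row


def print_csv_file(path):
    """Print every row of a CSV file exported from Excel."""
    # The csv docs recommend newline='' when opening files for the csv module
    with open(path, newline='') as f:
        for row in parse_csv_lines(f):
            print(row)


# Hypothetical usage: print_csv_file('export.csv')
```

Because parse_csv_lines accepts any iterable of lines, a program that already loops over the lines of a .txt file can loop over these rows in the same way, with each row already split into cells.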

orangeoctopus
OK, I don't really understand the CSV stuff. How do I have Python open my Excel file as a CSV? I have a program that does what I want for .txt files and I want it to do the same thing for this Excel file... which is the best way to go? Can you elaborate on this, please?
novak
You can either use a third-party Python module like xlrd, or save your Excel file as a CSV file instead of a normal Excel file. I think the point you are missing is that an Excel file bears no resemblance to a plain text file. Open the Excel document in Notepad and you will see what I mean. You either need to save the file in a plain-text format such as CSV (comma-separated values), which is easier to read with Python, or install and use a third-party module that can parse an Excel file for you.
orangeoctopus
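To make the point about Excel files not being plain text concrete: a native Excel file begins with a binary signature rather than readable text. The sketch below, with hypothetical function names, checks the first bytes against two well-known signatures: the OLE2 compound-document header used by legacy .xls files and the ZIP header used by .xlsx files (which are ZIP archives of XML parts). A match only identifies the container, since Word's .doc and .docx files use the same ones, but it is a quick way to confirm that a file is not plain text such as CSV.

```python
# Standard file signatures ("magic numbers"):
# legacy .xls files are OLE2 compound documents, .xlsx files are ZIP archives.
OLE2_SIGNATURE = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'
ZIP_SIGNATURE = b'PK\x03\x04'


def guess_container(header):
    """Guess a file's container format from its first bytes.

    Returns 'ole2' (e.g. .xls), 'zip' (e.g. .xlsx) or 'other'
    (possibly plain text such as CSV).
    """
    if header.startswith(OLE2_SIGNATURE):
        return 'ole2'
    if header.startswith(ZIP_SIGNATURE):
        return 'zip'
    return 'other'


def guess_file_container(path):
    """Read only the first 8 bytes of `path`, in binary mode, and classify them."""
    with open(path, 'rb') as f:
        return guess_container(f.read(8))
```

The format also matters when choosing a reader: xlrd 2.0 and later only open legacy .xls files.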
The problem I'm having is that the file is really, really large. How can I save it in CSV format if I cannot open the whole file?
novak
@novak: Your problem is that your file is 1.5GB and your computer's memory is "not enough" ...
John Machin
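On the memory question: once the data exists as CSV, it never has to fit in memory, because a CSV file can be processed one row at a time. A minimal sketch, assuming Python 3 and a CSV that has already been exported (it does not solve exporting a workbook that is too large to open), with a hypothetical helper name, that copies only the rows matching a name into a much smaller file:

```python
import csv


def extract_matching_rows(src, dst, name, column=0):
    """Copy CSV rows whose `column` cell contains `name` from src to dst.

    `src` is an open text file (or any iterable of lines) and `dst` is a
    writable text file. Rows are read and written one at a time, so memory
    use does not grow with the size of the source file.
    Returns the number of rows written.
    """
    needle = name.lower()
    writer = csv.writer(dst)
    count = 0
    for row in csv.reader(src):
        # Case-insensitive substring match; skip rows too short to have the column
        if len(row) > column and needle in row[column].lower():
            writer.writerow(row)
            count += 1
    return count


# Hypothetical usage (file names are placeholders):
# with open('huge_export.csv', newline='') as src, \
#         open('john_rows.csv', 'w', newline='') as dst:
#     extract_matching_rows(src, dst, 'john')
```

Like the original search for 'john', this is a substring match, so a name such as "Johnny" also matches; compare with == instead of `in` if you need exact cells.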
+1  A: 

Try the xlrd library.

[Edit] - From your comment, something like the snippet below might do the trick. I'm assuming here that you're only searching one column for the word 'john', but you could search more columns or turn this into a more generic function.

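from xlrd import open_workbook

# on_demand=True loads each sheet only when it is requested, which saves memory
book = open_workbook('simple.xls', on_demand=True)
for name in book.sheet_names():
    if name.endswith('2'):
        sheet = book.sheet_by_name(name)

        # Attempt to find a matching row (search the first column for 'john')
        row_index = -1
        for i, cell in enumerate(sheet.col(0)):
            # Cell values can be numbers, so convert to text before searching
            if 'john' in str(cell.value):
                row_index = i
                break

        # If we found the row, print it
        if row_index != -1:
            for cell in sheet.row(row_index):
                print(cell.value)

        # Free this sheet's memory before moving on to the next one
        book.unload_sheet(name)

# Close the underlying file once all sheets have been processed
book.release_resources()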
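Along the lines of the "more generic function" mentioned above, here is a sketch (hypothetical name find_rows, assuming Python 3) that returns every matching row rather than only the first, checks any number of columns, and ignores case. It takes any iterable of row lists, so the same function works on rows pulled from an xlrd sheet or produced by csv.reader.

```python
def find_rows(rows, needle, columns=(0,)):
    """Return (row_index, row) for each row where any of `columns` contains `needle`.

    `rows` is any iterable of sequences of cell values. The match is a
    case-insensitive substring test, and non-string cells (xlrd returns
    numbers as floats) are converted with str() before comparing.
    """
    needle = needle.lower()
    matches = []
    for index, row in enumerate(rows):
        for col in columns:
            if col < len(row) and needle in str(row[col]).lower():
                matches.append((index, row))
                break  # one hit is enough; move on to the next row
    return matches


# Hypothetical usage with an xlrd sheet (Sheet.nrows and Sheet.row_values exist in xlrd):
# rows = (sheet.row_values(i) for i in range(sheet.nrows))
# for index, row in find_rows(rows, 'john'):
#     print(index, row)
```

Feeding it a generator keeps only one row in memory at a time on the caller's side, although the returned list still holds every match.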
Jon Cage
I think this might be what I want it to do:

from xlrd import open_workbook
book = open_workbook('simple.xls', on_demand=True)
for name in book.sheet_names():
    if name.endswith('2'):
        sheet = book.sheet_by_name(name)
        print(sheet.cell_value(0, 0))
        book.unload_sheet(name)

(large_files.py)

But I don't want it to use endswith; I want it to find and print lines that contain a particular name. For example, I want it to print the line of the huge Excel sheet that contains John's data and not Bob's. Help?
novak
I'd suggest you post this as a separate question and put the code in a code block.
Jon Cage
@JonCage: This is the second question in a series of related questions; in the third question it is revealed that the real Excel file is allegedly 1.5 GB and the computer's memory is described as "not enough" ... see http://stackoverflow.com/questions/3241039/how-do-i-extract-specific-lines-of-data-from-a-huge-excel-sheet-using-python
John Machin