I have an Excel sheet with about 150,000 records, and operations like find and replace or deleting columns take a long time. I need to write a script to perform tasks such as find and replace, sorting, and deleting rows/columns. What format should I convert the sheet to so that these tasks run faster and so that I can script them?
Hi
Plain text, awk and sed are your friends
Regards
Mark
I believe you can save Excel files as XML. If you already have access to libraries that can manipulate XML structures, that would likely be easy to work with. Worst case, convert it to a CSV file and do some raw text manipulation (that would likely be slower, though). Unless, of course, you mean scripting/macros within Excel, in which case you're probably out of luck. I'm not sure about database conversion, which would probably be optimal with that many records; perhaps someone else can help you there.
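For the CSV route, here is a rough sketch of what the raw text manipulation could look like in Python with the standard csv module. The column names, the sample data and the helper name transform_rows are made up for illustration; it assumes the sheet has been saved as CSV with a header row and that it fits in memory, which 150,000 rows normally will.

```python
import csv
import io

def transform_rows(rows, find, replace, drop_column, sort_column):
    """Find/replace in every cell, sort by one column, then drop another.

    rows is any iterable of lists whose first item is the header row.
    (Hypothetical helper, not part of any library.)
    """
    rows = list(rows)
    header, data = rows[0], rows[1:]
    drop_idx = header.index(drop_column)
    sort_idx = header.index(sort_column)
    # Find and replace across all data cells (plain substring replace).
    data = [[cell.replace(find, replace) for cell in row] for row in data]
    # Sort on the chosen column (string comparison).
    data.sort(key=lambda row: row[sort_idx])
    # Delete the unwanted column from the header and every row.
    keep = [i for i in range(len(header)) if i != drop_idx]
    return [[row[i] for i in keep] for row in [header] + data]

# Made-up sample standing in for the exported sheet.
SAMPLE = "name,city,notes\nBob,NYC,x\nAlice,NYC,y\n"
result = transform_rows(csv.reader(io.StringIO(SAMPLE)),
                        "NYC", "New York", "notes", "name")
```

With a real file you would pass csv.reader(open(path, newline="")) instead of the StringIO object and write the result back out with csv.writer.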
You could always load it into a SQLite database. If you're doing lots of find-replacing that'd be pretty quick. It's difficult to give a more useful answer without knowing a bit more about your data though, and how often you'll need to do things with it in Excel.
You could write a bit of Python to get the data out of Excel and into SQLite (and back again) using pyExcelerator and the sqlite3 module.
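As a minimal sketch of the SQLite side only, assuming the rows have already been read out of Excel by whatever means (that step is not shown here): the table name, columns, sample rows and the helper load_and_clean are invented for the example.

```python
import sqlite3

def load_and_clean(rows, find, replace):
    """Load rows into an in-memory SQLite table and clean them up.

    (Hypothetical helper; the schema is an assumption for illustration.)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER, name TEXT, city TEXT)")
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    # Find and replace inside one column using SQLite's REPLACE() function.
    conn.execute("UPDATE records SET city = REPLACE(city, ?, ?)",
                 (find, replace))
    # Delete rows you don't want, here the ones with an empty name.
    conn.execute("DELETE FROM records WHERE name = ''")
    conn.commit()
    return conn

# Made-up rows standing in for data read from the spreadsheet.
conn = load_and_clean(
    [(2, "Bob", "NYC"), (1, "Alice", "NYC"), (3, "", "LA")],
    "NYC", "New York",
)
# Sorting is just an ORDER BY on the way out.
sorted_rows = conn.execute(
    "SELECT id, name, city FROM records ORDER BY name"
).fetchall()
```

Pass a file path instead of ":memory:" to sqlite3.connect if you want the database to persist between runs.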
When you say scripting, what language and platform are we talking?
Without knowing the details, I'd recommend importing the spreadsheet into a SQL Server (or even Access) database and exporting the transformed query results back into a spreadsheet. I've had good experiences with that method, although my data sets have usually been even larger than 150k rows, with relatively few long text fields.
Export it to a database and keep it there. 150,000 rows is too much for Excel to deal with manfully, damn Excel 2007!