I will be writing a little Python script tomorrow to retrieve all the data from an old MS Access database into a CSV file first, and then, after some data cleansing and munging, I will import the data into a MySQL database on Linux.

I intend to use pyodbc to make a connection to the MS Access db. I will be running the initial script in a Windows environment.

The db has IIRC well over half a million rows of data. My questions are:

  1. Is the number of records a cause for concern (i.e., will I hit any limits)?
  2. Is there a better file format for the transitory data (instead of CSV)?

I chose CSV because it is quite simple and straightforward (and I am a Python newbie), but I would like to hear from someone who has done something similar before.
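
For reference, a minimal sketch of the pyodbc-to-CSV step might look like the following; the driver string, file path, and table name are placeholders rather than details from the actual database:

import csv
import pyodbc

# Connect to the Access file through the Access ODBC driver.
conn = pyodbc.connect(
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=C:\data\legacy.mdb;"
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM SomeTable")

with open("export.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cursor.description])  # header row
    for row in cursor:  # the cursor yields rows one at a time
        writer.writerow(row)

conn.close()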

A: 

The only limit should be the operating system's maximum file size.

That said, when you send the data to the new database, make sure you write it a few records at a time; I've seen people try to load the entire file into memory first and then write it all at once.
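
A minimal sketch of that batching pattern, assuming a CSV file with a header row, a hypothetical table mytable with three fields, and the MySQLdb driver (any DB-API driver works the same way):

import csv
import MySQLdb  # assumption: substitute whichever MySQL driver you use

conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="target")
cursor = conn.cursor()
insert_sql = "INSERT INTO mytable (field1, field2, field3) VALUES (%s, %s, %s)"

with open("export.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= 1000:  # write a chunk at a time, never the whole file
            cursor.executemany(insert_sql, batch)
            batch = []
    if batch:  # flush the final partial chunk
        cursor.executemany(insert_sql, batch)

conn.commit()
conn.close()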

Charlie Martin
+1  A: 

I wouldn't bother using an intermediate format. Pulling from Access via ADO and inserting right into MySQL really shouldn't be an issue.
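
A rough sketch of that direct route, where every connection detail, table, and field name is an assumption for illustration:

import pyodbc
import MySQLdb

src = pyodbc.connect(
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=C:\data\legacy.mdb;"
)
dst = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="target")

read = src.cursor()
write = dst.cursor()
read.execute("SELECT field1, field2, field3 FROM AccessTable")

while True:
    rows = read.fetchmany(1000)  # pull a modest batch from Access
    if not rows:
        break
    write.executemany(
        "INSERT INTO mytable (field1, field2, field3) VALUES (%s, %s, %s)",
        [tuple(r) for r in rows],
    )

dst.commit()
src.close()
dst.close()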

Ignacio Vazquez-Abrams
oh yeah, doing "some data cleansing, munging" on the fly, no worries, it'll "work" first time. **FAIL**
John Machin
If it fails directly, then it would have failed with the intermediary regardless.
Ignacio Vazquez-Abrams
The point is that the multiple attempts at fixing the problems would be better handled with CSV files than in the Access database.
John Machin
@John: I understand that it's accepted doctrine to do so, and I would have said the same a few years ago, but I can't really think of any specific reason why in this case.
Ignacio Vazquez-Abrams
+3  A: 

Yet another approach if you have Access available ...

Create a table in MySQL to hold the data.

In your Access db, create an ODBC link to the MySQL table.

Then execute a query such as:

INSERT INTO MySqlTable (field1, field2, field3)
SELECT field1, field2, field3
FROM AccessTable;

Note: This suggestion presumes you can do your data cleaning operations in Access before sending the data on to MySQL.

HansUp
At worst, you can load the data into a staging table and then cleanse it in a separate pass into another table once it's all in MySQL.
TokenMacGuy
+3  A: 

Memory usage for csv.reader and csv.writer isn't proportional to the number of records, as long as you iterate correctly and don't try to load the whole file into memory; that's one reason the iterator protocol exists. Similarly, csv.writer writes directly to disk rather than buffering everything, so it isn't limited by available memory either. You can process any number of records this way.
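
For instance, a streaming cleanup pass keeps memory use flat regardless of file size; the strip() call here is just a stand-in for whatever munging the data actually needs:

import csv

with open("export.csv", newline="") as src, \
        open("clean.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:  # rows stream through one at a time
        writer.writerow([field.strip() for field in row])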

For simple data structures, CSV is fine. It's much easier to get fast, incremental access to CSV than to more complicated formats like XML (tip: pulldom is painfully slow).

Glenn Maynard