I will be writing a little Python script tomorrow to retrieve all the data from an old MS Access database into a CSV file first. Then, after some data cleansing and munging, I will import the data into a MySQL database on Linux.
I intend to use pyodbc to make a connection to the MS Access db. I will be running the initial script in a Windows environment.
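For reference, this is roughly what I have in mind for the export step. It is only a sketch: the Access ODBC driver name, the .mdb path, and the table name are placeholders that would need adjusting to the actual database.

```python
import csv
import pyodbc

# Driver name and database path are placeholders; they depend on which
# Access ODBC driver is actually installed on the Windows machine.
CONN_STR = (
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=C:\data\legacy.mdb;"
)

conn = pyodbc.connect(CONN_STR)
cursor = conn.cursor()
cursor.execute("SELECT * FROM SomeTable")  # hypothetical table name

# newline="" stops the csv module from inserting blank lines on Windows
with open("export.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cursor.description])  # header row
    # fetchmany keeps memory usage flat even with 500k+ rows
    while True:
        rows = cursor.fetchmany(10000)
        if not rows:
            break
        writer.writerows(rows)

cursor.close()
conn.close()
```

My thinking is that fetching and writing in batches should keep memory use flat regardless of how many rows the table holds.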
The db has IIRC well over half a million rows of data. My questions are:
- Is the number of records a cause for concern (i.e. will I hit some limits)?
- Is there a better file format for the transitory data (instead of CSV)?
I chose CSV because it is quite simple and straightforward (and I am a Python newbie), but I would like to hear from someone who may have done something similar before.
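For the MySQL import on the Linux side, my rough plan is something like the following. Again, just a sketch under assumptions: PyMySQL as the driver, and the credentials, table, and column names are placeholders.

```python
import csv
import pymysql  # assumption: any MySQL driver would do (mysqlclient, mysql-connector-python, ...)

conn = pymysql.connect(host="localhost", user="me", password="secret",
                       database="target_db", charset="utf8mb4")

with open("export_cleaned.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)  # skip the header row written by the export script

    # Placeholder table and columns; the real schema would go here.
    insert_sql = "INSERT INTO some_table (col_a, col_b, col_c) VALUES (%s, %s, %s)"

    with conn.cursor() as cursor:
        batch = []
        for row in reader:
            batch.append(row)
            # executemany in batches to avoid one giant round trip per row
            if len(batch) >= 10000:
                cursor.executemany(insert_sql, batch)
                batch = []
        if batch:
            cursor.executemany(insert_sql, batch)

conn.commit()
conn.close()
```

I am aware MySQL's LOAD DATA LOCAL INFILE would probably be faster for a bulk load, but batched inserts keep it simple while I am still learning.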