I have a design question. I have a file that is several GB in size (between 3 and 4 GB), and it is ordered by timestamp. I am trying to figure out the best way to deal with this file.
I was thinking of reading the whole file into memory, transmitting the data to different machines, and then running my analysis on those machines.
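Just to make the plan concrete, here is roughly what I have in mind (the hostnames, ports, plain-socket transport, and filename are placeholders, not anything I have settled on):

```python
import socket

# Hypothetical analysis machines -- hostnames and ports are placeholders
MACHINES = [("analysis1.example.com", 9000),
            ("analysis2.example.com", 9000)]

def push_file_to_machines(path):
    """Read the whole file into memory once, then send a copy to every machine.

    This is the plan as described above: with a 3-4 GB file the process needs
    at least that much free RAM, which is the part I am unsure about.
    """
    with open(path, "rb") as f:
        data = f.read()                      # entire 3-4 GB file held in memory
    for host, port in MACHINES:
        with socket.create_connection((host, port)) as conn:
            conn.sendall(data)               # transmit the full copy to this machine

if __name__ == "__main__":
    push_file_to_machines("events.log")      # placeholder filename
```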
Would it be wise to load this into a database before running my analysis?
Just so you know, I plan to run my analysis on different machines, so going through a database would be easier, but if I increase the number of machines running the analysis, the database might become too slow.
Any ideas?
Update:
I want to process the records one by one. Basically, I am running a model over the timestamped data, but I have various models, so I want to distribute the work so that the whole process runs overnight every day. I want to be able to easily increase the number of models without degrading system performance, which is why I am planning to distribute the data to all the machines running the models (each machine will run a single model). A rough sketch of the fan-out I am picturing is below.
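The machine names, ports, and one-record-per-line format here are assumptions; the real record parsing would depend on the file format:

```python
import json
import socket

# Hypothetical model machines, one model per machine (hostnames/ports are made up)
MODEL_MACHINES = [("model-a.example.com", 9100),
                  ("model-b.example.com", 9100),
                  ("model-c.example.com", 9100)]

def fan_out(record, connections):
    """Send one timestamped record to every model machine."""
    payload = (json.dumps(record) + "\n").encode("utf-8")
    for conn in connections:
        conn.sendall(payload)

def run_nightly(path):
    """Stream the file record by record and fan each record out to all models."""
    connections = [socket.create_connection(addr) for addr in MODEL_MACHINES]
    try:
        with open(path, "r", encoding="utf-8") as f:
            for line in f:                           # one record per line, in timestamp order
                record = {"raw": line.rstrip("\n")}  # parse however the real format requires
                fan_out(record, connections)
    finally:
        for conn in connections:
            conn.close()

if __name__ == "__main__":
    run_nightly("events.log")                        # placeholder filename
```

The idea is that adding a new model only means adding another (host, port) entry, so the single pass over the file stays the bottleneck rather than the number of models. Does that sound like a reasonable way to structure it, or is there a better approach?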