At my workplace we regularly handle files with more than a million rows each. Even though the server has more than 10 GB of memory, with 8 GB allocated to the JVM, it sometimes hangs for a few moments and chokes the other tasks.
I profiled the code and found that while reading a file, memory use frequently climbs into the gigabytes (1 GB to 3 GB) and then suddenly drops back to normal. It seems these repeated spikes and drops are what hang my server; of course, they are caused by garbage collection.
Which API should I use to read the files for better performance?
Right now I am using new BufferedReader(new FileReader(...)) to read these CSV files.
Process: How am I reading the files?
- I read the files line by line.
- Every line has a few columns; based on their types I parse them accordingly (the cost column as a double, the visit column as an int, the keyword column as a String, etc.).
- I push the eligible content (visit > 0) into a HashMap and finally clear that Map at the end of the task (a sketch of this loop follows below).
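For clarity, here is a minimal sketch of that loop. The comma delimiter, the column order, and the Row holder class are assumptions for illustration; the real files have more columns.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class CsvLoader {

    // Hypothetical value object; the real column set is not shown here.
    static class Row {
        final double cost;
        final int visits;
        Row(double cost, int visits) {
            this.cost = cost;
            this.visits = visits;
        }
    }

    // Reads one CSV file line by line, parses the typed columns, and keeps
    // only rows with visit > 0, keyed by keyword.
    static Map<String, Row> load(String path) throws IOException {
        Map<String, Row> eligible = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split(",");          // assumed comma-delimited
                String keyword = cols[0];                  // assumed column order
                int visits = Integer.parseInt(cols[1]);
                double cost = Double.parseDouble(cols[2]);
                if (visits > 0) {                          // eligibility filter
                    eligible.put(keyword, new Row(cost, visits));
                }
            }
        }
        return eligible;                                   // cleared/discarded at the end of the task
    }
}
```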
Update
I read 30 or 31 files this way (one month's data) and store the eligible rows in a Map. Later this Map is used to find some culprits in different tables, so reading the files is a must and storing that data is also a must. I have now switched the HashMap part to BerkeleyDB, but the issue at file-reading time is the same or even worse. A sketch of that replacement is below.
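Roughly what the BerkeleyDB replacement looks like (a sketch against Berkeley DB Java Edition; the environment setup and the string encoding of the value are simplifications, not my actual record format):

```java
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

import java.io.File;

public class EligibleStore {

    private final Environment env;
    private final Database db;

    // Opens (or creates) a Berkeley DB JE environment and database on disk,
    // so the eligible rows do not have to stay in a heap-resident HashMap.
    EligibleStore(File envHome) {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        env = new Environment(envHome, envConfig);

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        db = env.openDatabase(null, "eligible", dbConfig);
    }

    // Stores one eligible row; encoding the value as "cost,visits" text is
    // an assumption for the sketch, not the real record layout.
    void put(String keyword, double cost, int visits) {
        DatabaseEntry key = new DatabaseEntry(keyword.getBytes());
        DatabaseEntry value = new DatabaseEntry((cost + "," + visits).getBytes());
        db.put(null, key, value);
    }

    void close() {
        db.close();
        env.close();
    }
}
```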