views:

72

answers:

2

I am writing a program that produces random records in a format that can be specified in code, and optionally writes them to disk as a text file so they can be used for data-mining benchmarks.

My problem is that I can verify that my program works with small text files, but I need to know whether this holds for large amounts of data (this program will be used to create files with billions and even trillions of rows). Neither Notepad nor Notepad++ can open and correctly display text files this large (Notepad++ has a 2714502 line limit; I don't know Notepad's).

Just so you know, in the record format I am testing with, 100,000,000 rows of data comes to about a 9 GB file. I am aiming at eventually producing files over 1 TB in size.

+1  A: 

If the file is in a format that a database can import (for example, comma- or tab-delimited), you can bulk load it into a database and then check there that all rows were created.

I bulk load files with billions of rows, in the hundreds of TB, on a regular basis.
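The approach above can be sketched in Python with SQLite as the target database. This is a minimal illustration, not production bulk loading: the three-column `(id, name, value)` layout and the chunk size are assumptions standing in for the real record format, and the inserts are streamed in batches so memory stays flat even for huge files.

```python
import csv
import sqlite3

def bulk_load_and_count(path, db_path, chunk_size=100_000):
    """Stream a comma-delimited file into SQLite, then count rows there.

    Assumes a hypothetical 3-column record (id, name, value); adjust the
    schema and INSERT to match the real format. Returns the row count the
    database reports, which can be compared against the expected total.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id TEXT, name TEXT, value TEXT)"
    )
    with open(path, newline="") as f:
        reader = csv.reader(f)
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == chunk_size:
                # Batched executemany keeps memory usage constant.
                conn.executemany("INSERT INTO records VALUES (?,?,?)", chunk)
                chunk = []
        if chunk:
            conn.executemany("INSERT INTO records VALUES (?,?,?)", chunk)
    conn.commit()
    (count,) = conn.execute("SELECT COUNT(*) FROM records").fetchone()
    conn.close()
    return count
```

For the multi-terabyte case you would use a real bulk loader (e.g. SQL Server's BULK INSERT or PostgreSQL's COPY) rather than row-by-row inserts, but the verification idea is the same: load, then compare `COUNT(*)` against the number of rows you generated.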

SQLMenace
+4  A: 

Opening and viewing a document with billions of rows just to verify it sounds pretty futile (that's a lot of records to read through).

Write a program that reads one record at a time from your huge file and verifies it; alternatively, have that program write the records out to multiple small files.
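Both suggestions can be combined in one streaming pass. A minimal sketch, assuming a delimited record format with a fixed field count (`expected_fields` and `chunk_rows` are illustrative parameters, not part of the question's actual format):

```python
def verify_and_split(path, expected_fields=3, delimiter=",",
                     chunk_rows=None, out_prefix="part"):
    """Stream a huge file one record at a time.

    Counts total rows, flags lines with the wrong number of fields, and,
    if chunk_rows is given, rewrites the data into smaller numbered files
    that an ordinary editor can open. Returns (total_rows, bad_rows).
    """
    total = bad = part = 0
    out = None
    with open(path) as f:
        for line in f:
            if chunk_rows is not None:
                if total % chunk_rows == 0:
                    # Start a new chunk file every chunk_rows records.
                    if out:
                        out.close()
                    part += 1
                    out = open(f"{out_prefix}{part:04d}.txt", "w")
                out.write(line)
            total += 1
            if len(line.rstrip("\n").split(delimiter)) != expected_fields:
                bad += 1
    if out:
        out.close()
    return total, bad
```

Because it never holds more than one line in memory, this scales to files of any size; the per-record check can be made as strict as the real format requires (field types, value ranges, etc.).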

nos
I like the write-multiple-smaller-files idea.
TheSean
I don't understand the need to open big files in an editor.
Leonardo Herrera
You're right, it is pretty futile. I had wanted to verify that all of the rows (or at least some really far down in the file) were created correctly. I guess the best way to do that would be to run the benchmarks and look for really obvious errors.
Jeff