Hello,

I currently work as a Data Warehouse programmer and as such have to put numerous flat files through an ETL process. Of course, before loading a file I have to know what it contains. The problem is that the majority of the files are larger than 1 GB, and I cannot open them using my dear old friend "notepad". Kidding. I usually use VIM or Notepad++, but it still takes a while to open a file. Could I perform a "partial" read of a file using VIM or some other editor?

P.S. I know that I could write a 10-line script to "data sample" the file, but it would be simpler to convince team members to use a feature of an editor than a script that I wrote.

Thank you for any insight you might have.

+3  A: 

If you want to stick with using vim, you could have a look at the LargeFile script.

Alternatively, I've always found that UltraEdit opens large files extremely quickly.

Chad Birch
That's a great suggestion! Thank you!
a_person
+3  A: 

You said you have VIM, which makes me wonder whether you have a Unix environment as well.

If you like, you can run the file through the Unix utility `head` and display the first lines of raw input on your screen. Like this:

EDIT: (thanks Honk)

terminal$> head -n 15 file.csv

(The 15 means you want to see only the first 15 lines.)
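
If `head` is not available on a Windows box, roughly the same partial read can be sketched in Python. This is only an illustration under assumptions: the function name `head_lines`, the UTF-8 default and the `errors="replace"` setting are made up for the example and are not from this thread.

```python
from itertools import islice


def head_lines(path, n=15, encoding="utf-8"):
    """Return the first n lines of a text file without reading the rest."""
    # Iterating over the file object reads it lazily, line by line,
    # so only the start of a multi-gigabyte file is ever touched.
    # errors="replace" is an assumption so odd bytes do not abort the peek.
    with open(path, "r", encoding=encoding, errors="replace") as f:
        return [line.rstrip("\r\n") for line in islice(f, n)]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python head_lines.py file.csv
    if len(sys.argv) > 1:
        for line in head_lines(sys.argv[1]):
            print(line)
```

Because the file is never loaded whole, the time this takes should depend on `n` rather than on the size of the file.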

rlb.usa
Not sure if `top` is special in mainframe Unixes, but on Linux you would pipe into `head -n 15`.
honk
Or you would even avoid the unnecessary `cat` with `head -n 15 file.csv`. This should be orders of magnitude faster, too.
honk
Thanks, but I am just a big fan of UNIX, our environment is built on MS stack.
a_person
@a_person - being a fan of UNIX but on Microsoft, you might like CYGWIN! This is an off-topic suggestion, though. : )
rlb.usa
I definitely appreciate this suggestion and actually have it installed :).
a_person
@a_person +1 for being a CYGWIN fan as well n_n
rlb.usa
+2  A: 

Pretty sure there are loads of similar questions, but hey, Textpad is a good choice for this.

Simon
rlb.usa
TextPad ended up being waaay too slow when tasked with opening the file, taking quite a bit longer than Notepad++.
a_person
A: 

UltraEdit claims to handle files over 4GB...

Dave Swersky
+2  A: 

Use the `head` command.

frankc
+1  A: 

Use `less` on Solaris, or use the same through Cygwin on Windows. On mainframes this problem doesn't appear; the ISPF editor handles it pretty well.

ankur
CYGWIN also handles `less` and `top`.
rlb.usa
A: 

Try PilotEdit. It works well for files larger than 1GB. http://www.pilotedit.com

Dracoder
A: 

EmEditor handles huge files like nobody's business. Up to 248GB, according to their website...and more than that, by opening sections of it. So ~1GB shouldn't be an issue.
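
The general idea of looking at only a section of a file can also be sketched in a few lines of Python, for anyone without such an editor at hand. This is not how EmEditor works internally (I don't know that); `read_section`, its parameters and the 64 KB default are all assumptions made up for the illustration.

```python
def read_section(path, offset=0, size=64 * 1024, encoding="utf-8"):
    """Return the complete lines found in `size` bytes starting at `offset`."""
    # Binary mode lets us seek to an arbitrary byte offset cheaply,
    # so nothing before `offset` has to be read.
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read(size)

    pieces = chunk.split(b"\n")
    if offset > 0:
        # We most likely landed mid-line, so the first piece is dropped.
        # (A line starting exactly at `offset` is dropped too.)
        pieces = pieces[1:]
    # The last piece is either a cut-off line or empty; drop it as well.
    # (A final line with no trailing newline is dropped too.)
    pieces = pieces[:-1]
    return [p.rstrip(b"\r").decode(encoding, errors="replace") for p in pieces]
```

For example, `read_section("file.csv", offset=500_000_000)` would show a few hundred rows from the middle of a 1GB file (`file.csv` being a placeholder name), which helps when the head of the file is not representative of the rest.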

EJP