views:

851

answers:

5

Is it possible to split a file? For example, I have a huge wordlist and I want to split it into several smaller files. How can this be done?

A: 

Sure, just read in the file and write a portion of the words to each output file. It's possible to do this in any programming language.

David Zaslavsky
A: 

Easily. I'd suggest iterating over the file and writing to a new file as necessary, then deleting the original. This answer is fairly intuitive to me, though, so I'm not sure if it's insufficient, or if perhaps it needs more clarification.

Devin Jeanpierre
+2  A: 

Sure, it's possible:

open input file
open output file 1
count = 0
for each line in input file:
    write line to current output file
    count = count + 1
    if count >= maxlines:
        close output file
        open next output file
        count = 0
close output file
Charlie Martin
Don't forget to reset your count after opening the new file...
Sean Cavanagh
right, or test count mod maxlines.
Charlie Martin
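
For readers who want something runnable, here is a minimal Python sketch of the pseudocode in this answer, using the "count mod maxlines" test from the comments instead of a counter that has to be reset. The function name split_by_lines, its parameters, and the output naming scheme (base name plus a number plus .txt) are assumptions made for illustration, not part of the original answer.

```python
def split_by_lines(input_path, output_base, max_lines):
    """Write consecutive groups of max_lines lines to numbered files.

    Hypothetical helper: files are named output_base + N + '.txt'.
    Returns the list of file names that were written.
    """
    written = []
    out = None
    try:
        with open(input_path, 'r') as src:
            for index, line in enumerate(src):
                # Start a new output file whenever index is a multiple
                # of max_lines, so there is no counter to reset.
                if index % max_lines == 0:
                    if out is not None:
                        out.close()
                    name = '%s%d.txt' % (output_base, len(written) + 1)
                    out = open(name, 'w')
                    written.append(name)
                out.write(line)
    finally:
        # Close the last output file, if any was opened.
        if out is not None:
            out.close()
    return written
```

Because an output file is only opened when there is a line to put in it, this version should not leave an empty trailing file when the input length is an exact multiple of max_lines.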
+2  A: 

This one splits a file up by newlines and writes it back out in chunks. You can change the delimiter easily. It also handles uneven amounts, where the number of lines in your input file is not a multiple of splitLen (20 in this example); the last output file is simply shorter.

splitLen = 20         # 20 lines per file
outputBase = 'output' # output1.txt, output2.txt, etc.

# This is shorthand and not friendly with memory
# on very large files (Sean Cavanagh), but it works.
# Note: if the input ends with a newline, the split
# leaves one empty string as the final list entry.
with open('input.txt', 'r') as source:
    lines = source.read().split('\n')

at = 1
for start in range(0, len(lines), splitLen):
    # First, get the list slice
    outputData = lines[start:start+splitLen]

    # Now open the output file, join the new slice with newlines
    # and write it out. The with block closes the file.
    with open(outputBase + str(at) + '.txt', 'w') as output:
        output.write('\n'.join(outputData))

    # Increment the counter
    at += 1
sli
Might mention that for REALLY BIG FILES, open().read() chews a lot of memory and time. But mostly it's okay.
Sean Cavanagh
Oh, I know. I just wanted to throw together a working script quickly, and I normally work with small files. I end up with shorthand like that.
sli
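
If memory is a concern, as raised in the comments above, one possible alternative is to read the input a chunk at a time rather than with open().read(). The sketch below uses itertools.islice from the standard library for that; the function name split_streaming and its parameters are hypothetical, and the output naming mirrors the script above only by assumption.

```python
from itertools import islice

def split_streaming(input_path, output_base, split_len=20):
    """Split input_path into files of at most split_len lines each.

    Only one chunk of lines is held in memory at a time.
    Returns the number of output files written.
    """
    count = 0
    with open(input_path, 'r') as src:
        while True:
            # islice pulls the next split_len lines from the file
            # iterator, continuing where the previous chunk stopped.
            chunk = list(islice(src, split_len))
            if not chunk:
                break
            count += 1
            with open('%s%d.txt' % (output_base, count), 'w') as out:
                out.writelines(chunk)
    return count
```

Unlike the join-based script, this keeps each line's trailing newline as read, so the pieces concatenate back to the original text.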
+1  A: 

Is this a duplicate? See: http://stackoverflow.com/questions/291740/how-do-i-split-a-huge-text-file-in-python

quamrana