large-files

Splitting a file and its lines under Linux/bash

I have a rather large file (150 million lines of 10 chars). I need to split it into 150 files of 2 million lines, with each output line being alternately the first 5 characters or the last 5 characters of the source line. I could do this in Perl rather quickly, but I was wondering if there was an easy solution using bash. Any ideas? ...
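
One streaming pass is enough here. A minimal sketch in Python (the asker would use Perl; the 2-million-line cutoff is from the question, the part_NNN naming scheme is illustrative):

    LINES_PER_FILE = 2_000_000  # output lines per file, per the question

    def split_alternating(src_path):
        """Emit each 10-char line's first 5 and last 5 chars as two output
        lines, rolling to a new part_NNN file every 2 million lines."""
        out, written, part = None, 0, 0
        with open(src_path) as src:
            for line in src:
                line = line.rstrip('\n')
                for half in (line[:5], line[5:]):
                    if written % LINES_PER_FILE == 0:
                        if out:
                            out.close()
                        part += 1
                        out = open('part_%03d' % part, 'w')
                    out.write(half + '\n')
                    written += 1
        if out:
            out.close()

Since each source line yields two output lines, 150 million inputs become 300 million outputs, which is exactly 150 files of 2 million lines.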

Is there a distributed VCS that can manage large files?

Is there a distributed version control system (git, Bazaar, Mercurial, darcs, etc.) that can handle files larger than available RAM? I need to be able to commit large binary files (e.g. datasets, source video/images, archives), but I don't need to be able to diff them, just be able to commit and then update when the file changes. I last...

Nuking huge file in svn repository

As the local Subversion czar I explain to everyone to keep only source code and non-huge text files in the repository, not huge binary data files. Smaller binary files that are parts of tests, maybe. Unfortunately I work with humans! Someone is likely to someday accidentally commit an 800MB binary hulk. This slows down repository o...
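
Once the hulk is in history, the usual excision route is to dump the repository, filter the offending path out with svndumpfilter, and load the result into a fresh repository. A sketch driving those standard Subversion tools from Python (all paths below are placeholders):

    import subprocess

    REPO = '/var/svn/repo'            # placeholder: existing repository
    NEW_REPO = '/var/svn/repo-clean'  # placeholder: made with `svnadmin create`
    BIG_PATH = 'trunk/data/hulk.bin'  # placeholder: the offending file

    # Pipeline: svnadmin dump | svndumpfilter exclude | svnadmin load
    dump = subprocess.Popen(['svnadmin', 'dump', REPO], stdout=subprocess.PIPE)
    filt = subprocess.Popen(['svndumpfilter', 'exclude', BIG_PATH],
                            stdin=dump.stdout, stdout=subprocess.PIPE)
    dump.stdout.close()  # let dump see a broken pipe if filt exits early
    subprocess.run(['svnadmin', 'load', NEW_REPO], stdin=filt.stdout, check=True)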

Advice on handling large data volumes

So I have a "large" number of "very large" ASCII files of numerical data (gigabytes altogether), and my program will need to process the entirety of it sequentially at least once. Any advice on storing/loading the data? I've thought of converting the files to binary to make them smaller and for faster loading. Should I load everything...
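
On the binary-conversion idea: fixed-width records make the one-time conversion cheap and the sequential pass trivially chunked. A sketch, assuming whitespace-separated floats in the ASCII files (the record format and chunk size are assumptions, not from the question):

    import struct

    REC = struct.Struct('<d')  # one little-endian double per value (assumed)

    def text_to_binary(src, dst):
        """One-time conversion: ASCII numbers -> fixed-width binary records."""
        with open(src) as fin, open(dst, 'wb') as fout:
            for line in fin:
                for tok in line.split():
                    fout.write(REC.pack(float(tok)))

    def sequential_pass(path, records_per_chunk=100_000):
        """Process the binary file front to back with bounded memory."""
        with open(path, 'rb') as f:
            while True:
                buf = f.read(REC.size * records_per_chunk)
                if not buf:
                    break
                for (value,) in REC.iter_unpack(buf):
                    pass  # real computation goes here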

Best Free Text Editor Supporting *More Than* 4G Files?

I am looking for a text editor that will be able to load a 4+ Gigabyte file into it. TextPad doesn't work; I own a copy and have been to its support site, and it just doesn't do it. Maybe I need new hardware, but that's a different question. The editor needs to be free OR, if it's going to cost me, then no more than $30. For Windo...

Text editor to open big (giant, huge, large) text files

I mean 100+ MB big; such text files can push the envelope of editors. I need to look through a large XML file, but cannot if the editor is buggy. Any suggestions? ...

Reading very large files in PHP

fopen is failing when I try to read in a moderately sized file in PHP. A 6 meg file makes it choke, though smaller files around 100k are just fine. I've read that it is sometimes necessary to recompile PHP with the -D_FILE_OFFSET_BITS=64 flag in order to read files over 20 gigs or something ridiculous, but shouldn't I have no prob...
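
One common culprit for a failure at 6 MB is PHP's memory_limit being exhausted by slurping the whole file, rather than fopen or offset limits; the portable cure is to stream fixed-size chunks. The pattern, sketched in Python (the PHP analogue is fread in a loop; names and chunk size are illustrative):

    def iter_chunks(path, chunk_size=8192):
        """Yield a file in fixed-size pieces; memory stays flat no matter
        how large the file grows."""
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    return
                yield chunk

    # e.g. total the bytes without ever holding more than one chunk
    total = sum(len(c) for c in iter_chunks('big.log'))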

Graphical open source text editor for large text files (> 200 MBytes)

Is there an open source alternative (similar to UltraEdit) to handle files with file sizes >200 MBytes? ...

What's the best way to sync large amounts of data around the world?

I have a great deal of data to keep synchronized over 4 or 5 sites around the world, around half a terabyte at each site. This changes (either adds or changes) by around 1.4 Gigabytes per day, and the data can change at any of the four sites. A large percentage (30%) of the data is duplicate packages (perhaps packaged-up JDKs), so the s...

Processing large (over 1 Gig) files in PHP using stream_filter_*

    $fp_src = fopen('file', 'r');
    $filter = stream_filter_prepend($fp_src, 'convert.iconv.ISO-8859-1/UTF-8');
    while (fread($fp_src, 4096)) {
        ++$count;
        if ($count % 1000 == 0) print ftell($fp_src)."\n";
    }

When I run this the script ends up consuming ~200 MB of RAM after going through just 35 MB of the file. Running it without the stream_f...
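
Whatever the filter is buffering internally, charset conversion itself never needs more than one chunk in flight. A bounded-memory version of the same job, sketched in Python with an incremental decoder (chunk size illustrative):

    import codecs

    def latin1_to_utf8(src_path, dst_path, chunk=4096):
        """Re-encode a file of any size using O(chunk) memory. The
        incremental decoder also keeps split multi-byte sequences intact
        for encodings where that matters."""
        decode = codecs.getincrementaldecoder('iso-8859-1')().decode
        with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
            while True:
                raw = src.read(chunk)
                if not raw:
                    dst.write(decode(b'', final=True).encode('utf-8'))
                    break
                dst.write(decode(raw).encode('utf-8'))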

Fastest possible XML handling in Delphi for very large documents

I need recommendations on what to use in Delphi (I use Delphi 2009) to handle very large XML files (e.g. 100 MB) as fast as possible. I need to input the XML, access and update the data in it from my program, and then export the modified XML again. Hopefully the input and output could be done within a few seconds on a fast Windows mac...

The best way to read large files in PHP?

Hi, I have to read CSV files line by line, which can be 10 to 20 Meg. file() is useless ;-) and I have to find the quickest way. I have tried fgets(), which runs fine, but I don't know if it reads a small block each time I call it, or if it caches a bigger one and optimizes file I/O. Do I have to try the fread() way, parsing EOL by mysel...
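
fgets() reads through PHP's buffered stream layer, so it is not issuing one tiny read per line. The same shape in Python, where the csv module pulls rows lazily from the file object (the two-column layout is assumed for illustration):

    import csv

    def sum_second_column(path):
        """Stream a CSV of any size one row at a time; only the current
        row lives in memory."""
        total = 0.0
        with open(path, newline='') as f:
            for row in csv.reader(f):
                total += float(row[1])
        return total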

When handling large file transfers in ASP.NET what precautions should you take?

My ASP.NET application allows users to upload and download large files. Both procedures involve reading and writing filestreams. What should I do to ensure the application doesn't hang or crash when it handles a large file? Should the file operations be handled on a worker thread for example? ...

Upload large files in .NET

I've done a good bit of research to find an upload component for .NET that I can use to upload large files, has a progress bar, and can resume the upload of large files. I've come across some components like AjaxUploader, SlickUpload, and PowUpload, to name a few. Each of these options cost money and only PowUpload does the resumable u...

How can you concatenate two huge files with very little spare disk space?

Suppose that you have two huge files (several GB) that you want to concatenate together, but that you have very little spare disk space (let's say a couple hundred MB). That is, given file1 and file2, you want to end up with a single file which is the result of concatenating file1 and file2 together byte-for-byte, and delete the origina...
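
One classic answer relies on sparse files: place file2's chunks at their final offsets in file1, working from the back of file2 and truncating it as you go, so at most one chunk of extra space is in use at any moment. A sketch, assuming a sparse-capable filesystem (most Unix filesystems and NTFS; chunk size illustrative):

    import os

    CHUNK = 64 * 1024 * 1024  # must fit in the spare space

    def concat_in_place(f1_path, f2_path):
        """Append f2 to f1 using roughly one CHUNK of spare disk."""
        size1 = os.path.getsize(f1_path)
        remaining = os.path.getsize(f2_path)
        with open(f1_path, 'r+b') as f1, open(f2_path, 'r+b') as f2:
            while remaining > 0:
                n = min(CHUNK, remaining)
                f2.seek(remaining - n)
                data = f2.read(n)          # last un-copied chunk of f2
                f1.seek(size1 + remaining - n)
                f1.write(data)             # its final offset in the result
                remaining -= n
                f2.truncate(remaining)     # release those blocks right away
        os.remove(f2_path)

The first write lands past f1's old end, leaving a hole that the filesystem never allocates; each later chunk fills in just below the previous one until the hole closes at size1.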

How can I read lines from the end of a file in Perl?

Hi Guys, I am working on a Perl script to read a CSV file and do some calculations. The CSV file has only two columns, something like below.

    One     Two
    1.00    44.000
    3.00    55.000

Now this CSV file is very big; it can be from 10 MB to 2 GB. Currently I am taking a CSV file of size 700 MB. I tried to open this file in Notepad and Excel but it looks like ...
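
For the Perl script itself, File::ReadBackwards on CPAN is the ready-made answer. The underlying trick is to seek to the end and read fixed-size blocks backwards until enough newlines have been collected; sketched in Python:

    def last_lines(path, count, block=8192):
        """Fetch the final `count` lines by reading blocks from the end;
        a 2 GB file costs a handful of reads, not a full scan."""
        with open(path, 'rb') as f:
            f.seek(0, 2)                  # 2 == os.SEEK_END
            pos = f.tell()
            buf = b''
            while pos > 0 and buf.count(b'\n') <= count:
                step = min(block, pos)
                pos -= step
                f.seek(pos)
                buf = f.read(step) + buf
        return [line.decode() for line in buf.splitlines()[-count:]]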

What is the most efficient way of extracting information from a large number of XML files in Python?

Hi, I have a directory full (~10^3 to 10^4) of XML files from which I need to extract the contents of several fields. I've tested different XML parsers, and since I don't need to validate the contents (expensive) I was thinking of simply using xml.parsers.expat (the fastest one) to go through the files, one by one to extract the data. I...
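
With no validation needed, expat's callback interface is indeed about as lean as Python XML parsing gets. A minimal extraction sketch (the field name is a placeholder):

    import xml.parsers.expat

    def extract_field(path, field):
        """Collect the text of every <field> element without building a tree."""
        values, capturing, pieces = [], False, []

        def start(name, attrs):
            nonlocal capturing, pieces
            if name == field:
                capturing, pieces = True, []

        def end(name):
            nonlocal capturing
            if name == field:
                values.append(''.join(pieces))  # text may arrive in parts
                capturing = False

        def chars(data):
            if capturing:
                pieces.append(data)

        parser = xml.parsers.expat.ParserCreate()
        parser.StartElementHandler = start
        parser.EndElementHandler = end
        parser.CharacterDataHandler = chars
        with open(path, 'rb') as f:
            parser.ParseFile(f)
        return values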

IOException reading a large file from a UNC path into a byte array using .NET

I am using the following code to attempt to read a large file (280 MB) into a byte array from a UNC path

    public void ReadWholeArray(string fileName, byte[] data)
    {
        int offset = 0;
        int remaining = data.Length;
        log.Debug("ReadWholeArray");
        FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read);
        ...

Best file system to transfer 5+GB files between OS X and Windows on removable media

I need to transfer DVD image files between a Windows XP computer and a Mac running Leopard. The machines are not connected via a fast network, and I have a few USB drives floating around that I want to use, e.g. 8GB flash, 60GB and 250GB USB hard drives. Sometimes the files creep above 4GB (the maximum size of a single file on FAT32), ...
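
If reformatting a drive as NTFS or exFAT is not an option, the old workaround is to split the images into sub-4GB pieces on one machine and rejoin them on the other. A split sketch (piece size and naming are illustrative):

    CHUNK = 2 * 1024**3   # 2 GB pieces, comfortably under FAT32's limit
    BUF = 1024 * 1024     # copy buffer, so no piece is held in memory whole

    def split(path):
        """Write path.000, path.001, ... each at most CHUNK bytes."""
        with open(path, 'rb') as src:
            part = 0
            while True:
                buf = src.read(min(BUF, CHUNK))
                if not buf:
                    break
                with open('%s.%03d' % (path, part), 'wb') as dst:
                    written = 0
                    while buf:
                        dst.write(buf)
                        written += len(buf)
                        if written >= CHUNK:
                            break
                        buf = src.read(min(BUF, CHUNK - written))
                part += 1

Rejoining is `copy /b file.000+file.001 file.iso` on Windows or `cat file.000 file.001 > file.iso` on the Mac.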

How best to use XPath with very large XML files in .NET?

I need to do some processing on fairly large XML files (large here being potentially upwards of a gigabyte) in C#, including performing some complex XPath queries. The problem I have is that the standard way I would normally do this through the System.Xml libraries likes to load the whole file into memory before it does anything with it...
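
When full XPath over a loaded tree won't fit, the usual compromise is to stream the document and evaluate only the paths you need as elements complete; in .NET that role is played by XmlReader (or XPathDocument as a lighter read-only tree). The streaming idea, sketched with Python's iterparse, including the memory-release step that makes it work:

    import xml.etree.ElementTree as ET

    def stream_tag(path, tag):
        """Yield each completed <tag> element from an arbitrarily large
        document, then free it, keeping memory roughly constant."""
        context = ET.iterparse(path, events=('start', 'end'))
        _, root = next(context)          # grab the root from the first event
        for event, elem in context:
            if event == 'end' and elem.tag == tag:
                yield elem               # caller can run elem.find()/findall()
                root.clear()             # drop processed children of the root

Usage is `for order in stream_tag('huge.xml', 'order'): ...`; without the root.clear() call, finished elements would accumulate under the root and memory would still grow with file size.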