Hi,
I have an auto-generated C++ source file, around 40 MB in size. It largely consists of push_back calls for some vectors and the string constants that are to be pushed.
When I try to compile this file, g++ exits and says that it couldn't reserve enough virtual memory (around 3 GB). Googling this problem, I found that using the command ...
Problem Description
I need to stream large files from disk. Assume the files are larger than will fit in memory. Furthermore, suppose that I'm doing some calculation on the data and the result is small enough to fit in memory. As a hypothetical example, suppose I need to calculate an md5sum of a 200GB file and I need to do so with gu...
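A minimal sketch of that chunked-hashing idea in Python (the path and chunk size are placeholders; only one chunk is ever held in memory):

import hashlib

def md5_of_large_file(path, chunk_size=8 * 1024 * 1024):
    # Compute the MD5 of a file without loading it all into memory.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # read() returns b"" at EOF, which terminates the loop
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()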
I need to preface this by saying I'm not a .NET coder at all, but to get partial functionality I modified a TechNet chunkedfilefetch.aspx script that uses a chunked data reading and streamed writing method of file transfer, which got me halfway there.
iStream = New System.IO.FileStream(path, System.IO.FileMode.Open, _
IO.FileAccess.Read, IO.FileShar...
I have two twin CentOS 5.4 servers with VMware Server installed on each.
What is the most reliable and fastest method for copying virtual machine files from one server to the other, assuming that I always use sparse files for my VMware virtual machines?
The VM files are a pain to copy since they are very large (50 GB), but since they ...
I'm using Python 2.6 on a Mac Mini with 1 GB RAM. I want to read in a huge text file:
$ ls -l links.csv; file links.csv; tail links.csv
-rw-r--r-- 1 user user 469904280 30 Nov 22:42 links.csv
links.csv: ASCII text, with CRLF line terminators
4757187,59883
4757187,99822
4757187,66546
4757187,638452
4757187,4627959
4757187,312826
475718...
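For a file this size on 1 GB of RAM, the usual trick is to iterate the file object line by line instead of reading it all at once; a sketch (the per-source counting is just a hypothetical use of the pairs):

def iter_pairs(path):
    # Yield one (left, right) integer pair per line; only one line is in memory at a time.
    with open(path, "rb") as f:
        for line in f:
            a, b = line.strip().split(b",")
            yield int(a), int(b)

counts = {}
for left, right in iter_pairs("links.csv"):
    counts[left] = counts.get(left, 0) + 1   # e.g. how many rows share each left-hand id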
Hi - I have to process text files 10-20GB in size of the format:
field1 field2 field3 field4 field5
I would like to parse the data from each line of field2 into one of several files; the file it gets pushed into is determined line-by-line by the value in field4. There are 25 different possible values in field4 and hence 25 different f...
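A hedged sketch of the usual single-pass approach in Python: keep one output handle per field4 value and append as you stream (the output naming scheme is an assumption):

def split_by_field4(path):
    handles = {}                              # one open file per distinct field4 value (~25)
    try:
        with open(path) as src:
            for line in src:
                fields = line.split()
                if len(fields) < 5:
                    continue                  # skip malformed lines
                key, value = fields[3], fields[1]
                out = handles.get(key)
                if out is None:
                    out = handles[key] = open("field4_%s.txt" % key, "a")
                out.write(value + "\n")
    finally:
        for out in handles.values():
            out.close()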
I have a large file (4+ gigs) of, let's just say, 4-byte floats. I would like to treat it as a List, in the sense that I would like to be able to use map, filter, foldl, etc. However, instead of producing a new list with the output, I would like to write the output back into the file, and thus only have to load a small portion of the file i...
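One way to sketch that chunked map-over-floats idea in Python (the file is assumed to hold native-endian 4-byte floats, and the transform shown is just a placeholder):

import array

def map_floats_in_place(path, func, chunk_items=1 << 20):
    # Apply func to every float in the file, rewriting one bounded chunk at a time.
    with open(path, "r+b") as f:
        while True:
            pos = f.tell()
            data = f.read(chunk_items * 4)
            if not data:
                break
            floats = array.array("f")
            floats.frombytes(data)                     # assumes the chunk is a whole number of floats
            floats = array.array("f", (func(x) for x in floats))
            f.seek(pos)
            f.write(floats.tobytes())                  # write the transformed chunk back in place

map_floats_in_place("floats.bin", lambda x: x + 1.0)   # hypothetical transform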
I have a large-ish file (4-5 GB compressed) of small messages that I wish to parse into approximately 6,000 files by message type. Messages are small; anywhere from 5 to 50 bytes depending on the type.
Each message starts with a fixed-size type field (a 6-byte key). If I read a message of type '000001', I want to append its payloa...
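Since the message framing isn't shown in the excerpt, here is only the handle-management half as a sketch: with ~6,000 output files, the interesting problem is staying under the OS file-descriptor limit, so cache open appenders LRU-style (iter_messages is a hypothetical parser yielding (key, payload) pairs):

from collections import OrderedDict

MAX_OPEN = 512                        # keep well under the per-process descriptor limit
open_files = OrderedDict()

def append_payload(key, payload):
    # Append payload to the file for this 6-byte key, reusing cached handles.
    f = open_files.pop(key, None)
    if f is None:
        if len(open_files) >= MAX_OPEN:
            _, oldest = open_files.popitem(last=False)   # evict least recently used
            oldest.close()
        f = open("msg_%s.bin" % key, "ab")
    open_files[key] = f               # mark as most recently used
    f.write(payload)

# for key, payload in iter_messages("input.bin"):
#     append_payload(key, payload)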
I have two files:
metadata.csv: contains an ID, followed by the vendor name, a filename, etc.
hashes.csv: contains an ID, followed by a hash
The ID is essentially a foreign key of sorts, relating file metadata to its hash.
I wrote this script to quickly extract all hashes associated with a particular vendor. It craps out before it fin...
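The script itself is cut off, so here is only a hedged sketch of the usual shape: build a set of IDs for the vendor from metadata.csv, then stream hashes.csv once (the column positions and vendor-matching rule are assumptions):

import csv

def hashes_for_vendor(metadata_path, hashes_path, vendor):
    wanted_ids = set()
    with open(metadata_path) as f:
        for row in csv.reader(f):
            if len(row) >= 2 and row[1] == vendor:        # assumes column 0 = ID, column 1 = vendor
                wanted_ids.add(row[0])
    hashes = []
    with open(hashes_path) as f:
        for row in csv.reader(f):
            if len(row) >= 2 and row[0] in wanted_ids:    # assumes column 0 = ID, column 1 = hash
                hashes.append(row[1])
    return hashes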
I'm about to start on a project wherein I can foresee there being large files (mostly flat text files, but they could be CSV, fixed-width, XML, ...so far) that need to be edited. I need to develop the pieces to do this editing within the application.
In trying to determine a Good Way to handle editing large amounts of data (possibly into th...
Hi all,
I am developing a Java-based application; its pertinent requirements are listed below:
Large datasets exist on several machines on the network. My program needs to (remotely) execute a Java program to process these datasets and fetch the results.
A user on a Windows desktop will need to process datasets (several gigs) on machine A....
I want to have huge background images on my site, but without giving the user a hard time downloading them or making the site look ugly while the background loads.
They would be no bigger than 1920 x 1080 in pixels; however, it's hard to say how big that is in terms of kilobytes/megabytes.
What are my options here and which are most effective?
I'm not too both...
I need to save very large amounts of data (>500 GB) which is being streamed (800 Mb/s) from another device connected to my PC. The speed rules out use of a database, e.g. MySQL/ISAM, and I am looking for a fast, light library which sits on top of the C stdio file lib (i.e. fopen/fclose/fwrite) and will allow me to write/read a very larg...
I have a wxTextCtrl and I need to put a very large string into it. (Like a 15 MB string) The only problem is it's very slow. Here is what I'm doing:
char * buff = ...
wxString data(buff, wxConvUTF8);
text->ChangeValue(data);
However, this is not the bottleneck. That occurs as soon as the function this block of code is in returns. The...
I'm having some problems reading a file with Java. It is absolutely huge (2.5 GB) and adjusting my memory settings doesn't help. The data is all on a single line, so I can't read it one line at a time. What I would like to do is read the file until I find a certain string, for example "<|start|>" or "<|end|>", and then print the data in between the...
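The question is about Java, but the underlying technique is language-neutral: read fixed-size chunks and keep a small overlap so a marker split across a chunk boundary isn't missed. A sketch of that idea in Python (markers and chunk size as in the description):

def extract_between(path, start, end, chunk_size=1 << 20):
    # Stream the file and return the data between the first start marker and the next end marker.
    buf = ""
    captured = None                       # None until the start marker has been seen
    with open(path) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None               # markers never found
            buf += chunk
            if captured is None:
                i = buf.find(start)
                if i == -1:
                    buf = buf[-(len(start) - 1):]     # keep a tail in case start spans chunks
                    continue
                buf = buf[i + len(start):]
                captured = ""
            j = buf.find(end)
            if j != -1:
                return captured + buf[:j]
            captured += buf[:-(len(end) - 1)]         # keep a tail in case end spans chunks
            buf = buf[-(len(end) - 1):]

print(extract_between("huge.txt", "<|start|>", "<|end|>"))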
I am working on a C parser and wondering how experts manage large amounts of text/strings (>100 MB) stored in memory.
The content is expected to be accessible at all times, at a fast pace.
Background: Red Hat / gcc / libc
A single char array would go out of bounds and cause a segmentation fault...
Any ideas or experience are welcome to share / dis...
Hello again,
I've got the lovely task of working out how to handle large files being loaded into our application's script editor (it's like VBA for our internal product, for quick macros). Most files are about 300-400 KB, which load fine. But when they go beyond 100 MB the process has a hard time, as you'd expect.
What happens is that ...
Hi,
I have a large .csv file (~26000 rows). I want to be able to read it into MATLAB. Another problem is that one of the fields contains a collection of strings delimited by commas.
I'm having trouble reading it. I tried stuff like tdfread, which won't work here. Any tricks with textscan I should be aware of?
Is there any oth...
Considering a really huge file (maybe more than 4 GB) on disk, I want to scan through this file and count the number of times a specific binary pattern occurs.
My thought is:
Use a memory-mapped file (CreateFileMapping or boost mapped_file) to load the file into virtual memory.
For each 100 MB of mapped memory, create one thread to scan and calculate...
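A single-threaded sketch of the memory-mapped scan in Python (the per-100 MB threading from the plan above is omitted; if the file is split into windows, the windows need to overlap by len(pattern) - 1 bytes so matches on a boundary aren't lost):

import mmap

def count_pattern(path, pattern):
    # Count non-overlapping occurrences of a byte pattern in a memory-mapped file.
    count = 0
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)   # map whole file read-only; needs a 64-bit process for >4 GB
        try:
            pos = mm.find(pattern)
            while pos != -1:
                count += 1
                pos = mm.find(pattern, pos + len(pattern))       # continue after this match
        finally:
            mm.close()
    return count

print(count_pattern("huge.bin", b"\xde\xad\xbe\xef"))            # hypothetical pattern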
I have a file with over a million lines of data, each line is a record.
I can go through the file, read each line and do an insert, but this can take up to 2 hours. Is there a faster way, like uploading a SQL file?
...
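If the target is a SQL database, the usual speed-up is batching many rows per statement and per transaction instead of committing one insert per line; a sketch using sqlite3 as a stand-in, since the actual database and table aren't named:

import sqlite3

def bulk_load(path, db_path, batch_size=10000):
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS records (line TEXT)")   # hypothetical table
    batch = []
    with open(path) as f:
        for line in f:
            batch.append((line.rstrip("\n"),))
            if len(batch) >= batch_size:
                cur.executemany("INSERT INTO records (line) VALUES (?)", batch)
                conn.commit()            # one commit per batch instead of per row
                batch = []
    if batch:
        cur.executemany("INSERT INTO records (line) VALUES (?)", batch)
        conn.commit()
    conn.close()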