large-data-volumes

Advice on handling large data volumes.

So I have a "large" number of "very large" ASCII files of numerical data (gigabytes altogether), and my program will need to process the entirety of it sequentially at least once. Any advice on storing/loading the data? I've thought of converting the files to binary to make them smaller and for faster loading. Should I load everything...
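
A minimal sketch of the convert-to-binary idea, assuming NumPy, whitespace-delimited numeric columns, and made-up file names; the per-chunk work is a placeholder:

    # One-time conversion of an ASCII numeric file to NumPy's binary .npy
    # format, then memory-mapped sequential passes over it in chunks.
    import numpy as np

    def convert_to_binary(ascii_path, binary_path):
        data = np.loadtxt(ascii_path)      # slow ASCII parse, paid once
        np.save(binary_path, data)         # compact binary, loads far faster

    def process_sequentially(binary_path, chunk_rows=1_000_000):
        data = np.load(binary_path, mmap_mode="r")   # never fully in RAM
        for start in range(0, data.shape[0], chunk_rows):
            chunk = np.asarray(data[start:start + chunk_rows])
            yield chunk.mean()             # placeholder for the real work

    if __name__ == "__main__":
        convert_to_binary("data_000.txt", "data_000.npy")
        print(sum(process_sequentially("data_000.npy")))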

Graphing large amounts of data

In a product I work on, there is an iteration loop which can have anywhere from a few hundred to a few million iterations. Each iteration computes a set of statistical variables (double precision), and the number of variables can be up to 1000 (typically 15-50). As part of the loop, we graph the change in the variables over the iterat...
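
One hedged way to keep such a graph responsive is to never store more than a fixed number of points per variable, doubling the sampling stride whenever the buffer fills; the class and its limits below are illustrative, not the product's code:

    # Bounded-memory series for plotting: keeps at most `max_points` samples
    # per statistic no matter how many iterations the loop runs.
    class DecimatingSeries:
        def __init__(self, max_points=4000):
            self.max_points = max_points
            self.stride = 1      # keep every `stride`-th iteration
            self.count = 0       # iterations seen so far
            self.points = []     # (iteration, value) pairs actually kept

        def add(self, value):
            if self.count % self.stride == 0:
                self.points.append((self.count, value))
                if len(self.points) >= self.max_points:
                    self.points = self.points[::2]   # thin out stored points ...
                    self.stride *= 2                 # ... and sample half as often
            self.count += 1

    if __name__ == "__main__":
        s = DecimatingSeries(max_points=1000)
        for i in range(3_000_000):
            s.add(i * 0.001)
        print(len(s.points), s.stride)   # bounded regardless of iteration count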

How to split data over MySQL tables

I have a website with members who message each other. There are getting to be a few members and they like to send messages - I'm sure you can see where this is going. Currently I have said messages stored in a nicely relational table cunningly titled "messages" with different status ids to denote, er, status (unread, saved, etc). I know...
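
A hedged sketch of one common answer, hash-sharding the messages across N tables keyed by recipient; the table names, shard count, and column list are assumptions, not the site's actual schema. Archiving old or read messages into a separate table on a schedule is the other usual route:

    # Route each inbox to one of N message tables so no single table grows
    # without bound; reads and writes for one recipient hit the same shard.
    N_SHARDS = 16

    def messages_table_for(user_id):
        return f"messages_{user_id % N_SHARDS:02d}"

    def insert_message_sql(recipient_id):
        # illustrative column list; %s placeholders as used by typical MySQL drivers
        return (f"INSERT INTO {messages_table_for(recipient_id)} "
                "(sender_id, recipient_id, body, status) VALUES (%s, %s, %s, %s)")

    print(messages_table_for(12345))   # -> messages_09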

Large MySQL tables

For a web application I'm developing, I need to store a large number of records. Each record will consist of a primary key and a single (short-ish) string value. I expect to have about 100GB storage available and would like to be able to use it all. The records will be inserted, deleted and read frequently and I must use a MySQL databas...

Large primary key: 1+ billion rows mySQL + InnoDB?

I was wondering if InnoDB would be the best way to format the table? The table contains one field, the primary key, and will get 816k rows a day (est.). This will get very large very quickly! I'm also considering a file-based storage approach (would this be faster?). The table is going to store Twitter IDs that have already been processe...

how to limit bandwidth used by mysqldump

I have to dump a large database over a network pipe that doesn't have much bandwidth and that other people need to use concurrently. If I just run it, it soaks up all the bandwidth, latency soars, and everyone else gets messed up. I'm aware of the --compress flag to mysqldump, which helps somewhat. How can I do this without soaking up all th...
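
One hedged option: throttle the pipeline the dump flows through and let backpressure slow everything upstream; pv --rate-limit or trickle do this ready-made, but a relay is short enough to sketch (the script name and the ~2 MB/s cap are illustrative):

    # Rate-limited stdin -> stdout relay, e.g.
    #   mysqldump --compress db | python throttle.py 2000000 | ssh backuphost 'cat > dump.sql'
    import sys
    import time

    def relay(max_bytes_per_sec, chunk_size=64 * 1024):
        start = time.monotonic()
        sent = 0
        stdin, stdout = sys.stdin.buffer, sys.stdout.buffer
        while True:
            chunk = stdin.read(chunk_size)
            if not chunk:
                break
            stdout.write(chunk)
            sent += len(chunk)
            # sleep just enough that the average rate never exceeds the cap
            ahead = sent / max_bytes_per_sec - (time.monotonic() - start)
            if ahead > 0:
                time.sleep(ahead)
        stdout.flush()

    if __name__ == "__main__":
        relay(int(sys.argv[1]))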

How to plot large data vectors accurately at all zoom levels in real time?

I have large data sets (10 Hz data, so 864k points per 24 hours) which I need to plot in real time. The idea is that the user can zoom and pan into highly detailed scatter plots. The data is not very continuous and there are spikes. Since the data set is so large, I can't plot every point each time the plot refreshes. But I also can't jus...
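
A hedged sketch of the standard fix, per-pixel min/max decimation: for each horizontal bucket keep only the bucket's minimum and maximum sample, which is all the screen could have shown anyway, so spikes survive. Function and variable names are illustrative:

    import numpy as np

    def minmax_decimate(t, y, n_buckets=1000):
        """Return (t, y) thinned to at most two points per bucket."""
        n = len(y)
        if n <= 2 * n_buckets:
            return t, y
        edges = np.linspace(0, n, n_buckets + 1, dtype=int)
        out_t, out_y = [], []
        for lo, hi in zip(edges[:-1], edges[1:]):
            seg = y[lo:hi]
            i_min = lo + int(np.argmin(seg))
            i_max = lo + int(np.argmax(seg))
            for i in sorted((i_min, i_max)):
                out_t.append(t[i])
                out_y.append(y[i])
        return np.asarray(out_t), np.asarray(out_y)

    if __name__ == "__main__":
        t = np.arange(864_000) / 10.0        # 24 h of 10 Hz timestamps
        y = np.random.randn(864_000)
        y[500_000] = 50.0                    # a spike that must not vanish
        td, yd = minmax_decimate(t, y, n_buckets=800)
        print(len(yd), yd.max())             # ~1600 points, spike preserved

On zoom or pan, re-running the decimation over just the visible slice of the arrays brings the full detail back as the window narrows.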

What method of data validation is most appropriate for large data sets

I have a large database and want to implement a feature which would allow a user to do a bulk update of information. The user downloads an Excel file, makes the changes, and the system accepts the Excel file back. The user uses a web interface (ASP.NET) to download the data from the database to Excel, and then modifies the Excel file. Only certain d...

mysql tables structure - one very large table or separate tables?

I'm working on a project which is similar in nature to website visitor analysis. It will be used by hundreds of websites, each averaging tens of thousands to hundreds of thousands of page views a day, so the amount of data will be very large. Should I use a single table with a websiteid column, or a separate table for each website? Making changes to a live service with 100...

Efficiently storing 7.300.000.000 rows

How would you tackle the following storage and retrieval problem? Roughly 2.000.000 rows will be added each day (365 days/year), with the following information per row: id (unique row identifier); entity_id (takes on values between 1 and 2.000.000, inclusive); date_id (incremented by one each day - will take on values between 1 and 3.65...
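
Since every (date_id, entity_id) pair occurs exactly once, one hedged non-database option is a dense fixed-width binary file addressed by arithmetic alone; the single 4-byte value per row below is an assumption standing in for the columns the excerpt cuts off:

    import os
    import struct

    N_ENTITIES = 2_000_000
    RECORD = struct.Struct("<i")    # one 32-bit value per (date_id, entity_id)

    def offset(date_id, entity_id):
        # rows for one day are contiguous, so whole-day scans are sequential reads
        return ((date_id - 1) * N_ENTITIES + (entity_id - 1)) * RECORD.size

    def write_value(f, date_id, entity_id, value):
        f.seek(offset(date_id, entity_id))
        f.write(RECORD.pack(value))

    def read_value(f, date_id, entity_id):
        f.seek(offset(date_id, entity_id))
        return RECORD.unpack(f.read(RECORD.size))[0]

    if __name__ == "__main__":
        with open("values.bin", "w+b") as f:
            write_value(f, date_id=3, entity_id=42, value=1234)
            print(read_value(f, 3, 42))   # -> 1234
        os.remove("values.bin")

At 2.000.000 entities and 3.650 dates that is roughly 29 GB per 4-byte column, with O(1) lookups and no index to maintain.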

Efficient MySQL schema with partitioning for huge dataset (7.300.000.000 rows and roughly 80 GB of data)

This is a follow up to my question "Efficiently storing 7.300.000.000 rows" (http://stackoverflow.com/questions/665614/efficiently-storing-7-300-000-000-rows). I've decided to use MySQL with partitioning and the preliminary schema looks like this: CREATE TABLE entity_values ( entity_id MEDIUMINT UNSIGNED DEFAULT 0 NOT NULL, # 3 bytes...

Large dataset (SQL to C#), long load time fix

I have a site I'm building; it's an application that creates mail merges (more or less...) based on a couple of user preferences. It can generate a Cartesian join's worth of data without a problem, but in come the needs of enterprise to make life a bit more difficult... I have to build the application so that, after verifying zip codes ...
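
If the slowdown is the page waiting for the whole result set, one hedged mitigation is streaming it in batches; sqlite3 and the invented table stand in below for the real database, and fetchmany() is plain DB-API, so the same shape applies to a SQL Server connection:

    import sqlite3

    def batched_rows(conn, sql, params=(), batch_size=5000):
        cur = conn.cursor()
        cur.execute(sql, params)
        while True:
            rows = cur.fetchmany(batch_size)   # only one batch in memory at a time
            if not rows:
                break
            yield from rows

    if __name__ == "__main__":
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE recipients (zip TEXT, name TEXT)")
        conn.executemany("INSERT INTO recipients VALUES (?, ?)",
                         [("%05d" % (i % 100), "name%d" % i) for i in range(20000)])
        print(sum(1 for _ in batched_rows(conn, "SELECT * FROM recipients")))  # 20000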

What if 2^32 is just not enough?

Hi, what if you have so many entries in a table that 2^32 is not enough for your auto_increment ID within a given period (day, week, month, ...)? What if the largest datatype MySQL provides is not enough? I'm wondering how I should solve a situation where I'm having so many entries added to my table which require a unique ID, but I fill...
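
For scale, the relevant ceilings, plus one hedged alternative: a snowflake-style generator that packs timestamp, shard and sequence into a 64-bit integer so ids stay unique without one shared counter. The 41/10/12 bit split is an assumption, not anything MySQL mandates:

    import itertools
    import time

    print(2 ** 32 - 1)   # 4294967295           unsigned INT ceiling
    print(2 ** 64 - 1)   # 18446744073709551615 unsigned BIGINT ceiling

    _sequence = itertools.count()

    def next_id(shard_id):
        millis = int(time.time() * 1000) & ((1 << 41) - 1)   # 41-bit timestamp
        seq = next(_sequence) & ((1 << 12) - 1)              # 12-bit local counter
        return (millis << 22) | ((shard_id & 0x3FF) << 12) | seq

    print(next_id(shard_id=7) < 2 ** 64)   # True: fits in BIGINT UNSIGNED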

Sql query with joins between four tables with millions of rows

We have a Transact-SQL statement that queries 4 tables with millions of rows in each. It takes several minutes, even though it has been optimized with indexes and statistics according to the Tuning Advisor. The structure of the query is like: SELECT E.EmployeeName, SUM(M.Amount) AS TotalAmount, SUM(B.Amount) AS BudgetAmount ...

How do I count the number of rows in a large CSV file with Perl?

I have to use Perl on a Windows environment at work, and I need to be able to find out the number of rows that a large CSV file contains (about 1.4 GB). Any idea how to do this with minimum waste of resources? Thanks. PS: This must be done within the Perl script, and we're not allowed to install any new modules onto the system. ...
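
The question requires Perl, so this is only a hedged illustration of the approach, which ports directly: count newline bytes in fixed-size chunks so the 1.4 GB file is never held in memory at once (the file name is made up, and quoted CSV fields containing embedded newlines would still be miscounted):

    def count_lines(path, chunk_size=1 << 20):
        count = 0
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                count += chunk.count(b"\n")
        return count

    # print(count_lines("big.csv"))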

Optimizing MySQL Aggregation Query

Hi all, I've got a very large table (~100 million records) in MySQL that contains information about files. One of the pieces of information is the modified date of each file. I need to write a query that will count the number of files that fit into specified date ranges. To do that I made a small table that specifies these ranges (all i...
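
A hedged sketch of the single-pass shape such a query usually takes: join the small ranges table against the big table once and GROUP BY the range, rather than running one count per range. sqlite3 and the invented columns stand in for the real MySQL schema; there, an index on the modified-date column does the heavy lifting:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE files  (id INTEGER PRIMARY KEY, modified TEXT);
        CREATE TABLE ranges (range_id INTEGER, range_start TEXT, range_end TEXT);
        INSERT INTO ranges VALUES (1, '2009-01-01', '2009-06-30'),
                                  (2, '2009-07-01', '2009-12-31');
    """)
    conn.executemany("INSERT INTO files (modified) VALUES (?)",
                     [("2009-03-%02d" % (i % 28 + 1),) for i in range(1000)])

    rows = conn.execute("""
        SELECT r.range_id, COUNT(f.id) AS n_files
        FROM ranges r
        LEFT JOIN files f ON f.modified BETWEEN r.range_start AND r.range_end
        GROUP BY r.range_id
        ORDER BY r.range_id
    """).fetchall()
    print(rows)   # [(1, 1000), (2, 0)]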

Using PL/SQL, what are good options for sending large amounts of data to client side code?

To elaborate: server-side PL/SQL operates on a request and generates a response with a large amount of data that must be sent to the client-side code. Are there "good options" for sending down large amounts of data? What types of Oracle pros/cons ...

How many is a "large" data set?

Assume infinite storage, where size/volume/physics (metrics, gigabytes/terabytes) won't matter - only the number of elements and their labels. Statistically, a pattern should already emerge at 30 subsets, but would you agree that fewer than 1000 subsets is too few to test, and that at least 10000 distinct subsets / "elements", "entries" / entities...

C# Charting - Reasonably Large Data Set and Real-time

I'm looking for a C# WinForms charting component, either commercial or open source, that can handle relatively large data sets and be reasonably scalable with regard to chart rendering and updates. The number of data sets to be displayed would be around 30. There would be between 15 and 20 updates per second for each data set. A line ch...

Is it possible to change argv or do I need to create an adjusted copy of it?

My application potentially has a huge number of arguments passed in, and I want to avoid the memory hit of duplicating the arguments into a filtered list. I would like to filter them in place, but I am pretty sure that messing with the argv array itself, or any of the data it points to, is probably not advisable. Any suggestions? ...