data-analysis

What's the best interactive Analysis and Plotting Tool for software testing?

My realtime app generates a data log: 100 words of data at 10 kHz. I need to analyze it and produce some plots of the results. There are intermediate calculations involved: I need to take some differences, averages, etc. Excel would work fine, except for: the 32,000-item limit on graph data series is too small - that's only about 3 seconds...
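
The arithmetic and the intermediate calculations mentioned above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the log is assumed to be already loaded as a list of numbers, and the two constants are taken from the question.

```python
SAMPLE_RATE_HZ = 10_000      # from the question: log written at 10 kHz
EXCEL_SERIES_LIMIT = 32_000  # from the question: max points per chart series


def seconds_covered(n_points, rate_hz=SAMPLE_RATE_HZ):
    """Return how many seconds of log n_points samples represent."""
    return n_points / rate_hz


def differences(samples):
    """Return first differences: samples[i + 1] - samples[i]."""
    return [b - a for a, b in zip(samples, samples[1:])]


def moving_average(samples, window):
    """Return the running mean over a sliding window (hypothetical helper)."""
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    total = sum(samples[:window])
    averages = [total / window]
    for i in range(window, len(samples)):
        # Slide the window: add the newest sample, drop the oldest.
        total += samples[i] - samples[i - window]
        averages.append(total / window)
    return averages
```

Calling `seconds_covered(EXCEL_SERIES_LIMIT)` gives the time span that fits in a single chart series at the stated sample rate.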

What is the object oriented programming computing overhead cost?

I have a large set of data (a data cube of 250,000 X 1,000 doubles, about a 4 gig file) and I want to manipulate it using a previous set of OOP classes I have written in Python. The data set is already so large that I have to split it at least in half to read it into my machine's memory, so computing overhead is a concern. My OOP c...

Unit testing...should it be used here?

Duplicate: http://stackoverflow.com/questions/135651/learning-unit-testing

I'm trying to develop some software for my research group to analyze and plot experimental data. I would like to make it so that it's pretty much error-free. Would this be a situation for unit testing? If so, could you possibly point me to some good references for un...

Python, ROOT, and MINUIT integration?

I'm a modest graduate student in a high energy particle physics department. With an unfounded distaste for C/C++ and a founded love of Python, I have resorted to Python for my data analysis so far (just the easy stuff) and am about to attempt backing Python scripts against ROOT libraries and, in particular, utilizing MINUIT for some paramet...

Detecting and fixing overflows

Hi, we have a particle detector hard-wired to use 16-bit and 8-bit buffers. Every now and then, there are certain [predicted] peaks of particle fluxes passing through it; that's okay. What is not okay is that these fluxes usually reach magnitudes above the capacity of the buffers to store them; thus, overflows occur. On a chart, they loo...
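
One common way to repair this kind of data can be sketched as follows. It is a sketch only, and it rests on two assumptions that the excerpt does not confirm: that an overflow wraps the stored value around modulo 2^bits, and that the true signal changes by less than half the buffer range between consecutive samples. The function name `unwrap_counter` is hypothetical.

```python
def unwrap_counter(raw, bits=16):
    """Rebuild values from readings that wrapped around a fixed-width buffer.

    Assumes wraparound modulo 2**bits and that the true value moves by
    less than half the buffer range between consecutive samples.
    """
    modulus = 1 << bits
    half = modulus // 2
    offset = 0
    previous = None
    unwrapped = []
    for value in raw:
        if previous is not None:
            step = value - previous
            if step < -half:
                # A large negative jump is read as an upward overflow.
                offset += modulus
            elif step > half:
                # A large positive jump is read as wrapping back down.
                offset -= modulus
        unwrapped.append(value + offset)
        previous = value
    return unwrapped
```

If the hardware saturates at the maximum value instead of wrapping, this approach does not apply, because the lost magnitude is not recoverable from the stored readings alone.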

Probability time series, observed data probabilities (deja vu)

Okay folks... thanks for looking at this question. I remember doing the following in college; however, I have forgotten the exact solution. Any takers to steer me in the right direction? I have a time series of data (we'll use three) of N. The data series is sequential in order of time (e.g. obsOne[1] occurred along with obsTwo[1] and obs...

What information analysis techniques are there for the qualitative analysis of user generated data?

There are a few algorithms we have for sorting data, finding the maximum and minimum, finding the shortest path between nodes etc. I've started looking into the qualitative analysis of user-generated data and have come across latent semantic analysis. What other techniques exist for the analysis of textual data ... and possibly other med...

How do I implement a real time *financial* statistics engine from SQL server data for dashboard display?

We currently use Excel automation to calculate time series statistics and store the results in our SQL Server 2008 database for easy display/sorting/etc. later. I'm currently redesigning the home screen of our app to present the most important information (as identified by the team using the app) in dashboard form. I'd like the display...

Data analysis tool like MS Excel ...

Hi all, I have a large amount of data that needs to be compared. We are using Microsoft Excel; it costs money and it is slow, and the graphs it generates are also not up to the mark. Is there any other tool that is free and has good graphing facilities? Thank you. ...

How do you deal with missing data using numpy/scipy?

One of the things I deal with most in data cleaning is missing values. R deals with this well using its "NA" missing data label. In Python, it appears that I'll have to deal with masked arrays, which seem to be a major pain to set up and don't seem to be well documented. Any suggestions on making this process easier in Python? This is bec...

What's the best approach to recognize patterns in data, and what's the best way to learn more on the topic?

A developer I am working with is developing a program that analyzes images of pavement to find cracks in the pavement. For every crack his program finds, it produces an entry in a file that tells me which pixels make up that particular crack. There are two problems with his software though: 1) It produces several false positives 2) If ...

How to get information from link clicks?

I am wondering how it is possible to get information from link clicks. For example, a user is logged in and clicks a link. Is it possible to record that information? Number of links clicked, which ones, etc... things like that. I have no idea how to do this. Any ideas / links to information? ...

Find Lines in a cloud of points

Hi I have an array of Points. I KNOW that these points represent many lines in my page. How can I find them? Do I need to find the spacing between clouds of points? Thanks Jonathan ...

How can I get the (x,y) values of the line that is plotted by a contour plot (matplotlib)?

Hello everybody, Is there an easy way to get the (x,y) values of a contour line that was plotted like this:

    import matplotlib.pyplot as plt
    x = [1,2,3,4]
    y = [1,2,3,4]
    m = [[15,14,13,12],[14,12,10,8],[13,10,7,4],[12,8,4,0]]
    cs = plt.contour(x,y,m, [9.5])
    plt.show()

Cheers, Philipp ...

Getting the contents of a library interactively in R

Is there an equivalent of the dir function (Python) in R? When I load a library in R like - library(vrtest) I want to know all the functions that are in that library. In Python, dir(vrtest) would be a list of all attributes of vrtest. I guess in general, I am looking for the best way to get help on R while running it in ESS on Linux...

Displaying access log analysis

I'm doing some work to analyse the access logs from a Catalyst web application. The data is from the load balancers in front of the web farm and totals about 35Gb per day. It's stored in a Hadoop HDFS filesystem and I use MapReduce (via Dumbo, which is great) to crunch the numbers. The purpose of the analysis is to try to establish a usage...

Decoding File Format (Norton .FBF)

Hey there, I currently have a Norton Ghost backup of the 'My Documents' folder; however, Norton Ghost does not allow me to restore 'all' my files at once, and only has the functionality to search for specific files and restore them. This is a problem, as I have nearly 100GB of important documents and such that are locked away in these .f...

Matplotlib: Formatting dates on the x-axis in a 3D Bar graph

Given this 3D bar graph sample code, how would you convert the numerical data in the x-axis to formatted date/time strings? I've attempted using the ax.xaxis_date() function without success. I also tried using plot_date(), which doesn't appear to work for 3D bar graphs. Here is a modified version of the sample code to illustrate what I a...

Workflow for developing number crunching applications on Amazon EC2/S3

Much has been written about deploying data crunching applications on EC2/S3, but I would like to know: what is the typical workflow for developing such applications? Let's say I have 1 TB of time series data to begin with and I have managed to store this on S3. How would I write applications and do interactive data analysis to build m...

Efficient way to analyze large amounts of data?

I need to analyze tens of thousands of lines of data. The data is imported from a text file. Each line of data has eight variables. Currently, I use a class to define the data structure. As I read through the text file, I store each line object in a generic list, List. I am wondering if I should switch to using a relational database (SQ...
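
To make the relational-database option concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module with an in-memory database. Several things are assumptions for illustration only: the column names `v1` to `v8`, the comma delimiter, the treatment of all eight variables as numeric, and the helper name `load_lines`. None of these come from the question.

```python
import sqlite3

# Hypothetical column names for the eight variables on each line.
COLUMNS = [f"v{i}" for i in range(1, 9)]


def load_lines(lines, delimiter=","):
    """Parse delimited text lines into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    column_defs = ", ".join(f"{name} REAL" for name in COLUMNS)
    conn.execute(f"CREATE TABLE samples ({column_defs})")

    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = [float(field) for field in line.split(delimiter)]
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
        rows.append(fields)

    placeholders = ", ".join("?" * len(COLUMNS))
    conn.executemany(f"INSERT INTO samples VALUES ({placeholders})", rows)
    conn.commit()
    return conn
```

Once loaded, aggregate questions become single queries, for example `conn.execute("SELECT COUNT(*), AVG(v1) FROM samples").fetchone()`. For tens of thousands of rows, an in-memory list of objects is also well within reach of ordinary hardware, so the choice is mostly about how the data will be queried.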