statistics

Calculating the Coefficient of Determination in Python

I'm trying to calculate the coefficient of determination (R^2) in Python, but I'm getting a negative value in certain cases. Is this a sign that there's an error in my calculation? I thought R^2 should be bounded between 0 and 1. Here's my Python code for doing the calculation, adapted straight from the WP article: >>> yi_list = [1, 1,...
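A minimal sketch of the usual definition, R^2 = 1 - SS_res / SS_tot, may help show why a negative value is not necessarily a bug. The function name `r_squared` and the sample data are illustrative assumptions, not the asker's (truncated) code. Under this definition, R^2 drops below 0 whenever the predictions fit worse than simply predicting the mean of the observations.

```python
def r_squared(observed, predicted):
    """Return 1 - SS_res / SS_tot for paired observed and predicted values.

    Hypothetical helper for illustration; raises ZeroDivisionError if all
    observed values are identical (SS_tot would be 0).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_obs = sum(observed) / len(observed)
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Residual sum of squares: spread of the observations around the predictions.
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1 - ss_res / ss_tot


observed = [1, 2, 3]                    # made-up data
print(r_squared(observed, [1, 2, 3]))   # perfect fit -> 1.0
print(r_squared(observed, [2, 2, 2]))   # always predicting the mean -> 0.0
print(r_squared(observed, [3, 2, 1]))   # worse than the mean -> -3.0
```

The bound between 0 and 1 holds for a least-squares linear fit with an intercept, evaluated on the data it was fitted to; for arbitrary predictions there is no lower bound.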

Is there a .NET (preferably F# or C#) implementation of the Hilbert-Huang Transform?

Hilbert-Huang Transform, Empirical Mode Decomposition... I have found it implemented in R and Matlab. I'd like to find an open source implementation of it in C#/F#/.NET. ...

Modeling response times

I am trying to model the time lapse between when a user sees an ad and when they call the advertiser (presuming they do). I have two issues - part of the data seems exponential, but I am wondering if there is a similar distribution with an extra parameter because I cannot quite get it to fit. Also, the peak does not occur at t = 0 but ...

Reading 3-dimensional datasets into R

Gnuplot allows for three-dimensional datasets, which are a set of tables separated by empty lines, for instance:
54.32,16.17,7.42,4.28,3.09,2.11,1.66,1.22,0.99,0.82,7.9
54.63,15.50,8.53,5.31,3.75,1.66,1.14,0.83,0.94,0.52,7.18
56.49,16.67,6.38,3.69,2.80,1.45,1.12,0.89,1.12,0.89,8.50
56.35,16.26,7.76,3.57,2.62,1.89,1.05,1.15,0.63,1.05,7....

Probabilistic selection from a set

Suppose I want to randomly select a number n between 0 and 30, where the distribution is arbitrary, and not uniform. Each number has a corresponding weight P(n): P(0) = 5, P(1) = 1, P(2) = 30, P(3) = 25, and so on and so forth. How do I do a random selection from this set, such that the probability of selecting a number is proportional t...
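One common approach is to build cumulative sums of the weights, draw a uniform number below the total, and locate it by binary search. The sketch below assumes only the four weights quoted in the question; the function name `weighted_pick` is hypothetical. In Python, the standard library's `random.choices(population, weights=...)` does the same job directly.

```python
import bisect
import itertools
import random


def weighted_pick(weights, rng=random):
    """Return an index i with probability weights[i] / sum(weights)."""
    # Running totals, e.g. [5, 1, 30, 25] -> [5, 6, 36, 61].
    cumulative = list(itertools.accumulate(weights))
    # Uniform draw in [0, total).
    r = rng.random() * cumulative[-1]
    # First index whose running total exceeds r; clamp guards against
    # floating-point rounding at the upper edge.
    return min(bisect.bisect_right(cumulative, r), len(weights) - 1)


weights = [5, 1, 30, 25]  # P(0), P(1), P(2), P(3) from the question
print(weighted_pick(weights))

# Equivalent one-liner with the standard library:
print(random.choices(range(len(weights)), weights=weights, k=1)[0])
```

Building the cumulative list costs O(n) per call here; if many draws are needed from the same weights, it can be computed once and reused.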

Statistical calculations

Hi, I have a large dataset of names and values. I want to sort all these values into meaningful categories, e.g.: 25% of names with a certain range of values fall in category 1, 50% of names with a certain range of values fall in category 2. I tried using a percentile calculation, but this ends up giving me inconsistent categorization. I was look...

Convert Z-score (Z-value, standard score) to p-value for normal distribution in Python

How does one convert a Z-score from the Z-distribution (standard normal distribution, Gaussian distribution) to a p-value? I have yet to find the magical function in Scipy's stats module to do this, but one must be there. ...
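In SciPy, the upper-tail probability is available as `scipy.stats.norm.sf(z)` and the lower tail as `scipy.stats.norm.cdf(z)`. The sketch below shows the same calculation using only the standard library's `math.erfc`, so it runs without SciPy; the function name `z_to_p` and its `two_sided` flag are illustrative assumptions.

```python
import math


def z_to_p(z, two_sided=False):
    """Convert a standard normal z-score to a p-value.

    One-sided: P(Z >= z). Two-sided: P(|Z| >= |z|).
    """
    if two_sided:
        return math.erfc(abs(z) / math.sqrt(2))
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


print(z_to_p(1.96))                  # about 0.025
print(z_to_p(1.96, two_sided=True))  # about 0.05
```

Whether the one-sided or two-sided value is wanted depends on the hypothesis being tested, which the question does not specify.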

Question about STDIN STDOUT STDERR

I'm designing a MIPS simulator in C++ and my simplified OS must be able to run stat() occasionally (when a program being executed on my simulator requires an input or an output or something). The problem is, I need to be able to pass STDIN, STDOUT, and STDERR as parameters to stat, e.g. stat("stdin",buff), where buff is the pointer to th...

Conforming results to a scale

I'm monitoring the change in certain values day over day. The changes vary, and can be of any value size, typically 1-100 difference, but maybe there is an outlier at 500 or even 900. I want to be able to put these values on a set scale so I can plot them. Is there a formula I can use to limit the high end of the scale, so no matter...

Is there an R function for Stata's xtnbreg?

I have been using Stata to run negative binomial regressions in a replication. I'm not sure what Stata does under the hood, but I wanted to know if there is an R function/package that does the same thing. The R version would give me a better idea of how this works, since I can see the code. ...

Calling rnorm with a vector of means

When I call rnorm passing a single value as the mean, it's obvious what happens: each value is generated from Normal(10,1). y <- rnorm(20, mean=10, sd=1) But I see examples of a whole vector being passed to rnorm (or rcauchy, etc.); in this case, I am not sure what the R machinery really does. For example: a = c(10,22,33,44,5,10,30,22,10...

How is the bandwidth calculated in the Weka KernelEstimator class?

I am using Weka to calculate the probability for a given dataset. More specifically, I am using the KernelEstimator class. For good density estimation results the choice of the bandwidth parameter is crucial, but I have not been able to find out how the bandwidth parameter is calculated. The kernel function being used is a simple Gaussian...

Java library for gathering integration statistics

Lately I have built many integrations (data transfers from one place to another, including DB, SSH, FTP, etc.), and for each one I need to log some statistics (e.g. how many files were updated, how many DB rows were inserted, which errors occurred, etc.). All these integrations run once per day, and until an error occurs I'm not interested in the logs. But wh...

Writing the outcome of a nested loop to a vector object in R

Dear Stackers, I have the following data read into R as a data frame named "data_old":
  yes year month
1  15 2004     5
2   9 2005     6
3  15 2006     3
4  12 2004     5
5  14 2005     1
6  15 2006     7
.   .  ...     .
.   .  ...     .
I have written a small loop which goes through the data and sums up the yes variable for each ...

Newman's modularity clustering for graphs

Hello, I am interested in running Newman's modularity clustering algorithm on a large graph. If you can point me to a library (or R package, etc) that implements it I would be most grateful. best ~lara ...

Statistical calculations

Hi, is there any built-in library that I can use for calculating the median in Java? I was working with apache.commons.math for other statistical functions, but median was nowhere to be found. Thank you, ...

Visualisation libraries - AJAX, Flex, Flash, HTML, C/C++

Okay guys - simple question. I have some data in a MySQL database that I want to visualise. Now some methods for doing this are: Axiis, GetPivot, ManyEyes. Are there any others? Max. ...

Is there a library that helps me create nice statistical charts for WPF?

I have to include some sort of reports for my university project and I already have the data ready to be used. I'm thinking of using WPF for the GUI and I was wondering if there was a library or something I could use that has some nice effects for graphs and whatnot. Any suggestions? I have to show information such as total shipments p...

Computation of numerical integral involving convolution

I have to solve the following convolution-related numerical integration problem in R, or perhaps in a computer algebra system like Maxima: Integral[({k(y)-l(y)}^2)dy], where k(.) is the pdf of a standard normal distribution, l(y)=integral[k(z)*k(z+y)dz] (standard convolution), and z and y are scalars. The domain of y is -inf to +inf. The integral in ...
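As a hedged sketch of one way to check this numerically: for a standard normal k, completing the square in l(y) = integral[k(z)*k(z+y)dz] gives l(y) = exp(-y^2/4) / (2*sqrt(pi)), i.e. the N(0, 2) density. Expanding the square then gives the closed form 1/(2*sqrt(pi)) + 1/(2*sqrt(2*pi)) - 2/sqrt(6*pi). That derivation is worked out here and is not taken from the question; the function names, the truncation of the domain to [-20, 20], and the step count are all illustrative assumptions.

```python
import math


def k(y):
    """Standard normal pdf."""
    return math.exp(-0.5 * y * y) / math.sqrt(2 * math.pi)


def l(y):
    """integral[k(z)*k(z+y)dz] in closed form: the N(0, 2) pdf."""
    return math.exp(-0.25 * y * y) / (2 * math.sqrt(math.pi))


def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n equal subintervals on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# The integrand decays like a Gaussian, so [-20, 20] stands in for (-inf, inf).
numeric = trapezoid(lambda y: (k(y) - l(y)) ** 2, -20.0, 20.0, 20000)
closed = (1 / (2 * math.sqrt(math.pi))
          + 1 / (2 * math.sqrt(2 * math.pi))
          - 2 / math.sqrt(6 * math.pi))
print(numeric, closed)  # both approximately 0.0209
```

If l(y) had no closed form, the inner integral could be evaluated with the same quadrature routine for each y, at the cost of a nested loop.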

Transforming character strings in R

Dear Stackers, I have to merge two data frames in R. The two data frames share a common id variable, the name of the subject. However, the names in one data frame are partly capitalized, while in the other they are in lowercase. Furthermore, the names appear in reverse order. Here is a sample from the data frames: DataFrame1$Name: "Van...