statistics

Get percentiles of data-set with group by month

Hello, I have a SQL table with a whole load of records that look like this:

| Date       | Score |
+------------+-------+
| 01/01/2010 |     4 |
| 02/01/2010 |     6 |
| 03/01/2010 |    10 |
...
| 16/03/2010 |     2 |

I'm plotting this on a chart, so I get a nice line across the graph indicating score-over-time. Lovely. Now, what...
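The excerpt is cut off, so the following is only a sketch of what the title asks for: percentiles of the scores grouped by month. It is written in Python rather than SQL and assumes day-first dates (as `16/03/2010` suggests) and a nearest-rank percentile definition. The function names and the choice of the 25th, 50th, and 75th percentiles are illustrative assumptions, not part of the question.

```python
from collections import defaultdict
from datetime import datetime
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def monthly_percentiles(rows, pcts=(25, 50, 75)):
    """Group (date_text, score) rows by (year, month) and compute percentiles."""
    by_month = defaultdict(list)
    for date_text, score in rows:
        day = datetime.strptime(date_text, "%d/%m/%Y")  # assumed day-first format
        by_month[(day.year, day.month)].append(score)
    return {
        month: {p: percentile(sorted(scores), p) for p in pcts}
        for month, scores in sorted(by_month.items())
    }


# Only the rows visible in the excerpt; the elided rows are not reproduced.
rows = [("01/01/2010", 4), ("02/01/2010", 6), ("03/01/2010", 10), ("16/03/2010", 2)]
print(monthly_percentiles(rows))
```

Other percentile definitions (for example, linear interpolation between ranks) give slightly different values on small groups, so the definition should match whatever the chart is meant to show.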

Does anyone here use the make-cdf & stats.pl program?

I came across this page: Plotting Tools, where I found a set of tools named stats.pl and make-cdf. I could write my own, but I don't want to spend too much time when someone else has already done that. Does anyone have these tools, or can anyone at least point me to a similar set of tools somewhere? ...

simple Stata program

I am trying to write a simple program to combine coefficient and standard error estimates from a set of regression fits. I run, say, 5 regressions, and store the coefficient(s) and standard error(s) of interest into vectors (Stata matrix objects, actually). Then, I need to do the following: Find the mean value of the coefficient estim...

Preparing data for a CDF Plot using Statistics::Descriptive module?

Does anyone know how to prepare data to plot a CDF (I have a bunch of floating point numbers)? I was planning on using gnuplot, and at first look the Statistics::Descriptive module seemed the best fit, but it looks like I might need some help here. ...
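One way to prepare such data, sketched here in Python rather than Perl and without Statistics::Descriptive: sort the values and pair each one with the fraction of samples at or below it. The function names and the idea of writing two whitespace-separated columns are illustrative assumptions.

```python
def ecdf_points(samples):
    """Return (value, cumulative fraction) pairs for an empirical CDF."""
    xs = sorted(samples)
    n = len(xs)
    # The i-th smallest value (1-based) has i/n of the samples at or below it.
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def write_two_column_file(points, path):
    """Write one 'value fraction' pair per line, a layout gnuplot can read."""
    with open(path, "w") as fh:
        for x, p in points:
            fh.write(f"{x} {p}\n")


points = ecdf_points([0.5, 0.1, 0.3, 0.9])
print(points)
```

A file written this way can be plotted in gnuplot with something like `plot 'cdf.dat' using 1:2 with steps`, where `cdf.dat` is a hypothetical file name.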

Display statistics from an SVN repository on a web page

What is the best way to get statistics for an entire Subversion repository and display some of them on a web page? For example: total number of commits today, this month, etc.; most active committer; and so on. ...

Where should I store user statistics that need to be viewed frequently?

In my web application, my users have many events. One such event is "user updated facebook status." A user could have hundreds of that type of event, and there are 10 types of events. I need to display event counts and other user statistics based on events in a very scalable manner. This is because each user will be able to see his o...

Better algorithm for estimating download time

Possible Duplicate: Estimating/forecasting download completion time We've all seen the download time running estimate that initially says something like "7 days", but keeps dropping wildly (e.g. "23 hours", "45 minutes", "1 min. 50 sec", etc) with each successive estimation as the chunks are downloaded. To avoid these initial...

How to generate correlated binary variables

Dear All: I need to generate a series of N random binary variables with a given correlation function. Let x = {xi} be a series of binary variables (taking the value 0 or 1, i running from 1 to N). The marginal probability is given Pr(xi = 1) = p, and the variables should be correlated in the following way: Corr[ xi xj ] = const |ij| (...

mysql/stats: Weighting an average to accentuate differences from the mean

This is for a new feature on http://cssfingerprint.com (see /about for general info). The feature looks up the sites you've visited in a database of site demographics, and tries to guess what your demographic stats are based on that. All my demographics are in 0..1 probability format, not ratios or absolute numbers or the like. Essenti...

Randomized experiments in R

Here is a simple randomized experiment. In the following code I calculate the p-value under the null hypothesis that two different fertilizers applied to tomato plants have no effect on plant yields. The first random sample (x) comes from plants where a standard fertilizer has been used, while an "improved" one has been used in the pl...
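The asker's R code is truncated, so the following is not a reconstruction of it. It is a minimal Python sketch of one common way to get such a p-value, a two-sided permutation test on the difference in means. The yield numbers, function names, number of permutations, and seed are all hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffle the pooled yields and split them again.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(x)]) - mean(pooled[len(x):]))
        if diff >= observed - 1e-12:  # small tolerance for float ties
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical yields, not taken from the question.
standard = [29.9, 11.4, 25.3, 16.5, 21.1]
improved = [26.6, 23.7, 28.5, 14.2, 17.9, 24.3]
print(permutation_p_value(standard, improved))
```

With small samples it is also possible to enumerate every split instead of sampling, which gives an exact p-value.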

aov define F values computation

Greetings to all. This is my model: aov.fit<-aov(Y~A+B+C+D+E+A:C+A:E, data=dat) In summary(aov.fit) all F values are computed as, e.g., MS(A)/MS(Residuals). This is not correct (or not what I want), except for F(B) and F(A:E). I suppose the P values are not correct either. Can I specify how the F computations will be done? I'd like them to be lik...

Reordering matrix elements to reflect column and row clustering in naive Python

Hello, I'm looking for a way to perform clustering separately on matrix rows and then on its columns, reorder the data in the matrix to reflect the clustering, and put it all together. The clustering problem is easily solvable, and so is the dendrogram creation (for example in this blog or in "Programming collective intelligence"). Howev...

UberCart statistics on products added.

I want statistics on the products added to carts, but not checked out. I.e. if a user adds a product to his cart, but doesn't actually pay, how can I see these products that were added? Or maybe even get notifications every time a product is added? ...

How to gather usage statistics for iPhone app?

I am in the process of releasing my first iPhone app. It's a simple utility, I'd just like to gauge the release process, app lifetime and trends, just so it can help make more realistic choices in future apps. I think it would be nice to have usage statistics in addition to download stats from Apple. For example, how many times is the a...

Determining the popularity of a video with ratings and views

I am about to embark on a new project - a video website. Users will be able to register, and vote on videos by clicking "like" or "dislike", or something to that effect. In any event, it will be a 2-option voting system, not a 5-star system. Every X number of days, I will be generating a "chart" of the most popular videos. So my questi...

Are there Pair-Wise Post Hoc Comparisons for the Chi-Square Test in R?

Hi all, I am wondering if there exists in R a package/function to perform the "Post Hoc Pair-Wise Comparisons for the Chi-Square Test of Homogeneity of Proportions" (or an equivalent of it), which is described here: http://epm.sagepub.com/cgi/content/abstract/53/4/951 My situation is just running a chi-square test on a 2 by X matrix. I fo...

How to Plot "Reverse" Cumulative Frequency Graph With ECDF

I have no problem plotting a cumulative frequency graph like this: library(Hmisc) pre.test <- rnorm(100,50,10) post.test <- rnorm(100,55,10) x <- c(pre.test, post.test) g <- c(rep('Pre',length(pre.test)),rep('Post',length(post.test))) Ecdf(x, group=g, what="f", xlab='Test Results', label.cu...

Problem loading my own R libraries in Java/JRI code

I created my own new R library (called "Media"). There is no problem when I try to load it with RGui, and I can call the functions defined in the new package. This is how I load it: > library(Media) But I'm also trying to call those functions from Java/JRI code, and when I load the new R package, Java doesn't seem to find the paca...

Estimate gaussian (mixture) density from a set of weighted samples

Assume I have a set of weighted samples, where each sample has a corresponding weight between 0 and 1. I'd like to estimate the parameters of a Gaussian mixture distribution that is biased towards the samples with higher weight. In the usual non-weighted case, Gaussian mixture estimation is done via the EM algorithm. Does anyone know an ...

How can I get the decreasing cumulative of frequency series using Perl?

I have data that looks like this: 3 2 1 5 What I want to get is the "decreasing" cumulative of this data, yielding 11 8 6 5 0. What is a compact way of doing that in Perl? ...
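The arithmetic behind the example can be sketched as follows, in Python rather than the Perl the question asks for: start from the total (11) and subtract each value in turn. The function name is a hypothetical choice for illustration.

```python
def decreasing_cumulative(values):
    """Return the total, followed by what remains after removing each value in order."""
    remaining = sum(values)
    result = [remaining]
    for v in values:
        remaining -= v
        result.append(remaining)
    return result


print(decreasing_cumulative([3, 2, 1, 5]))  # [11, 8, 6, 5, 0]
```

The output always has one more element than the input and ends in 0, because the last step removes the final value from what is left.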