statistics

How can I avoid people using my code for evil?

I'm not sure if this is quite the right place, but it seems like a decent place to ask. My current job involves manual analysis of large data sets (at several levels, each more refined and done by increasingly experienced analysts). About a year ago, I started developing some utilities to track analyst performance by comparing results a...

Is there a free (or cheap) Matlab equivalent for statistics work?

Hi, I am looking for an easy, inexpensive way to work with a large number of records that come from sensors and are saved in a MySQL database. I would like to run statistical calculations and other heavy computations on these records. The tool I am looking for will be used by researchers or engineers who are experts in math and stati...

Likert Rank ordering optimization heuristic possible?

I can't identify the type of problem I have, and I was wondering if someone knows what kind of statistics it involves. I'm not even sure it's a type that can be optimized. I'd like to optimize three variables, or more precisely the combination of two. The first is a Likert scale average; the other is the frequency of that item being rated on that...

giving percentages to (overall negative) revenue growth performance

Hey, I have a sampling problem: for an analysis I have to calculate a revenue growth share. I found an article about sampling with negative numbers very helpful, since I have to work with a sum of revenues that is negative. The article suggests making all the numbers absolute, but unfortunately this does not solve my problem. Since I am...

data collection for statistics: from web to a database

Hi, I'm a statistician by trade and I'd like some recommendations on how to set up a website that can collect data into a database. For personal use, I use Google Forms to collect data, and everything gets populated into a spreadsheet. However, this may not be appropriate in a more professional setting, especially when we have multipl...

Efficient Method for Calculating the Probability of a Set of Outcomes?

Let's say I'm playing 10 different games. For each game, I know the probability of winning, the probability of tying, and the probability of losing (each game has different probabilities). From these values, I can calculate the probability of winning X games, the probability of losing X games, and the probability of tying X games (for ...
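
One way to get the distribution of the number of wins without enumerating every combination of outcomes is a small dynamic program over the games. The sketch below is only an illustration in Python: it assumes the games are independent, and the function name `win_count_distribution` and the example probabilities are hypothetical.

```python
def win_count_distribution(win_probs):
    """Return a list dist where dist[k] is P(exactly k wins).

    Assumes the games are independent of each other.
    """
    dist = [1.0]  # before any game is played: zero wins with certainty
    for p in win_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1.0 - p)  # this game is a tie or a loss
            nxt[k + 1] += prob * p      # this game is a win
        dist = nxt
    return dist


# Hypothetical win probabilities for three games.
dist = win_count_distribution([0.5, 0.5, 0.5])
```

The same function can be called with the tie or loss probabilities to get the corresponding distributions. The work grows with the square of the number of games rather than exponentially.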

compute means of a group by factor

Is there a way that this can be improved, or done more simply? means.by<-function(data,INDEX){ b<-by(data,INDEX,function(d)apply(d,2,mean)) return(structure( t(matrix(unlist(b),nrow=length(b[[1]]))), dimnames=list(names(b),col.names=names(b[[1]])) )) } The idea is the same as a SAS MEANS BY statement. The function 'me...

get statistics about postfix

I have Postfix on my server, which sends about 5K emails daily. I need statistics about these emails in a web interface (web tool), for example how many went to each domain (500 to @yahoo, 242 to @gmail, and so on), plus some other statistics. I need something other than Postfix log-watch. Thanks ...

MySQL to generate an Age Pyramid

How to write a query suitable for generating an age pyramid like this: I have a table with a DATE field containing their birthday and a BOOL field containing the gender (male = 0, female = 1). Either field can be NULL. I can't seem to work out how to handle the birthdays and put them into groups of 10 years. EDIT: Ideally the X axis...

Using LINQ to create an IEnumerable<> of delta values

I've got a list of timestamps (in ticks), and from this list I'd like to create another one that represents the delta time between entries. Let's just say, for example, that my master timetable looks like this: 10 20 30 50 60 70 What I want back is this: 10 10 20 10 10 What I'm trying to accomplish here is detect that #3 in the ...
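
The pairwise-difference idea can be sketched by zipping the list with itself shifted by one position. This is written in Python rather than LINQ, purely as an illustration; the function name `deltas` is hypothetical.

```python
def deltas(timestamps):
    """Return the differences between consecutive entries."""
    # Pair each entry with its successor and subtract.
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


# The timetable from the question.
result = deltas([10, 20, 30, 50, 60, 70])
```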

Tukey five number summary in Python

I have been unable to find this function in any of the standard packages, so I wrote the one below. Before throwing it toward the Cheeseshop, however, does anyone know of an already published version? Alternatively, please suggest any improvements. Thanks. def fivenum(v): """Returns Tukey's five number summary (minimum, lower-hinge...
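
For comparison, here is an independent minimal sketch of a five-number summary using Tukey's hinges. It follows the depth rule used by R's `fivenum` and is not the author's truncated function above; error handling is an assumption.

```python
import math


def fivenum(values):
    """Return [minimum, lower hinge, median, upper hinge, maximum]."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("fivenum requires at least one value")
    # Depth of the hinges, counted from either end of the sorted data.
    n4 = math.floor((n + 3) / 2) / 2
    depths = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    # A fractional depth means averaging the two neighbouring order statistics.
    return [0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]) for d in depths]
```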

Map Reduce count number of documents in each minute MongoDB

I have a MongoDB collection which has a created_at stored in each document. These are stored as a MongoDB date object e.g. { "_id" : "4cacda7eed607e095201df00", "created_at" : "Wed Oct 06 2010 21:22:23 GMT+0100 (BST)", text: "something" } { "_id" : "4cacdf31ed607e0952031b70", "created_at" : "Wed Oct 06 2010 21:23:42 GMT+0100 (BST)",...

Formula/Algorithm for Weighting Game Outcomes

I have an interesting conceptual problem, and I'm wondering if anyone can help me quantify it. Basically, I'm playing a set of games... and for each game I know the probability that I will win, the probability that I will tie, and the probability that I will lose (each game will have different probabilities). At a high level, what I wa...

Predicting Probability of Winning Free-Throw % in Basketball?

My actual problem is a bit more general than this, but here is a specific example. In basketball, you calculate free throw percentage as: Free-Throw Percentage (FT%) = Free-Throws Made (FTM) / Free-Throws Attempted (FTA) I have two teams, and for each team I have the mean and variance of the team's FTM and FTA, so I can model each as ...

Ruby Curve Fitting (logarithmic regression) package

I am looking for a Ruby gem or library that does logarithmic regression (curve fitting to a logarithmic equation). I've tried statsample (http://ruby-statsample.rubyforge.org/), but it doesn't seem to have what I'm looking for. Anybody have any suggestions? ...

Algorithm for generating normally distributed random values in C?

Possible Duplicate: Converting a Uniform Distribution to a Normal Distribution Hello. I'd like to know of any algorithm implemented in C which can take a random value between 0 and 1, the mean and standard deviation and then return a normally distributed result. I have too little brainpower to figure this out for myself righ...
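
One approach that takes a single uniform value, as described, is the inverse-CDF method: feed the uniform value into the inverse of the normal cumulative distribution function. The sketch below is in Python rather than C and relies on the standard library's `statistics.NormalDist` (Python 3.8+); the function name `normal_from_uniform` is hypothetical.

```python
from statistics import NormalDist


def normal_from_uniform(u, mean, std_dev):
    """Map a uniform value u in (0, 1) to a normally distributed value."""
    if not 0.0 < u < 1.0:
        # The inverse CDF is unbounded at exactly 0 and 1.
        raise ValueError("u must be strictly between 0 and 1")
    return NormalDist(mean, std_dev).inv_cdf(u)


# The median of the uniform input maps to the mean of the normal output.
middle = normal_from_uniform(0.5, 10.0, 2.0)
```

The Box-Muller transform is a common alternative, but it consumes two uniform values per call instead of one.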

Getting statistics of the activity of my engine

Hello, One of my applications is an engine that executes some complex calculations. These calculations may take several hours. I want to track the activity of this engine over time. If you are using the Hudson CI server, there is such a feature in the Administration > Usages statistics option. Here is an example: In my application, I alread...

Where can I find the most up-to-date statistics on the most-used programming/scripting languages?

I just can't find really up-to-date info on which programming and scripting languages are most used today, and in which environments (for example: web, desktop, mobile). Where can I find such info? ...

Online algorithm for calculating absolute deviation

I'm trying to calculate the absolute deviation of a vector online, that is, as each item in the vector is received, without using the entire vector. The absolute deviation is the sum of the absolute difference between each item in a vector and the mean: I know that the variance of a vector can be calculated in such a manner. Va...
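
For reference, the online variance calculation mentioned above is commonly done with Welford's algorithm. The Python sketch below covers the variance case only; it does not solve the absolute-deviation problem, because the mean shifts with every new item and the earlier absolute differences cannot be corrected without the stored items. The class name `RunningVariance` is hypothetical.

```python
class RunningVariance:
    """Welford's online algorithm for the mean and population variance."""

    def __init__(self):
        self.n = 0       # number of items seen so far
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the old mean (in delta) and the new mean together.
        self.m2 += delta * (x - self.mean)

    def variance(self):
        if self.n == 0:
            raise ValueError("no data received yet")
        return self.m2 / self.n


rv = RunningVariance()
for item in [2, 4, 4, 4, 5, 5, 7, 9]:
    rv.update(item)
```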

Does Statistics::Descriptive percentile method work as documented?

use strict; use warnings; use Statistics::Descriptive; use 5.012; my @data = ( -2, 7, 7, 4, 18, -5 ); my $stat = Statistics::Descriptive::Full->new(); $stat->add_data(@data); say ($stat->percentile(100) // "undef"); # return 18. OK. say ($stat->percentile(0) // "undef"); # return undef instead of "-inf". see doc below Statistics::Desc...