statistics

Generating a Gaussian distribution with only positive numbers

Is there any way to randomly generate a set of positive numbers such that they have a desired mean and standard deviation? I have an algorithm to generate numbers with a Gaussian distribution, but I don't know how to deal with negative numbers in a way that preserves the mean and standard deviation. It looks like a Poisson distribution...
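
One possible approach, sketched under assumptions: a Gaussian always puts some probability below zero, so this sketch swaps in a gamma distribution, which is strictly positive and can be parameterized to match a target mean and standard deviation. The name `positive_samples` is hypothetical; `random.gammavariate(shape, scale)` is from the Python standard library.

```python
import random

def positive_samples(mean, std, count, seed=None):
    """Draw `count` positive values from a gamma distribution whose
    theoretical mean and standard deviation equal `mean` and `std`."""
    if mean <= 0 or std <= 0:
        raise ValueError("mean and std must be positive")
    shape = (mean / std) ** 2   # k = mean^2 / variance
    scale = std ** 2 / mean     # theta = variance / mean
    rng = random.Random(seed)   # local generator so runs are repeatable
    return [rng.gammavariate(shape, scale) for _ in range(count)]
```

The sample mean and standard deviation only approach the targets as `count` grows. The resulting shape is skewed rather than bell-shaped, more so when `std` is large relative to `mean`.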

Objective-C implementation of the Wilson Score Interval

I'm looking for an Objective-C library, or just the functions, that can handle calculating the Wilson Score Interval explained here: http://www.evanmiller.org/how-not-to-sort-by-average-rating.html For reference, here's a Ruby implementation from the same source: require 'statistics2' def ci_lower_bound(pos, n, power) if n == 0 ...
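
For readers who want to see the arithmetic, here is a minimal Python sketch of the lower bound of the Wilson score interval. It is not an Objective-C library and not a completion of the truncated Ruby code above. The name `wilson_lower_bound` and the `confidence` parameter are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def wilson_lower_bound(pos, n, confidence=0.95):
    """Lower bound of the Wilson score interval for `pos` positive
    ratings out of `n` total ratings."""
    if n == 0:
        return 0.0
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = pos / n
    centre = phat + z * z / (2 * n)
    margin = z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)
```

The same expression translates line by line into C or Objective-C once a value for `z` is available (about 1.96 for 95% confidence).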

Tool for program statistics

Is there a tool that can parse my source code (Fortran, C or C++) and return statistics such as the number of loops, the average loop size, the number of functions, the number of function calls, the number, size and type of arrays, variables, etc.? Something similar to this which does not run easily on my architecture ...

How do I efficiently estimate a probability based on a small amount of evidence?

I've been trying to find an answer to this for months (to be used in a machine learning application). It doesn't seem like it should be a terribly hard problem, but I'm a software engineer, and math was never one of my strengths. Here is the scenario: I have a (possibly) unevenly weighted coin and I want to figure out the probability o...
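
One common way to handle this kind of problem, sketched here as an assumption about what the question is after, is to start from a Beta prior and report the posterior mean. With a uniform Beta(1, 1) prior this reduces to Laplace's rule of succession, (heads + 1) / (flips + 2). The function and parameter names are hypothetical.

```python
def estimate_heads_probability(heads, flips, prior_heads=1.0, prior_tails=1.0):
    """Posterior mean of the heads probability under a
    Beta(prior_heads, prior_tails) prior, after `heads` in `flips` tosses."""
    if heads < 0 or heads > flips:
        raise ValueError("need 0 <= heads <= flips")
    if prior_heads <= 0 or prior_tails <= 0:
        raise ValueError("prior pseudo-counts must be positive")
    # Prior pseudo-counts are simply added to the observed counts.
    return (heads + prior_heads) / (flips + prior_heads + prior_tails)
```

With no data the estimate is 0.5, and three heads in three tosses gives 0.8 rather than 1.0, so small samples do not produce extreme estimates.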

How to display top 10 contributors in Confluence

I would like to display the top 10 contributors to a space in Confluence in the last year that have created, updated or removed pages. There are some statistics on Browse Space > Activity, but only monthly, not yearly. ...

Weighted average in T-SQL (like Excel's SUMPRODUCT)

I am looking for a way to derive a weighted average from two rows of data with the same number of columns, where the average is as follows (borrowing Excel notation): (A1*B1)+(A2*B2)+...+(An*Bn)/SUM(A1:An) The first part reflects the same functionality as Excel's SUMPRODUCT() function. My catch is that I need to dynamically specify...
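
To make the arithmetic concrete, here is a small Python sketch rather than T-SQL. It assumes the intended formula is the whole sum of products divided by the sum of the weights, that is, ((A1*B1)+...+(An*Bn))/SUM(A1:An). The name `weighted_average` is hypothetical.

```python
def weighted_average(weights, values):
    """Sum of weight*value products divided by the sum of the weights."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    # The numerator is what Excel's SUMPRODUCT would return.
    return sum(w * v for w, v in zip(weights, values)) / total_weight
```

For weights 1, 2, 3 and values 10, 20, 30 this is (10 + 40 + 90) / 6.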

Is there a free Statistics Package for Delphi?

Is there an open source and/or free statistics package or library for Delphi? I'm looking for something that can compile directly into the executable, so no DLLs. It needs to be compatible with Delphi 2009 and later (the Unicode versions). Hopefully there is something comprehensive available out there. By comparison, I am used to the a...

Multivariate time series modelling in R

I want to fit some sort of multivariate time series model using R. Here is a sample of my data:

u     cci   bci   cpi   gdp   dum1 dum2 dum3 dx
16.50 14.00 53.00 45.70 80.63 0    0    1    6.39
17.45 16.00 64.00 46.30 80.90 0    0    0    6.00
18.40 12.00 51.00 47.30 82.40 1    0    0    6.57
19.35 7.00  42.00...

Is it possible to do an algebraic curve fit with just a single pass of the sample data?

I would like to do an algebraic curve fit of 2D data points, but for various reasons it isn't really possible to have much of the sample data in memory at once, and iterating through all of it is an expensive process. (The reason for this is that actually I need to fit thousands of curves simultaneously based on gigabytes of data whic...
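
As an illustration of why a single pass can be enough, the sketch below assumes "algebraic curve fit" means an ordinary least-squares fit and uses a straight line as the simplest case. Only five running sums are kept, so no data point has to stay in memory. The class name `RunningLineFit` is hypothetical.

```python
class RunningLineFit:
    """Least-squares line y = slope * x + intercept from running sums."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def add(self, x, y):
        # Each point only updates the sums; the point itself is discarded.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def coefficients(self):
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two points with distinct x")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept
```

A polynomial of higher degree follows the same pattern with more sums (powers of x up to twice the degree), followed by one small linear solve at the end. Raw sums can lose precision when x values are large and close together, so shifting x by a rough centre first may help.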

Generating correlated numbers

Here is a fun one: I need to generate random x/y pairs that are correlated at a given value of Pearson product moment correlation coefficient, or Pearson r. You can imagine this as two arrays, array X and array Y, where the values of array X and array Y must be re-generated, re-ordered or transformed until they are correlated with each...
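
A minimal sketch of one standard construction, assuming normally distributed values are acceptable: draw x and an independent noise term from a standard normal, then set y = r*x + sqrt(1 - r^2)*noise, which has correlation r with x in expectation. The function names are hypothetical.

```python
import math
import random

def correlated_pairs(r, count, seed=None):
    """Generate (x, y) pairs whose expected Pearson correlation is r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    rng = random.Random(seed)
    pairs = []
    for _ in range(count):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)  # independent of x
        pairs.append((x, r * x + math.sqrt(1.0 - r * r) * noise))
    return pairs

def pearson_r(pairs):
    """Sample Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)
```

The sample correlation of any finite batch will differ slightly from r. The gap shrinks as `count` grows.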

How to bin a series of float values into a histogram in Python?

I have a set of float values (always less than 0) that I want to bin into a histogram, i.e. each bar in the histogram covers a range of values [0,0.150) The data I have looks like this: 0.000 0.005 0.124 0.000 0.004 0.000 0.111 0.112 With my code below I expect to get a result that looks like [0, 0.005) 5 [0.005, 0.011) 0 ...etc.. I trie...
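
A minimal sketch of fixed-width binning, assuming equal-width, half-open bins [i*width, (i+1)*width) that start at zero. The bin edges in the excerpt are not evenly spaced, so the width of 0.005 used in the usage note is an assumption, and `bin_counts` is a hypothetical name.

```python
from collections import Counter

def bin_counts(values, width):
    """Count values per half-open bin [i*width, (i+1)*width).
    Returns a Counter keyed by the bin index i."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Floor division maps each value to the index of its bin.
    return Counter(int(v // width) for v in values)

def bin_label(index, width):
    """Human-readable label for bin `index`, e.g. '[0.005, 0.010)'."""
    return f"[{index * width:.3f}, {(index + 1) * width:.3f})"
```

Calling `bin_counts(data, 0.005)` and then iterating over `range(max(counts) + 1)` lists empty bins too, since a `Counter` returns 0 for missing keys. Values that sit exactly on a bin edge can land on either side because of floating-point rounding.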

R statistical package: wrapping GOFrame objects

Dear all, I'm trying to generate GOFrame objects to generate a gene ontology mapping in R for unsupported organisms (see http://www.bioconductor.org/packages/release/bioc/vignettes/GOstats/inst/doc/GOstatsForUnsupportedOrganisms.pdf). However, following the instructions literally doesn't help me. Here's the code I execute (R 2.9.2 on u...

Calculating variance with large numbers

I haven't really used variance calculation that much, and I don't know quite what to expect. Actually I'm not too good with math at all. I have an array of 1000000 random numeric values in the range 0-10000. The array could grow even larger, so I use a 64-bit int for the sum. I have tried to find code on how to calc variance, but I don't ...
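
One way to avoid very large intermediate sums is Welford's online algorithm, which updates a running mean and a running sum of squared deviations one value at a time. The sketch below returns the sample variance (dividing by n - 1); the function name `running_variance` is hypothetical.

```python
def running_variance(values):
    """Sample variance computed in one pass with Welford's algorithm."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the already-updated mean
    if count < 2:
        raise ValueError("need at least two values")
    return m2 / (count - 1)
```

Dividing `m2` by `count` instead gives the population variance. Because the sum of raw values is never formed, the intermediate quantities stay on the scale of the data regardless of how long the array grows.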

Statistics Question

Suppose I conduct a survey of 10 people asking them to rate a movie from 0 to 4 stars. Allowable answers are 0, 1, 2, 3, and 4. The mean is 2.0 stars. How do I calculate the certainty (or uncertainty) about this 2.0 star rating? Ideally, I would like a number between 0 and 1, where 0 represents complete uncertainty and 1 represents ...

How to efficiently find correlation and discard points outside 3-sigma range in MATLAB?

I have a data file m.txt that looks something like this (with a lot more points): 286.842995 3.444398 3.707202 338.227797 3.597597 283.740414 3.514729 3.512116 3.744235 3.365461 3.384880 Some of the values (like 338.227797) are very different from the values I generally expect (smaller numbers). So, I am thinking that I will remove...

How much code can a programmer be intimately familiar with?

Are there any statistics for this? I realize it must vary from person to person, but it seems like there should be a general average. The reason I ask is that the company I contract for has multiple software products, totaling ~75,000 lines of code - and they seemed disappointed and shocked when they asked me a question about a specific p...

Simple statistics - Java packages for calculating mean, standard deviation, etc...

Could you please suggest any simple Java statistics packages? I don't necessarily need any of the advanced stuff. I was quite surprised that there does not appear to be a function to calculate the Mean in the java.lang.Math package... What are you guys using for this? EDIT Regarding: How hard is it to write a simple class tha...

How to seed RRDtool from file with timestamps?

I have a file with timestamps for hits on a system. How can I feed this into the RRDtool database (or other similar solution), so that I can plot a time graph? ...

Ruby Percentile calculations to match Excel formulas (need refactor)

I've written two simple calculations with Ruby which match the way that Microsoft Excel calculates the upper and lower quartiles for a given set of data - which is not the same as the generally accepted method (surprise). My question is - how much and how best can these methods be refactored for maximum DRYness? # Return an upper quar...
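
For comparison, here is a Python sketch of one interpolation rule, assuming the target is the inclusive method that Excel's QUARTILE and PERCENTILE functions are generally described as using: take position p*(n - 1) in the sorted data and interpolate linearly between the two neighbouring values. It is not a reconstruction of the truncated Ruby code above, and `excel_percentile` is a hypothetical name.

```python
def excel_percentile(data, p):
    """Percentile by linear interpolation at position p * (n - 1)
    in the sorted data, with 0 <= p <= 1."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(data)
    position = p * (len(ordered) - 1)
    lower = int(position)        # index of the value at or below position
    fraction = position - lower  # how far to move towards the next value
    if lower + 1 >= len(ordered):
        return float(ordered[lower])
    return ordered[lower] + fraction * (ordered[lower + 1] - ordered[lower])
```

Both quartiles then come from one function, `excel_percentile(data, 0.25)` and `excel_percentile(data, 0.75)`, which is one way to remove the duplication between an upper-quartile and a lower-quartile method.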

R: Use VAR model to predict response to change in values of certain variables

Hi, I've fitted a VECM model in R and converted it to a VAR representation. I would like to use this model to predict the future value of a response variable based on different scenarios for the explanatory variables. Here is the code for the model: library(urca) library(vars) input <-read.csv("data.csv") ts <- ts(input[16:52,],c(200...