statistics

Does knowledge of statistics make you a better programmer?

Is knowledge of statistical analysis required to become a better programmer? How deep does that knowledge need to go? ...

Declaring a Const Variable in R

I'm working in R, and I'd like to define some variables that I (or one of my collaborators) cannot change. In C++ I'd do this: const std::string path( "/projects/current" ); How do I do this in the R programming language? Edit for clarity: I know that I can define strings like this in R: path = "/projects/current" What I really wa...

How can I work around a round-off error that causes an infinite loop in Perl's Statistics::Descriptive?

I'm using the Statistics::Descriptive library in Perl to calculate frequency distributions and coming up against a floating point rounding error problem. I pass in two values, 0.205 and 0.205, (taken from other numbers and sprintf'd to those) to the stats module and ask it to calculate the frequency distribution but it's getting stuck i...

Adjust algorithm for generating random strength values

A few days ago, you helped me work out an algorithm for generating random strength values in an online game (thanks, especially to John Rasch). function getRandomStrength($quality) { $rand = mt_rand()/mt_getrandmax(); $value = round(pow(M_E, ($rand - 1.033) / -0.45), 1); return $value; } This function generates values between ...
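For readers who want to experiment with the formula quoted above, here is a rough Python port. It is only a sketch: the name `get_random_strength` and the optional `rand` argument are made up for illustration, the `$quality` parameter is left out because the visible excerpt never uses it, and Python's `round` and `random.random()` differ slightly from PHP's `round` and `mt_rand()/mt_getrandmax()` at the edges.

```python
import math
import random


def get_random_strength(rand=None):
    """Hypothetical Python port of the PHP snippet quoted in the question.

    `rand` is an illustrative extra parameter (not in the original) that lets
    a caller pass a fixed value in [0, 1] instead of drawing a random one.
    """
    if rand is None:
        rand = random.random()  # uniform in [0, 1)
    # Same expression as the PHP code: e ** ((rand - 1.033) / -0.45), one decimal.
    return round(math.exp((rand - 1.033) / -0.45), 1)


if __name__ == "__main__":
    print([get_random_strength() for _ in range(5)])
```

Evaluating the expression at the two ends of the input range gives about 9.9 for `rand = 0` and about 1.1 for `rand = 1`, so small results are much more likely than large ones.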

Efficient way to generate random contingency tables?

What is an efficient way to generate a random contingency table? A contingency table is defined as a rectangular matrix such that the sum of each row is fixed, and the sum of each column is fixed, but the individual elements may be anything as long as the sum of each row and column is correct. Note that it's very easy to generate rando...

Dynamic Programming: Number of ways to get at least N bubble sort swaps?

Let's say I have an array of elements for which a total ordering exists. The bubble sort distance is the number of swaps that it would take to sort the array if I were using a bubble sort. What is an efficient (will probably involve dynamic programming) way to calculate the number of possible permutations of this array that will have a...
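The bubble sort distance described above equals the number of inversions in the array (pairs that are out of order). Assuming all elements are distinct, one possible dynamic-programming sketch builds the distribution of inversion counts one element at a time. The function names below are hypothetical, and this is not necessarily the approach the asker ended up with.

```python
def count_inversions(values):
    """Bubble sort distance: number of pairs (i, j) with i < j and values[i] > values[j]."""
    return sum(
        1
        for i in range(len(values))
        for j in range(i + 1, len(values))
        if values[i] > values[j]
    )


def inversion_distribution(n):
    """counts[k] = number of permutations of n distinct items with exactly k inversions."""
    counts = [1]  # zero items: one empty permutation with 0 inversions
    for size in range(1, n + 1):
        max_inv = size * (size - 1) // 2
        new_counts = [0] * (max_inv + 1)
        for k, ways in enumerate(counts):
            # Inserting the largest item into one of `size` slots adds 0..size-1 inversions.
            for added in range(size):
                new_counts[k + added] += ways
        counts = new_counts
    return counts


def permutations_with_at_least(n, min_swaps):
    """Number of permutations of n distinct items needing at least min_swaps swaps."""
    return sum(inversion_distribution(n)[max(min_swaps, 0):])


if __name__ == "__main__":
    print(count_inversions([3, 1, 2]))
    print(permutations_with_at_least(4, 4))
```

The table has at most n(n-1)/2 + 1 entries per size, so this simple version runs in roughly O(n^4) time; a prefix-sum over `counts` would cut that down if it matters.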

Multivariate mapping / regression with objective function

Overview: I have a multivariate time series of "inputs" of dimension N that I want to map to an output time series of dimension M, where M < N. The inputs are bounded in [0,k] and the outputs are in [0,1]. Let's call the input vector for some time slice in the series "I[t]" and the output vector "O[t]". Now if I knew the optimal mapping ...

Java library to do time series analysis

I need to do some analysis of an arbitrary number of time series in Java. Among other things, I need to be able to use linear regression, various smoothing techniques, filtering, etc. I'm not very keen on writing all this from scratch, so do you know of any good Java libraries for this kind of analysis? Edit: R seems like a good choice....

Where are my visitors going?

What is an easy way to track which external links visitors click on (on my site)? I don't want to route them through my page, e.g., href = $my-url?url=$external-url. Is there an easy JavaScript solution? ...

How does one track with JS where the visitors are going?

Let me reformulate, as the answer http://stackoverflow.com/questions/951907/where-are-my-visitors-going was absolutely correct, but my question was not precise enough ;) How does one track with JavaScript where the visitors are going? (From a technical standpoint.) Is the idea to execute some code every time a link is pressed? If yes, does t...

WordPress-style stats for regular pages

Hi, I am wondering if there is something similar to WordPress stats or if I can use the WordPress stats engine on a regular site (which does not use WordPress). I really like the interface of the stats and although I have Google Analytics installed, I find myself more comfortable with the WordPress stats engine. Can you suggest a goo...

Need good way to choose and adjust a "learning rate"

In the picture below you can see a learning algorithm trying to learn to produce a desired output (the red line). The learning algorithm is similar to a backward error propagation neural network. The "learning rate" is a value that controls the size of the adjustments made during the training process. If the learning rate is too high,...

Choose random array element satisfying certain property

Suppose I have a list, called elements, each of which does or does not satisfy some boolean property p. I want to choose one of the elements that satisfies p at random, with uniform distribution. I do not know ahead of time how many items satisfy this property p. Will the following code do this?: pickRandElement(elements, p) rand...
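One well-known way to do this in a single pass, without knowing the number of matches in advance, is reservoir sampling with a reservoir of size one. The sketch below is an illustration of that technique, not a reconstruction of the truncated pseudocode above; the name `pick_random_element` and the `rng` parameter are hypothetical.

```python
import random


def pick_random_element(elements, p, rng=random):
    """Return a uniformly random element satisfying p, or None if none does.

    Single pass: the k-th matching element replaces the current choice with
    probability 1/k, which leaves every match equally likely at the end.
    """
    chosen = None
    matches = 0
    for element in elements:
        if p(element):
            matches += 1
            if rng.randrange(matches) == 0:  # true with probability 1/matches
                chosen = element
    return chosen


if __name__ == "__main__":
    print(pick_random_element(range(20), lambda x: x % 3 == 0))
```

The first match is always kept, the second replaces it half the time, the third a third of the time, and so on, so with m matches each one survives with probability 1/m.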

Estimating a probability given other probabilities from a prior

I have a bunch of data coming in (calls to an automated call center) about whether or not a person buys a particular product, 1 for buy, 0 for not buy. I want to use this data to create an estimated probability that a person will buy a particular product, but the problem is that I may need to do it with relatively little historical data ...

Statistics and matrix algebra in Ruby

I need to invert a variance-covariance matrix in Ruby and perform vector-by-matrix multiplication. Which numerical Ruby library/gem should I use? ...

Does anyone know where to find a breakdown of iPhone users by phone generation?

Now that the 3.0 OS & 3GS are coming out, there will be a wider range in hardware and also in functional limitations (no P2P on first-gen iDevices, no compass on anything but iPhone 3GS, etc.) of users that will be (ideally) buying our apps. In the same way that W3schools has its browser stats page (http://www.w3schools.com/browsers/bro...

Taking an average in SQL after throwing away outliers

I have a generic log table which I can attach to processes and their results. I get the average time using a process performance view: WITH Events AS ( SELECT PR.DATA_DT_ID ,P.ProcessID ,P.ProcessName ,PL.GUID ,PL.E...

Determining the best initial buffer size for decompressing streamed compressed data

I am trying to calculate an initial buffer size to use when decompressing data of an unknown size. I have a bunch of data points from existing compression streams but don't know the best way to analyze them. Data points are the compressed size and the ratio to uncompressed size. For example: 100425 (compressed size) x 1.3413 (compressi...

Finding PI digits using Monte Carlo

I have tried many algorithms for finding π using Monte Carlo. One of the solutions (in Python) is this: def calc_PI(): n_points = 1000000 hits = 0 for i in range(1, n_points): x, y = uniform(0.0, 1.0), uniform(0.0, 1.0) if (x**2 + y**2) <= 1.0: hits += 1 print "Calc2: PI result", 4.0 * floa...
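For reference, a Python 3 sketch of the same quarter-circle estimator is shown below. The function name `estimate_pi` and the `seed` parameter are assumptions added for illustration; the counting logic follows the snippet quoted in the question.

```python
import random


def estimate_pi(n_points=1_000_000, seed=None):
    """Estimate pi from the fraction of random points in the unit square
    that fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)  # a seed makes runs repeatable (illustrative addition)
    hits = 0
    for _ in range(n_points):
        x, y = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n_points


if __name__ == "__main__":
    print("PI estimate:", estimate_pi())
```

The statistical error of this estimator shrinks only in proportion to 1/sqrt(n_points), so each additional correct digit costs roughly 100 times more samples.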

Efficient week statistics for a QuerySet

I am working on an open source Django time tracking app, Djime, and I'm trying to come up with a more efficient way to produce statistics. So far, we've had some rather long-winded procedural code that would get all TimeSlices for a period, and collate them together in a huge nested list/dictionary mess. What I'd like to do is to set up...