statistics

Generation of uniformly distributed random noise

I've been working on generating Perlin noise for a map generator of mine. The problem I've run into is that the random noise is not distributed uniformly; it looks more like a normal distribution of sorts. Given two integers X and Y, and a seed value, I do the following: Use MurmurHash2 to generate a random number (-1,1). This is unifo...
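A minimal sketch of the hashing step, assuming the lattice coordinates X and Y are packed as two little-endian 32-bit signed integers before hashing (an assumption; the excerpt does not say how they are combined). `murmurhash2` is a Python port of the 32-bit MurmurHash2 reference algorithm, and `lattice_value` is a hypothetical helper that maps the hash onto [-1, 1). Hashed lattice values like these come out close to uniform; Perlin noise itself interpolates and combines several such values, so its final output tends to cluster around zero rather than stay uniform.

```python
import struct

MASK32 = 0xFFFFFFFF


def murmurhash2(data: bytes, seed: int) -> int:
    """32-bit MurmurHash2, ported from the reference C implementation."""
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (seed ^ length) & MASK32

    # Mix four bytes at a time into the hash.
    i = 0
    while length - i >= 4:
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = (h * m) & MASK32
        h ^= k
        i += 4

    # Handle the last 1-3 bytes (mirrors the C switch fall-through).
    rem = length - i
    if rem == 3:
        h ^= data[i + 2] << 16
    if rem >= 2:
        h ^= data[i + 1] << 8
    if rem >= 1:
        h ^= data[i]
        h = (h * m) & MASK32

    # Final avalanche so nearby inputs give unrelated outputs.
    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def lattice_value(x: int, y: int, seed: int) -> float:
    """Hypothetical helper: hash (x, y, seed) to a float in [-1, 1)."""
    h = murmurhash2(struct.pack("<ii", x, y), seed)
    return h / 2**31 - 1.0
```

Binning many `lattice_value` outputs into a histogram is a quick way to check that the hashing step is flat before looking for the bias elsewhere in the noise pipeline.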

Stepwise Regression using P-Value

Hello everybody! I want to use R to perform a stepwise linear regression using p-values as the selection criterion, e.g. at each step dropping the variable with the highest (i.e. least significant) p-value, and stopping when all values are significant as defined by some threshold alpha. I am totally aware that I should use the AIC (e.g. co...
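The question asks about R; as a language-neutral illustration, here is a minimal Python sketch of backward elimination by p-value, assuming NumPy and SciPy are available. `ols_pvalues` and `backward_eliminate` are hypothetical helper names; the p-values are the usual two-sided t-test p-values from ordinary least squares, and column 0 of the design matrix is assumed to be an intercept that is never dropped.

```python
import numpy as np
from scipy import stats


def ols_pvalues(X, y):
    """Fit ordinary least squares; return coefficients and two-sided t-test p-values."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = n - k
    sigma2 = resid @ resid / dof                   # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)          # covariance of the coefficients
    t_stats = beta / np.sqrt(np.diag(cov))
    p_values = 2 * stats.t.sf(np.abs(t_stats), dof)
    return beta, p_values


def backward_eliminate(X, y, names, alpha=0.05):
    """Drop the least significant predictor until every remaining one has p <= alpha.

    Column 0 of X is assumed to be the intercept and is always kept.
    """
    cols = list(range(X.shape[1]))
    while len(cols) > 1:
        _, p = ols_pvalues(X[:, cols], y)
        worst = int(np.argmax(p[1:])) + 1          # skip the intercept at position 0
        if p[worst] <= alpha:
            break
        del cols[worst]
    return [names[c] for c in cols]
```

The model is refit after every removal because dropping one variable changes the p-values of the others.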

PHP-driven exhaustive stats - server-side text files or MySQL tables?

I've got a gaming-oriented website with 200+ users. The site has a large database tracking user plays, and one of the motivations for continued participation is the extensive statistics and rankings (S&R) with which the site provides the user. As the list of S&Rs tracked has grown, some of the more intricate calculations have been moved...
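One common pattern for the MySQL-table option is a precomputed summary table that a scheduled job rebuilds, so that page views read a few cached rows instead of recomputing over the whole play history. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL, and the `plays` and `user_stats` tables and their columns are hypothetical.

```python
import sqlite3


def rebuild_user_stats(conn):
    """Recompute per-user aggregates from the raw plays table in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM user_stats")
        conn.execute(
            """
            INSERT INTO user_stats (user_id, plays, best_score)
            SELECT user_id, COUNT(*), MAX(score)
            FROM plays
            GROUP BY user_id
            """
        )


def leaderboard(conn, limit=10):
    """Read rankings from the cached table instead of the raw history."""
    return conn.execute(
        "SELECT user_id, plays, best_score FROM user_stats "
        "ORDER BY best_score DESC, user_id LIMIT ?",
        (limit,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (user_id INTEGER, score INTEGER)")
conn.execute(
    "CREATE TABLE user_stats (user_id INTEGER PRIMARY KEY, plays INTEGER, best_score INTEGER)"
)
conn.executemany("INSERT INTO plays VALUES (?, ?)", [(1, 50), (1, 80), (2, 95), (3, 20)])
rebuild_user_stats(conn)
```

Rebuilding inside a single transaction means readers never see a half-empty summary table.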

How would YOU compute IMDB movie rating?

I'm doing this only for learning purposes. I have no intention of reverse-engineering IMDB's methods. I asked myself: if I owned IMDB or a similar website, how would I compute the movie rating? All I can think of is a weighted average (which is nothing but the arithmetic mean). For the movie data provided below, the computation would be (38591*10 + 27994*9...
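The weighted average over the vote histogram is straightforward. One widely used refinement, shown here as an alternative rather than as IMDB's method, is a Bayesian average that pulls titles with few votes toward a global mean C using a prior weight m: WR = (v·R + m·C) / (v + m), where v is the number of votes and R is their mean. The function names are hypothetical, and C and m are tuning choices you would pick yourself.

```python
def mean_rating(histogram):
    """Arithmetic mean of all votes; histogram maps star value -> vote count."""
    votes = sum(histogram.values())
    return sum(star * count for star, count in histogram.items()) / votes


def bayesian_rating(histogram, prior_mean, prior_votes):
    """Shrink the mean toward prior_mean, as if prior_votes extra votes at prior_mean existed."""
    v = sum(histogram.values())
    r = mean_rating(histogram)
    return (v * r + prior_votes * prior_mean) / (v + prior_votes)
```

With this adjustment, a title with three 10-star votes no longer outranks one with thousands of mostly 10-star votes.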

How good is my error in data mining

Hi, I'm trying to calculate how good my measurements are in machine learning! Let's say that I have five choices, and that their errors are 4, 2, 0.002, 3, 6. Naturally, I will pick the third one as the hit, but I would like to say the following: I'm X% certain that the hit is the third pick; I'm Y% certain that the hit is the first (last) pick. Of course, X >> Y, but I ...
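One heuristic for turning errors into "X% certain" numbers is a softmax over negative errors: smaller errors get larger shares, and the shares sum to 1. This is a swapped-in technique, not a calibrated probability. The `temperature` parameter is a hypothetical knob that controls how sharply the best choice dominates, and genuinely calibrated confidences would need validation data.

```python
import math


def error_to_confidence(errors, temperature=1.0):
    """Softmax over negative errors: returns weights in (0, 1) that sum to 1."""
    scores = [-e / temperature for e in errors]
    top = max(scores)                       # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


errors = [4, 2, 0.002, 3, 6]                # the errors from the question
confidence = error_to_confidence(errors)
```

A larger temperature spreads the confidence more evenly across the choices; a smaller one concentrates it on the lowest error.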

R: Entering variables into regression function

I have a feature_list that contains several possible values, say "A", "B", "C", etc., and there is a time in time_list. I will have a loop in which I go through each of these values and put it into a formula, something like for(i in ...), then my_feature <- feature_list[i] and my_time <- time_list[i], then I put ...

java statistics collection for performance evaluation

What is the most efficient way to collect and report performance statistics from an application? If I have an application that uses a series of network APIs, and I want to report statistics at runtime, e.g.: method doA() was called 3 times and took 500 ms on average; method doB() was called 5 times and took 1200 ms on average; et...
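The question is about Java, but the idea is language-independent: wrap each method so that every call increments a counter and adds its elapsed wall-clock time to a running total, then report the count and average on demand. A minimal Python sketch; `CallStats` and `timed` are hypothetical names, and the class is not thread-safe as written.

```python
import functools
import time
from collections import defaultdict


class CallStats:
    """Per-function call counts and cumulative elapsed time."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.total_seconds = defaultdict(float)

    def timed(self, func):
        """Decorator that records each call of func, even if it raises."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                self.calls[func.__name__] += 1
                self.total_seconds[func.__name__] += time.perf_counter() - start
        return wrapper

    def report(self):
        """Map function name -> (call count, average milliseconds per call)."""
        return {
            name: (count, 1000 * self.total_seconds[name] / count)
            for name, count in self.calls.items()
        }
```

Only two additions per call are made on the hot path; averages are computed lazily when a report is requested.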

Are there any useful libraries/service for creating custom web application-level stats?

I run a website for music students that allows them to stream a variety of content from a number of sources. Our primary 'customers' are really the librarians at the institutions who subscribe to our service, so it's important to them to see the actual usage of the service, but they want more information than simple web analytics are ab...

What kind of service provides hosting + statistics for app distribution?

Hi, I recently finished my Mac OS X application, and I've struggled to monitor its statistics. I'm wondering if there's some kind of service like Google Analytics for application distribution? It would be great if they provided hosting too. Thanks ...

R (statistical) scoping error using transformBy(), part of the doBy package.

I think I'm getting a scoping error when using transformBy(), part of the doBy package for R. Here is a simple example of the problem:

library(doBy)

test.data = data.frame(
  herp = c(1,2,3,4,5),
  derp = c(2,3,1,3,5)
)

transformData = function(data){

  five = 5

  transformBy(
    ~ herp,
    data=data,
    sum=he...

Does google analytics slow down my website?

I am in the final stages of my website, and I now need to find a suitable statistics application/tool. I have looked into Webalizer, but it seems outdated. I have also looked into Google Analytics, but I am afraid that if I implement it, my website will become slow. It is already pretty heavy with database material being displayed w...

Mathematical library to compare similarities in graphs of data for a high-level language (e.g. JavaScript)?

I'm looking for something that I guess is rather sophisticated and might not exist publicly, but hopefully it does. I basically have a database with lots of items, which all have values (y) that correspond to other values (x). E.g. one of these items might look like:

x |  1 |  2 |  3 | 4 | 5
y | 12 | 14 | 16 | 8 | 6

This is just a rando...
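Two simple similarity measures, assuming each item's y-values are sampled at the same x positions: Pearson correlation compares the shape of the curves regardless of scale and offset, while Euclidean distance compares the actual values. This is a minimal sketch with hypothetical function names; `math.dist` needs Python 3.8 or later, and Pearson correlation is undefined for a constant series.

```python
import math


def pearson(ys1, ys2):
    """Pearson correlation of two y-series sampled at the same x values (-1 to 1)."""
    if len(ys1) != len(ys2) or len(ys1) < 2:
        raise ValueError("series must have equal length >= 2")
    n = len(ys1)
    m1 = sum(ys1) / n
    m2 = sum(ys2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(ys1, ys2))
    s1 = math.sqrt(sum((a - m1) ** 2 for a in ys1))
    s2 = math.sqrt(sum((b - m2) ** 2 for b in ys2))
    return cov / (s1 * s2)


def euclidean(ys1, ys2):
    """Straight-line distance between the two series; 0 means identical values."""
    return math.dist(ys1, ys2)


item = [12, 14, 16, 8, 6]  # the example row from the question
```

Which measure fits depends on whether "similar" should mean "same shape" (correlation) or "same numbers" (distance).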

How should I order these "helpful" scores?

Under the user-generated posts on my site, I have an Amazon-like rating system: "Was this review helpful to you: Yes | No". If there are votes, I display the results above that line like so: "5 of 8 people found this reply helpful." I would like to sort the posts based on these rankings. If you were ranking from most helpful to ...
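One widely used approach to this kind of ranking is to sort by the lower bound of the Wilson score confidence interval for the proportion of "Yes" votes, which keeps a post with 1 of 1 helpful votes from outranking one with 50 of 80. A minimal sketch; `wilson_lower_bound` is a hypothetical name, and z = 1.96 (95% confidence) is an assumed choice.

```python
import math


def wilson_lower_bound(helpful, total, z=1.96):
    """Lower bound of the Wilson score interval for the fraction of 'helpful' votes.

    z = 1.96 corresponds to a 95% confidence level; posts with no votes score 0.
    """
    if total == 0:
        return 0.0
    p = helpful / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


# (helpful, total) pairs for three hypothetical posts, sorted most helpful first.
posts = [(5, 8), (1, 1), (50, 80)]
ranked = sorted(posts, key=lambda hv: wilson_lower_bound(*hv), reverse=True)
```

The bound rewards both a high helpful ratio and a large number of votes, so new posts with a single "Yes" do not jump to the top.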

Is there any current review of statistical modules for Perl?

Hello, I would like to know the current status of the statistical modules on CPAN. Does anyone know of a recent review, or could anyone comment on their likes/dislikes with those modules? I have used the classical ones: Statistics::Descriptive, Statistics::Distributions, and some others contained in Bundle::Math::Statistics. Some of the ...

Statistical accumulator in Python

A statistical accumulator allows one to perform incremental calculations. For instance, to compute the arithmetic mean of a stream of numbers given at arbitrary times, one could make an object that keeps track of the current number of items given, n, and their sum, sum. When one requests the mean, the object simply returns sum/n. An ...
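A minimal Python sketch of such an accumulator: `mean()` is exactly the sum/n approach described above. As an addition beyond the question, `variance()` uses Welford's online update, which avoids the cancellation problems of accumulating a raw sum of squares. The class name `Accumulator` is hypothetical.

```python
class Accumulator:
    """Incremental statistics over a stream: count, sum, mean and variance."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self._mean = 0.0   # running mean (Welford)
        self._m2 = 0.0     # running sum of squared deviations (Welford)

    def add(self, x):
        self.n += 1
        self.total += x
        delta = x - self._mean
        self._mean += delta / self.n
        self._m2 += delta * (x - self._mean)

    def mean(self):
        if self.n == 0:
            raise ValueError("no data")
        return self.total / self.n

    def variance(self):
        """Sample variance (n - 1 denominator)."""
        if self.n < 2:
            raise ValueError("need at least two values")
        return self._m2 / (self.n - 1)
```

Each `add` is O(1) in time and memory, so the object can sit in front of an unbounded stream.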

Deciding weights for the parameters (similar to Google pagerank)

Hi, I crawled some blogs for my project and extracted a few features, like document length, in-links, and out-links. Each of these blogs talks about a specific subject, there can be numerous articles on each subject, and I need to pick at most one or two important blogs for each subject. How can I assign weights to these feat...
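Since the title mentions PageRank, here is a minimal power-iteration sketch over the crawled link graph, assuming links are given as a dict from each blog to the list of distinct blogs it links to (a hypothetical input format). Blogs with no out-links spread their rank evenly over all blogs. The damping factor 0.85 is the conventional choice; how to combine the resulting score with document length is left open, as in the question.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank; links maps node -> list of distinct target nodes."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is redistributed uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u, targets in links.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        diff = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if diff < tol:
            break
    return rank
```

The scores always sum to 1, so they can be compared directly across blogs within a subject.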

Statistics based on MySQL and PHP

Hi, I'm about to generate some statistics based on the values in a MySQL table. I would like to generate some numbers for each month of the year and for each day of the month. I could of course do all this manually, but that doesn't seem like a good approach :) So, does anybody have ideas on how I can generate these statistics? Note: I would...
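The usual approach is to let the database do the grouping with a single GROUP BY query per breakdown. In MySQL that would group on functions such as MONTH() and DAY() of the date column; the sketch below uses Python's built-in `sqlite3` as a stand-in, where `strftime` plays that role. The `events` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (created_at TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("2010-01-05", 10), ("2010-01-05", 5), ("2010-01-20", 7), ("2010-02-03", 1)],
)


def totals_per_month(conn, year):
    """(month, row count, sum of amount) for every month of the year that has rows."""
    return conn.execute(
        """
        SELECT CAST(strftime('%m', created_at) AS INTEGER) AS month,
               COUNT(*), SUM(amount)
        FROM events
        WHERE strftime('%Y', created_at) = ?
        GROUP BY month
        ORDER BY month
        """,
        (str(year),),
    ).fetchall()


def totals_per_day(conn, year, month):
    """(day, row count, sum of amount) for every day of the month that has rows."""
    return conn.execute(
        """
        SELECT CAST(strftime('%d', created_at) AS INTEGER) AS day,
               COUNT(*), SUM(amount)
        FROM events
        WHERE strftime('%Y-%m', created_at) = ?
        GROUP BY day
        ORDER BY day
        """,
        (f"{year:04d}-{month:02d}",),
    ).fetchall()
```

Months or days with no rows are simply absent from the result; if the report needs every day listed, fill the gaps in application code.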

Deciding test statistic and distribution for a random number test

Hi, how do you decide on a test statistic, and its likely distribution, when developing a test for random numbers? How do you calculate and decide on the formula for computing a p-value from the test statistic's distribution? Thanks in advance ...
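A minimal sketch of the classic frequency (equidistribution) test, one common starting point: bin the numbers, use Pearson's chi-square statistic, and derive the p-value from the statistic's null distribution. For an even number of degrees of freedom the chi-square survival function has a closed form, which keeps the example dependency-free. `frequency_test` and the choice of 11 bins are assumptions for illustration, and the chi-square approximation needs enough samples per bin (a common rule of thumb is at least 5 expected per bin).

```python
import math
import random


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    For df = 2m the survival function is exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def frequency_test(samples, bins=11):
    """Chi-square goodness-of-fit test that samples in [0, 1) are uniform.

    Under the null hypothesis the statistic is approximately chi-square with
    bins - 1 degrees of freedom (11 bins -> 10 df, so the closed form applies).
    """
    counts = [0] * bins
    for s in samples:
        counts[min(int(s * bins), bins - 1)] += 1
    expected = len(samples) / bins
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    return statistic, chi2_sf_even_df(statistic, bins - 1)


rng = random.Random(1234)
uniform_samples = [rng.random() for _ in range(10000)]
biased_samples = [rng.random() ** 2 for _ in range(10000)]  # deliberately skewed toward 0
```

A small p-value means the observed bin counts would be unlikely if the generator were uniform; other properties (independence, runs, serial correlation) need their own statistics.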

How can I find low regions in a graph using Perl/R?

I'm examining some biological data that is basically a long list (a few million values) of integers, each saying how well that position in the genome is covered. Here is a graphical example of a data set: I would like to look for "valleys" in this data, that is, regions that are significantly lower than their surrounding environmen...
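One simple approach (a swapped-in heuristic, not a formal statistical test) is to compare a short moving average around each position with the average of a wider flanking window, flag positions where the local level drops below some fraction of its surroundings, and merge consecutive flagged positions into regions. Prefix sums make every window mean O(1), so the whole scan is linear in the data length. `find_valleys` and its `window`, `flank` and `ratio` parameters are hypothetical choices to tune.

```python
def find_valleys(coverage, window=50, flank=500, ratio=0.5):
    """Return half-open (start, end) regions whose local mean < ratio * surrounding mean."""
    n = len(coverage)
    prefix = [0]
    for c in coverage:
        prefix.append(prefix[-1] + c)

    def mean(a, b):
        """Mean of coverage[a:b], with the window clipped to the data."""
        a, b = max(a, 0), min(b, n)
        return (prefix[b] - prefix[a]) / (b - a) if b > a else 0.0

    low = []
    for i in range(n):
        inner = mean(i - window // 2, i + window // 2 + 1)   # short local window
        outer = mean(i - flank, i + flank + 1)               # wider surrounding window
        low.append(outer > 0 and inner < ratio * outer)

    # Merge runs of consecutive flagged positions into regions.
    regions, start = [], None
    for i, flag in enumerate(low + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            regions.append((start, i))
            start = None
    return regions
```

Because the outer window includes the valley itself, very wide valleys lower their own baseline; making `flank` much larger than the valleys you expect keeps that effect small.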

Among MATLAB and Python, which one is good for statistical analysis?

I have been using MATLAB for my work, but I have started learning Python lately. I employ statistical analysis, more precisely geostatistics, in my work. I wanted to ask, from your perspective, which of the two languages is better for statistical analysis. What are the pros and cons of each, other than accessibility? ...