statistics

How do I calculate a p-value if I have the t-statistic and d.f. (in Perl)?

Hi everyone, I have written a Perl script that performs many one-sample t-tests. I get thousands of t-statistics with their degrees of freedom (df). I need to upgrade the script to also return their p-values (there are too many to look them up manually in a table). Is there some kind of formula I can use for this with the t-statistic an...

How can I make Voronoi treemaps?

I want to make Voronoi treemaps for statistics data, like newsgraphy. Do you know how I can do that in Perl, PHP, Ruby, or Python? ...

probability and relative frequency

If I use relative frequency to estimate the probability of an event, how good is my estimate based on the number of experiments? Is standard deviation a good measure? A paper/link/online book would be perfect. http://en.wikipedia.org/wiki/Frequentist ...
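One common way to quantify this is the standard error of a binomial proportion, sqrt(p(1-p)/n), which shrinks with the square root of the number of experiments. The sketch below assumes the trials are independent and uses the normal-approximation (Wald) interval, which is known to be poor for small n or for p close to 0 or 1. The function name and the default z = 1.96 (roughly 95%) are illustrative choices, not something from the question.

```python
import math


def relative_frequency_estimate(successes, trials, z=1.96):
    """Return (p_hat, standard_error, (low, high)) for a binomial proportion.

    Assumes independent trials; the interval is the normal-approximation
    (Wald) interval, clipped to [0, 1].
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    # Standard error of the relative frequency: sqrt(p(1-p)/n).
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / trials)
    low = max(0.0, p_hat - z * std_err)
    high = min(1.0, p_hat + z * std_err)
    return p_hat, std_err, (low, high)


if __name__ == "__main__":
    # Same observed frequency, 100x more experiments: the error is 10x smaller.
    print(relative_frequency_estimate(50, 100))
    print(relative_frequency_estimate(5000, 10000))
```

So the standard deviation of the estimate is a reasonable measure, but it only improves as 1/sqrt(n): halving the error takes four times as many experiments.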

How to find out if a site user is unique or a returning user in ASP.NET?

A single user can show up as multiple unique users over a period of time when he visits a site. Internally, the user's IP address is static, but on the net the user is represented by the ISP router's IP address, isn't it? ...

How do I calculate the likelihood of a collision using md5?

I have keys that can vary in length between 1 and 256 characters*; how can I calculate the probability that any two keys will collide when using md5 (barring a brute force solution of trying each key)? * the character set is restricted to [a-z.-] ...
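A rough estimate comes from the birthday bound: for n distinct keys hashed to a 128-bit digest, the chance of at least one collision is approximately 1 - exp(-n(n-1)/2^129). This sketch assumes md5 behaves like a uniformly random 128-bit function on these keys, which only addresses accidental collisions (deliberately constructed md5 collisions are a separate matter). Since the 28-character key space is far larger than 2^128, the result depends on how many keys are actually hashed, not on the key length. The function name is hypothetical.

```python
import math

MD5_BITS = 128  # md5 produces a 128-bit digest


def collision_probability(num_keys, bits=MD5_BITS):
    """Birthday-bound approximation of P(at least one collision).

    Assumes the hash output is uniformly distributed over 2**bits values.
    """
    pairs = num_keys * (num_keys - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-pairs / 2.0 ** bits)


if __name__ == "__main__":
    for n in (10 ** 6, 10 ** 9, 2 ** 64):
        print(n, collision_probability(n))
```

Under this assumption the probability only becomes appreciable when the number of keys approaches 2^64.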

In R, what is a good way to aggregate String data

In R (or S-PLUS), what is a good way to aggregate String data in a data frame? Consider the following: myList <- as.data.frame(c("Bob", "Mary", "Bob", "Bob", "Joe")) I would like the output to be: [Bob, 3 Mary, 1 Joe, 1] Currently, the only way I know how to do this is with the summary function. > summary(as.data.frame(myL...

Deducing item cost from totals?

This can be broken down into a simple trio of equations: a + b = 3 b + c = 5 a + c = 4 How can I best approximate the values? Note, I'll have many more such totals and variables in real applications. Particularly, I want to find if it's possible to usefully approximate the cost of food by item lists and totals from grocery receipts. I ...
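One possible approach is ordinary least squares: treat each total as a row of a 0/1 matrix (1 where the item appears) and solve the normal equations. The sketch below is pure Python and uses the three equations from the question. The function name is hypothetical, and it assumes there are at least as many independent totals as unknowns; with real receipts a library solver would be the more practical choice.

```python
def solve_least_squares(rows, totals):
    """Least-squares solution of rows * x ~= totals via the normal equations.

    rows[k][i] is 1 if item i appears in total k, else 0.
    """
    n = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T b].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, totals))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("not enough independent totals to solve")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


if __name__ == "__main__":
    # a + b = 3, b + c = 5, a + c = 4
    rows = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
    totals = [3, 5, 4]
    a, b, c = solve_least_squares(rows, totals)
    print(round(a, 6), round(b, 6), round(c, 6))  # 1.0 2.0 3.0
```

With more totals than unknowns the same function returns the values that minimize the squared error across all totals, which is what "best approximate" usually means when prices vary slightly between receipts.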

How do I calculate a moving average using MySQL?

I need to do something like: SELECT value_column1 FROM table1 WHERE datetime_column1 >= '2009-01-01 00:00:00' ORDER BY datetime_column1; Except in addition to value_column1, I also need to retrieve a moving average of the previous 20 values of value_column1. Standard SQL is preferred, but I will use MySQL extensions if necessary. ...

Statistic for requests in deployed VPS servers

I was thinking about different scalability features, and suddenly realized that I don't really know how much load one server (VPS) can handle. This question is for those who run heavily loaded projects. Imagine a server with: 1 GB RAM, 1 Xeon CPU, CentOS, LAMP with FastCGI, and PostgreSQL on the same machine. We need to calculate the number of requests, so I d...

Algorithm for Comparing Words (Not Alphabetically)

Hello, I need to code a solution for a certain requirement, and I wanted to know if anyone is either familiar with an off-the-shelf library that can achieve it, or can point me to the best practice. Description: The user inputs a word that is supposed to be one of several fixed options (I hold the options in a list). I know the input ...

SQL: Calculating system load statistics

I have a table like this that stores messages coming through a system: Message ------- ID (bigint) CreateDate (datetime) Data (varchar(255)) I've been asked to calculate the messages saved per second at peak load. The only data I really have to work with is the CreateDate. The load on the system is not constant; there are times when...

How to categorize and tabularize free-form answers to a question in a survey?

I want to analyze answers to a web survey (Git User's Survey 2008 if one is interested). Some of the questions were free-form questions, like "How did you hear about Git?". With more than 3,000 replies, analyzing those replies entirely by hand is out of the question (especially since there are quite a few free-form questions in this surv...

Script to create a listing of: <Method Name> <Num of times called> for a particular project directory.

Anyone know of a script to get a listing that will be able to tell me the most frequently called functions for a C project? method1 391 method2 23 method3 12 Even better have it be customizable to require a keyword in the method name "get". I'm trying not to reinvent the wheel and write a simple script to do it myself. Probably an ea...

How do I calculate r-squared using Python and Numpy?

I'm using Python and Numpy to calculate a best fit polynomial of arbitrary degree. I pass a list of x values, y values, and the degree of the polynomial I want to fit (linear, quadratic, etc.). This much works, but I also want to calculate r (coefficient of correlation) and r-squared (coefficient of determination). I am comparing my re...

How do I determine the standard deviation (stddev) of a set of values?

I need to know if a number, compared to a set of numbers, is outside of 1 stddev from the mean, etc. ...
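As a minimal sketch of that check, Python's standard `statistics` module provides both the sample standard deviation (`stdev`, divides by n-1) and the population standard deviation (`pstdev`, divides by n). Which one applies depends on whether the set is a sample or the whole population; the helper name and its `population` flag below are hypothetical.

```python
import statistics


def is_outside(value, data, num_stddevs=1.0, population=False):
    """Return True if value lies more than num_stddevs from the mean of data."""
    mean = statistics.mean(data)
    # pstdev divides by n (whole population); stdev divides by n - 1 (sample).
    sd = statistics.pstdev(data) if population else statistics.stdev(data)
    return abs(value - mean) > num_stddevs * sd


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, population stddev is 2
    print(is_outside(8, data))  # True: 3 away from the mean
    print(is_outside(6, data))  # False: 1 away from the mean
```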

How to collect network statistics using C#, such as ping time (ms), download rate, and packet loss

Hi there, I am thinking of creating an open source application that, given a list of high-speed download sites/files, can generate a list of calls to these sites in order to gather the following statistics: ping times, download rate, and packet loss. I was wondering whether there are built-in or external libraries in .NET to do that ki...

Golf Stats API providing to the minute updates?

I've been searching for an API that provides hourly/daily updates for golf statistics such as leaderboards and player rankings (free or paid) without much luck. The golf channel has a similar system but it's not exposed in any way. Any ideas on a good place to look? ...

What are the least frequently used characters among web users?

I need one to use as a delimiter. Does anyone know of statistics on this? ...

How do I get t-statistics and p-values of correlations in Perl?

Hi everyone, I'm trying to calculate correlations in Perl. I found out how to calculate correlations between arrays in CPAN, but I can't seem to find out how to get the t-statistics and p-values of those correlations (R gives these automatically). Is that even possible in Perl? I hope someone can help because I need to determine the sig...

How to track task execution statistics using an ExecutorService?

I'm firing off tasks using an ExecutorService, dispatching tasks that need to be grouped by task-specific criteria: Task[type=a] Task[type=b] Task[type=a] ... Periodically I want to output the average length of time that each task took (grouped by type) along with statistical information such as mean/median and standard deviation. Th...