Hi everyone,
I have written a Perl script that performs many one-sample t-tests. I get thousands of t-statistics with their degrees of freedom (df). I need to upgrade the script to also return their p-values (there are too many to look them up manually in a table). Is there some kind of formula I can use for this with the t-statistic an...
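One standard route, sketched here in Python rather than Perl, relies on an identity. The two-sided p-value of a t-statistic with ν degrees of freedom equals the regularized incomplete beta function I_x(ν/2, 1/2) evaluated at x = ν/(ν + t²). The continued-fraction evaluation below follows the widely published Numerical Recipes `betacf`/`betai` scheme. The function names are illustrative, and the arithmetic ports to Perl directly. The CPAN module Statistics::Distributions may also offer a ready-made routine.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def t_test_p_value(t, df):
    """Two-sided p-value for a t-statistic with df degrees of freedom."""
    if df <= 0:
        raise ValueError("df must be positive")
    x = df / (df + t * t)
    return regularized_incomplete_beta(df / 2.0, 0.5, x)
```

Usage: `t_test_p_value(2.5, 14)` gives the two-sided p-value. Halve it for a one-sided test in the direction of the observed t.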
I want to make Voronoi treemaps for statistics data, like
newsgraphy
Do you know how I can do that in Perl, PHP, Ruby, or Python?
...
If I use relative frequency to estimate the probability of an event, how good is my estimate based on the number of experiments? Is standard deviation a good measure? A paper/link/online book would be perfect.
http://en.wikipedia.org/wiki/Frequentist
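Assume the n experiments are independent and share one success probability p. Then the number of successes is binomial, and the relative frequency p̂ = k/n has standard deviation √(p(1 − p)/n), usually estimated by plugging in p̂. That standard error shrinks like 1/√n. The minimal sketch below also computes the Wilson score interval, a common choice when p̂ is near 0 or 1 or n is small. The function name and the default z = 1.96 (roughly 95% coverage) are assumptions for illustration.

```python
import math


def relative_frequency_stats(successes: int, trials: int, z: float = 1.96):
    """Return (estimate, standard error, Wilson interval) for k successes in n trials."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    # Plug-in standard error of a binomial proportion.
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / trials)
    # Wilson score interval around p_hat.
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)
    )
    return p_hat, std_err, (center - half_width, center + half_width)
```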
...
A single user can show up as multiple unique users over a period of time when he visits a site. Internally, the user's IP address is static, but on the net the user is represented by the ISP router's IP address, isn't it?
...
I have keys that can vary in length between 1 and 256 characters*; how can I calculate the probability that any two keys will collide when using md5 (barring a brute-force solution of trying each key)?
* the character set is restricted to [a-z.-]
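The sketch below assumes MD5 behaves like a uniformly random function onto 2^128 values. That is a reasonable model for accidental collisions among non-adversarial keys, although MD5 is not collision-resistant against deliberate attacks. Under this model, any two distinct keys collide with probability 2^-128. The birthday approximation 1 − exp(−n(n − 1)/(2 · 2^128)) then gives the chance of at least one collision among n keys.

The character set [a-z.-] has 28 symbols (26 letters plus `.` and `-`), so the full key space is far larger than 2^128. Over all possible keys, collisions are therefore guaranteed by the pigeonhole principle. What matters in practice is n, the number of keys you actually store. The helper names are illustrative.

```python
import math

HASH_SPACE = 2 ** 128   # number of distinct MD5 digests
ALPHABET_SIZE = 28      # [a-z.-]: 26 letters plus '.' and '-'


def key_space_size(min_len: int = 1, max_len: int = 256) -> int:
    """Number of distinct keys of length min_len..max_len over the alphabet."""
    return sum(ALPHABET_SIZE ** length for length in range(min_len, max_len + 1))


def collision_probability(num_keys: int, space: int = HASH_SPACE) -> float:
    """Birthday approximation of P(at least one collision) among num_keys random hashes."""
    if num_keys < 2:
        return 0.0
    if num_keys > space:
        return 1.0  # pigeonhole: more keys than possible digests
    # Expected number of colliding pairs: C(n, 2) / space.
    expected_pairs = num_keys * (num_keys - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-expected_pairs)
```

For example, `collision_probability(10**9)` estimates the risk for a billion stored keys.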
...
In R (or S-PLUS), what is a good way to aggregate String data in a data frame?
Consider the following:
myList <- as.data.frame(c("Bob", "Mary", "Bob", "Bob", "Joe"))
I would like the output to be:
[Bob, 3
Mary, 1
Joe, 1]
Currently, the only way I know how to do this is with the summary function.
> summary(as.data.frame(myL...
This can be broken down into a simple trio of equations:
a + b = 3
b + c = 5
a + c = 4
How can I best approximate the values? Note that I'll have many more such totals and variables in real applications. In particular, I want to find out if it's possible to usefully estimate the cost of individual food items from the item lists and totals on grocery receipts. I ...
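For the three-equation example, adding all the equations gives 2(a + b + c) = 12, so a + b + c = 6, and hence a = 1, b = 2, c = 3. With many receipts, the system is usually overdetermined and inconsistent. A standard approach is then linear least squares: choose the prices that minimize the sum of squared differences between predicted and actual totals.

The minimal sketch below solves the normal equations AᵀA x = Aᵀb in pure Python. It assumes one row per receipt and one column per item, with the entries being quantities. If NumPy is available, `numpy.linalg.lstsq` solves the same problem more robustly.

```python
def least_squares(rows, totals):
    """Solve min ||A x - b||^2 via the normal equations (A^T A) x = A^T b.

    rows:   list of lists, one row per receipt, one column per item (quantities)
    totals: list of receipt totals
    """
    n = len(rows[0])
    # Build the augmented normal-equation matrix [A^T A | A^T b].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, totals))]
         for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: some prices cannot be identified")
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for j in range(col, n + 1):
                m[k][j] -= factor * m[col][j]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x
```

Forming AᵀA squares the condition number, which is one reason library solvers based on QR or SVD are preferred for large or nearly collinear receipt data.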
I need to do something like:
SELECT value_column1
FROM table1
WHERE datetime_column1 >= '2009-01-01 00:00:00'
ORDER BY datetime_column1;
Except in addition to value_column1, I also need to retrieve a moving average of the previous 20 values of value_column1.
Standard SQL is preferred, but I will use MySQL extensions if necessary.
...
I was thinking about different scalability features and suddenly realized that I don't really know how much load one server (VPS) can handle. This question is for those who run high-load projects.
Imagine server with:
1 GB RAM
1 Xeon CPU
CentOS
LAMP with FastCGI
PostgreSQL on the same machine
And we need to calculate the number of requests, so I d...
Hello,
I need to code a solution for a certain requirement, and I wanted to know if anyone is either familiar with an off-the-shelf library that can achieve it, or can direct me at the best practice. Description:
The user inputs a word that is supposed to be one of several fixed options (I hold the options in a list). I know the input ...
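One off-the-shelf option in Python is the standard library's `difflib.get_close_matches`. It ranks candidates by a similarity ratio and drops those below a cutoff (0.6 by default). The minimal sketch below assumes case-insensitive matching is acceptable; `closest_option` is a hypothetical helper name. Edit-distance measures such as Levenshtein distance are a common alternative.

```python
import difflib


def closest_option(word, options, cutoff=0.6):
    """Return the option most similar to word, or None if nothing is close enough."""
    # Compare case-insensitively but return the option as originally spelled.
    lowered = {opt.lower(): opt for opt in options}
    matches = difflib.get_close_matches(word.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None
```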
I have a table like this that stores messages coming through a system:
Message
-------
ID (bigint)
CreateDate (datetime)
Data (varchar(255))
I've been asked to calculate the messages saved per second at peak load. The only data I really have to work with is the CreateDate. The load on the system is not constant, there are times when...
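"Peak load" could be read as the busiest single calendar second. That is one interpretation among several; a sliding window or a per-minute average are alternatives. Under this reading, the peak is just the largest number of CreateDate values that fall in the same second. In SQL that would be a GROUP BY on the timestamp truncated to seconds.

The Python sketch below does the same on timestamps already fetched from the table. It assumes CreateDate has at least one-second resolution.

```python
from collections import Counter
from datetime import datetime


def peak_messages_per_second(create_dates):
    """Return (second, count) for the calendar second containing the most messages."""
    # Truncate each timestamp to whole seconds and count messages per second.
    per_second = Counter(ts.replace(microsecond=0) for ts in create_dates)
    if not per_second:
        return None, 0
    return per_second.most_common(1)[0]


if __name__ == "__main__":
    sample = [datetime(2009, 1, 1, 0, 0, 0, 250000), datetime(2009, 1, 1, 0, 0, 0, 750000)]
    print(peak_messages_per_second(sample))
```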
I want to analyze answers to a web survey (the Git User's Survey 2008, if anyone is interested). Some of the questions were free-form questions, like "How did you hear about Git?". With more than 3,000 replies, analyzing them entirely by hand is out of the question (especially since there are quite a few free-form questions in this surv...
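A crude first pass, before any manual categorization, is to normalize each reply and tally the most frequent words. This often surfaces the recurring answers. The stop-word list and the tokenization rule below are arbitrary assumptions for illustration, and `top_terms` is a hypothetical helper name.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list; extend it for real data.
STOP_WORDS = {"a", "an", "and", "the", "of", "on", "in", "from", "to",
              "via", "i", "it", "about", "by", "at"}


def top_terms(replies, n=10):
    """Count lowercase word tokens across free-form replies, ignoring stop words."""
    counts = Counter()
    for reply in replies:
        # Count each word at most once per reply, so one verbose answer cannot dominate.
        words = set(re.findall(r"[a-z0-9]+", reply.lower()))
        counts.update(words - STOP_WORDS)
    return counts.most_common(n)
```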
Does anyone know of a script that produces a listing of the most frequently called functions in a C project?
method1 391
method2 23
method3 12
Even better if it could be configured to require a keyword, such as "get", in the method name.
I'm trying not to reinvent the wheel by writing a simple script to do it myself. Probably an ea...
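The rough sketch below assumes a regex-level scan is good enough. It counts textual occurrences of `name(` in `.c` and `.h` files, skipping C keywords that are commonly followed by parentheses. It does not parse C, so function definitions, prototypes, macro invocations, and text inside comments or strings are counted too, while calls through function pointers are missed.

These are static call sites, not how often each function actually runs. For runtime call counts, a profiler such as gprof reports per-function call counts. The optional `keyword` filter mirrors the "get" requirement; the function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# C keywords and operators that are commonly followed by '(' but are not calls.
NOT_CALLS = {"if", "while", "for", "switch", "return", "sizeof", "do", "else",
             "case", "_Alignof", "_Generic", "defined"}
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def count_calls(source, keyword=None):
    """Count identifiers followed by '(' in C source text."""
    counts = Counter(name for name in CALL_RE.findall(source) if name not in NOT_CALLS)
    if keyword:
        counts = Counter({name: n for name, n in counts.items() if keyword in name})
    return counts


def count_calls_in_tree(root, keyword=None):
    """Aggregate counts over all .c and .h files below root."""
    total = Counter()
    for path in Path(root).rglob("*.[ch]"):
        total += count_calls(path.read_text(errors="ignore"), keyword)
    return total
```

Usage: `for name, n in count_calls_in_tree("src", keyword="get").most_common(): print(name, n)` prints a listing in the "method1 391" style.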
I'm using Python and Numpy to calculate a best fit polynomial of arbitrary degree. I pass a list of x values, y values, and the degree of the polynomial I want to fit (linear, quadratic, etc.).
This much works, but I also want to calculate r (coefficient of correlation) and r-squared (coefficient of determination). I am comparing my re...
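For a least-squares fit, one common definition of the coefficient of determination is R² = 1 − SS_res/SS_tot. Here SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of y from its mean. The definition works for any polynomial degree once you have the fitted values, for example from `numpy.polyval(numpy.polyfit(x, y, degree), x)`.

For a linear fit, R² equals the square of Pearson's r. For higher degrees, r is usually reported as √R², the correlation between y and the fitted values. The sketch below is plain Python so that it runs without NumPy.

```python
def r_squared(y, y_fit):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    if len(y) != len(y_fit) or len(y) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_fit))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all y values are equal")
    return 1.0 - ss_res / ss_tot
```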
I need to know if a number compared to a set of numbers is outside of 1 stddev from the mean, etc.
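A minimal sketch using Python's `statistics` module. It expresses the distance from the mean in standard deviations (a z-score) and compares that to a threshold k. It assumes the sample standard deviation is wanted; `statistics.pstdev` gives the population version instead. The helper names are illustrative.

```python
import statistics


def z_score(value, data):
    """How many sample standard deviations value lies from the mean of data."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample SD; use statistics.pstdev for a population SD
    if sd == 0:
        raise ValueError("standard deviation is zero; z-score is undefined")
    return (value - mean) / sd


def outside_k_sd(value, data, k=1.0):
    """True if value is more than k standard deviations from the mean of data."""
    return abs(z_score(value, data)) > k
```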
...
Hi there,
I am thinking of creating an open source application that, given a list of high-speed download sites/files, can generate a list of calls to these sites in order to collect the following statistics:
Ping Times
Download Rate
Packet Loss
I was thinking whether or not there are builtin or external libraries in .NET to do that ki...
I've been searching for an API that provides hourly/daily updates for golf statistics such as leaderboards and player rankings (free or paid), without much luck. The Golf Channel has a similar system, but it's not exposed in any way. Any ideas on a good place to look?
...
I need this to be used as a delimiter.
Does anyone know about this statistic?
...
Hi everyone,
I'm trying to calculate correlations in Perl. I found out how to calculate correlations between arrays in CPAN, but I can't seem to find out how to get the t-statistics and p-values of those correlations (R gives these automatically). Is that even possible in Perl? I hope someone can help because I need to determine the sig...
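This holds under the usual assumptions: bivariate normal data and a null hypothesis of zero correlation. Pearson's r from n pairs then converts to a t-statistic t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The p-value comes from the Student t distribution exactly as for any other t-test.

The sketch computes r and t in Python rather than Perl; the arithmetic ports directly. The helper names are illustrative. The p-value step is left to whatever t-distribution routine is available.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_stat(r, n):
    """Return (t, df) for testing H0: rho = 0 with n pairs."""
    if abs(r) >= 1.0:
        raise ValueError("|r| must be < 1 for a finite t-statistic")
    return r * math.sqrt((n - 2) / (1.0 - r * r)), n - 2
```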
I'm firing off tasks using an ExecutorService, dispatching tasks that need to be grouped by task-specific criteria:
Task[type=a]
Task[type=b]
Task[type=a]
...
Periodically I want to output the average length of time that each task took (grouped by type) along with statistical information such as mean/median and standard deviation.
Th...