statistics

Is there a Perl statistics package that doesn't make me load the entire dataset at once?

I'm looking for a statistics package for Perl (CPAN is fine) that lets me add data incrementally instead of having to pass in an entire array of data. I only need the mean, median, standard deviation, max, and min; nothing too complicated. The reason is that my dataset is far too large to fit into memory. The data sourc...
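Below is a minimal sketch of the incremental approach. It is written in Python rather than Perl for illustration. Welford's online algorithm keeps only a count, a running mean and a running sum of squared deviations, so memory use stays constant however many values stream in. The class and method names are hypothetical and are not a CPAN API. The exact median is deliberately left out, because it cannot be computed in one pass with constant memory.

```python
import math


class RunningStats:
    """Running count, mean, sample standard deviation, min and max.

    Uses Welford's online update, so memory use is constant. Hypothetical
    class name; the exact median is intentionally not tracked.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0          # sum of squared deviations from the running mean
        self.min = math.inf
        self.max = -math.inf

    def add(self, x):
        """Fold one new value into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses the updated mean
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    def stddev(self):
        """Sample standard deviation (n - 1 divisor); NaN for fewer than 2 values."""
        if self.n < 2:
            return math.nan
        return math.sqrt(self._m2 / (self.n - 1))


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # stand-in for values read one at a time
    stats.add(value)
print(stats.n, stats.mean, stats.min, stats.max, stats.stddev())
```

The same update rule ports directly to Perl: call `add` once per value as it is read from the data source. An exact median still needs all the values, for example in a sorted external file, or an approximation.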

Where can I find the 1) cache hits and 2) cache lookups statistics in SQL Server?

First off, the question is not how to see the SQL Server cache hit rate! For that I already know of a view that contains that precise statistic. My question is in fact: where are the raw statistics from which the hit ratio is calculated? SQL Server, in the MSDN pages, states that the cache hit ratio is the total cache hits divided by the ...

Any alternatives to Google Trends?

I'm writing a small helper utility for obscure software that is used at a local shop. Basically, I would like to know if anyone searches for anything associated with that software and if publishing my work on the Internet would make any sense. I entered the name of the software into Google Trends, but my terms "do not have enough search ...

How can I get the average and standard deviations grouped by key?

I need to find the average and standard deviation of a large amount of data in this format. I tried using Excel, but there doesn't appear to be an easy way to transpose the columns. What am I missing in Excel, or should I just use Perl? The input file format is:

0 123
0 234
0 456
1 657
1 234
1 543

I want the result to group the...
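The Python sketch below (Python rather than Excel or Perl) groups whitespace-separated `key value` lines by key and reports each group's mean and sample standard deviation. It assumes one pair per line, as in the sample rows. It keeps each group's values in memory. If the data are too large for that, keep one running accumulator per key instead, for example with Welford's algorithm. The helper name is hypothetical.

```python
import statistics
from collections import defaultdict


def grouped_stats(lines):
    """Return {key: (mean, sample stdev)} for whitespace-separated 'key value' lines.

    Hypothetical helper; statistics.stdev needs at least two values per key.
    """
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        key, value = line.split()
        groups[key].append(float(value))
    return {key: (statistics.mean(vals), statistics.stdev(vals))
            for key, vals in groups.items()}


# The sample rows from the question.
sample = """0 123
0 234
0 456
1 657
1 234
1 543"""

result = grouped_stats(sample.splitlines())
for key, (mean, sd) in sorted(result.items()):
    print(key, round(mean, 2), round(sd, 2))
```

To read from a file, pass the open file object instead: `with open("data.txt") as f: result = grouped_stats(f)`. The filename is a placeholder.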

Good introductory statistics book?

Hello, what is a good introductory statistics book you can recommend? If there is a whole sequence of books that should be read, please do not hesitate to mention it. Books with applications are also welcome. I am aware that a single search on Amazon (or any other bookseller) will give me tons of titles, but some of them are avoidab...

Difference of binomial parameters in R

I have two random variables: X ~ binom(n, p1) Y ~ binom(n, p2) n is a known parameter (the total number of trials), while p1 and p2 are unknown. I have one sample from each distribution (x from X, and y from Y). To give some context, x and y are numbers of true positives from two different classifiers, at a fixed selectivity. I woul...
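One common starting point is the normal-approximation (Wald) confidence interval for p1 - p2, sketched here in Python rather than R. It assumes X and Y are independent. If both classifiers are scored on the same test items, the counts are paired, and a paired method such as McNemar's test is usually more appropriate. The Wald interval is also unreliable when n is small or the estimated proportions are near 0 or 1. The function name and the example counts are hypothetical.

```python
import math


def wald_diff_ci(x, y, n, z=1.96):
    """Approximate confidence interval for p1 - p2.

    Assumes x ~ Binom(n, p1) and y ~ Binom(n, p2) are independent and uses
    the normal (Wald) approximation; z = 1.96 gives roughly 95% coverage.
    """
    p1_hat = x / n
    p2_hat = y / n
    diff = p1_hat - p2_hat
    se = math.sqrt(p1_hat * (1 - p1_hat) / n + p2_hat * (1 - p2_hat) / n)
    return diff - z * se, diff + z * se


# Hypothetical counts: 80 and 70 true positives out of n = 100 trials each.
low, high = wald_diff_ci(80, 70, 100)
print(round(low, 4), round(high, 4))
```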

Open source statistics library for C# for generating random number of various distributions?

This is for simulation. In particular, I'm trying to generate natural-sounding words and names, and the uniform distribution that the Random class provides doesn't cut it. This isn't a duplicate question, because the similar questions weren't looking for C# random number generators. ...

Best way to extract Mean Square values from an aov object in R

I'm trying to write a function to automate a variance analysis, part of which involves some further calculations. The method I've been using isn't very robust: if the variable names change, it stops working. For this dummy data:

> dput(assayvar,"") structure(list(Run = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L...
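For reference, each value in an ANOVA table's "Mean Sq" column is a sum of squares divided by its degrees of freedom. In R, the table returned by `summary()` on an aov fit has a column with that name, so indexing by the column name rather than by variable names is one way to avoid the fragility described above. The Python sketch below (not R) shows what those numbers are for a single-factor design. It works directly from group labels and values, so nothing depends on column names. The function name and toy data are hypothetical.

```python
from collections import defaultdict


def one_way_mean_squares(labels, values):
    """Return (MS_between, MS_within) for a one-way ANOVA layout.

    Groups are defined by the labels themselves, so no column names are
    involved. Hypothetical helper for illustration.
    """
    groups = defaultdict(list)
    for label, value in zip(labels, values):
        groups[label].append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    group_means = {g: sum(v) / len(v) for g, v in groups.items()}
    # Between-group SS has k - 1 degrees of freedom, within-group SS has n - k.
    ss_between = sum(len(v) * (group_means[g] - grand_mean) ** 2
                     for g, v in groups.items())
    ss_within = sum((x - group_means[g]) ** 2
                    for g, v in groups.items() for x in v)
    return ss_between / (k - 1), ss_within / (n - k)


# Toy data: two runs with three measurements each.
runs = [1, 1, 1, 2, 2, 2]
measurements = [10, 12, 14, 20, 22, 24]
ms_between, ms_within = one_way_mean_squares(runs, measurements)
print(ms_between, ms_within)
```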

How can I use foreach with snow to allow multicore processing on Windows XP (in R)?

Could you please give an example of how that might be done using doSNOW? (I asked the same question here: http://blog.revolution-computing.com/2009/08/parallel-programming-with-foreach-and-snow.html and got only a partial reply.) Tal ...

J programming Language vs R Programming Language vs Incanter

Has anyone tried both the J programming language from jsoftware and the R language? After some searching I came across Incanter, which is based on Clojure. I want to learn a statistical language for data analysis. Which one do you prefer, and why? Please consider the criteria below, thanks: productivity, performance, community, libraries, syntax ...

Workflow for statistical analysis and report writing

Does anyone have any wisdom on workflows for data analysis related to custom report writing? The use-case is basically this: Client commissions a report that uses data analysis, e.g. a population estimate and related maps for a water district. The analyst downloads some data, munges the data and saves the result (e.g. adding a column ...

Probabilities & Statistics, learning material

I'm not sure how to post this, so I'm setting the question as CW right off the bat. Hope you don't mind. It relates to programming in the sense that this is where I will be applying it, but it is not a programming question. I need learning material (preferably books) that can teach me Probability and Statistics from the gro...

How does RetailMeNot calculate its success rate trends?

I am developing a Rails application where I need a "success rate" system similar to RetailMeNot's. I noticed that they use the jQuery Sparkline library (http://omnipotent.net/jquery.sparkline/) to generate a success rate trend for each coupon. For example, in their source code: <em>84%</em> Success<br/><span class="trend">14,18,18,22,19,16,1...
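The excerpt does not show how RetailMeNot actually derives these numbers. One plausible approach, sketched in Python under that assumption, computes a whole-number success percentage per period from hypothetical (worked, didn't work) vote counts. It then joins the results with commas, which matches the comma-separated list inside the trend span. The helper name and vote counts are hypothetical, and in a Rails app the same loop would be written in Ruby.

```python
def success_rate_trend(period_votes):
    """Turn (successes, failures) counts per period into whole-number percentages.

    Hypothetical helper; periods with no votes are reported as 0 here, which
    is only one possible choice.
    """
    trend = []
    for successes, failures in period_votes:
        total = successes + failures
        trend.append(round(100 * successes / total) if total else 0)
    return trend


# Hypothetical vote counts for four periods.
votes = [(7, 3), (9, 1), (0, 0), (3, 1)]
sparkline_values = ",".join(str(p) for p in success_rate_trend(votes))
print(sparkline_values)
```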

How to use Outlier Tests in R Code

As part of my data analysis workflow, I want to test for outliers and then do my further calculations both with and without those outliers. I've found the outlier package, which has various tests, but I'm not sure how best to use them in my workflow. ...
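The Python sketch below illustrates the with/without workflow. It uses Tukey's IQR fences, a simple rule swapped in here for the formal tests in the R package mentioned above. It splits the data into kept values and flagged outliers, so the same calculation can be run on both versions. The function name, the k = 1.5 multiplier and the sample data are assumptions. `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using Tukey's fences.

    Values outside [Q1 - k*IQR, Q3 + k*IQR] are flagged as outliers.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


# Hypothetical measurements with one suspicious value.
data = [10, 12, 11, 13, 12, 11, 50]
kept, outliers = split_outliers(data)
print("mean with outliers:   ", statistics.mean(data))
print("mean without outliers:", statistics.mean(kept))
print("flagged:", outliers)
```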

PHP + MySQL Web Stats

I'm wondering what ideas you have on the best way to build a backend for some web counters. I will be tracking downloads via PHP; I'm looking at around 1.5 million "downloads" per day, and all I will be storing is "userid" and "downloadid". Possibly the time too? What would the best way be? At the end of every day should I compile all th...
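One common pattern appends raw rows to a narrow log table. A scheduled job then rolls them up into a per-day summary table. The sketch below uses Python's built-in sqlite3 module as a stand-in for PHP + MySQL. The table names, columns and sample rows are hypothetical. `date(ts)` is SQLite syntax; MySQL has a comparable `DATE()` function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a narrow raw log plus a per-day summary table.
conn.executescript("""
CREATE TABLE download_log (userid INTEGER, downloadid INTEGER, ts TEXT);
CREATE TABLE daily_downloads (
    day TEXT, downloadid INTEGER, downloads INTEGER, unique_users INTEGER,
    PRIMARY KEY (day, downloadid)
);
""")

# Each download appends one small row (sample data, not real traffic).
conn.executemany(
    "INSERT INTO download_log VALUES (?, ?, ?)",
    [(1, 10, "2009-09-23 10:00:00"), (2, 10, "2009-09-23 11:30:00"),
     (1, 10, "2009-09-23 12:00:00"), (3, 11, "2009-09-24 09:15:00")],
)

# Nightly job: collapse the raw rows into one summary row per day and download.
conn.execute("""
INSERT INTO daily_downloads
SELECT date(ts), downloadid, COUNT(*), COUNT(DISTINCT userid)
FROM download_log
GROUP BY date(ts), downloadid
""")
conn.commit()

summary = conn.execute(
    "SELECT * FROM daily_downloads ORDER BY day, downloadid").fetchall()
print(summary)
```

After the roll-up, raw rows for days that have been summarized can be archived or deleted to keep the log table small.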

Understanding T-SQL stdev, stdevp, var, and varp

I'm having a difficult time understanding what these statistics functions do and how they work. I'm having an even more difficult time understanding how stdev differs from stdevp, and likewise for the var equivalents. Can someone please break these down in plain terms for me? ...
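The difference between these functions is only the divisor. The sample variance (T-SQL VAR) divides the sum of squared deviations from the mean by n - 1. The population variance (VARP) divides it by n. STDEV and STDEVP are the square roots of those two values. Use the P versions when the rows are the entire population of interest, and the others when the rows are a sample used to estimate a larger population. Python's statistics module has the same pairs, shown here on a small made-up dataset.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values; mean is 5
# Sum of squared deviations from the mean: 9+1+1+1+0+0+4+16 = 32.

# Sample statistics (T-SQL VAR / STDEV): divide by n - 1 = 7.
sample_var = statistics.variance(data)    # 32 / 7
sample_std = statistics.stdev(data)

# Population statistics (T-SQL VARP / STDEVP): divide by n = 8.
population_var = statistics.pvariance(data)  # 32 / 8 = 4
population_std = statistics.pstdev(data)     # sqrt(4) = 2

print(sample_var, sample_std, population_var, population_std)
```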

rescaling ranges

Hi, for example, I have two ranges: (1) 0 to 3 and (2) 10 to 15. In range (1) I have numbers between 0 and 3, where 0 is the minimum and 3 the maximum value (it also has the values 1 and 2). Now I want to rescale both ranges (1) and (2) to the range 0 to 1. Can you show me how to do it or at least poi...
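This is linear (min-max) rescaling. Subtract the old minimum, divide by the width of the old range, then stretch to the new range: new = new_min + (x - old_min) * (new_max - new_min) / (old_max - old_min). Below is a small Python sketch using the two ranges from the question. The function name is hypothetical.

```python
def rescale(x, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map x from [old_min, old_max] onto [new_min, new_max]."""
    if old_max == old_min:
        raise ValueError("source range has zero width")
    return new_min + (x - old_min) * (new_max - new_min) / (old_max - old_min)


range1 = [rescale(v, 0, 3) for v in (0, 1, 2, 3)]   # range (1): 0 to 3
range2 = [rescale(v, 10, 15) for v in (10, 12, 15)]  # range (2): 10 to 15
print(range1)
print(range2)
```

In range (1), 1 maps to 1/3 and 2 maps to 2/3. In range (2), 12 maps to 0.4.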

POS software/hardware general questions...

I'm looking for answers to a few general questions as to how point of sale (POS) software and hardware generally works in brick-and-mortar stores. I realize there will be many edge cases given the sheer number of solutions out there, but I'm looking for answers on the most common setups... So, here it goes: I realize that there are sev...

Are these memcached stats normal?

Are these stats normal? I have problems with my PHP products, so I want to know whether these are healthy stats:

STAT pid 2312
STAT uptime 5292037
STAT time 1253692925
STAT version 1.2.8
STAT pointer_size 64
STAT rusage_user 2600.605647
STAT rusage_system 9533.168738
STAT curr_items 1153303
STAT total_items 139795434
STAT bytes 435570863...
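Below is a small Python sketch that turns `STAT name value` lines like these into a dict and derives a couple of easier-to-read figures. The function names are hypothetical. A common first health check is the get hit ratio, get_hits / (get_hits + get_misses). `get_hits` and `get_misses` are standard memcached stats, but they do not appear in the truncated output above, so the sketch computes the ratio only when both are present.

```python
def parse_stats(text):
    """Parse memcached 'STAT <name> <value>' lines into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def summarize(stats):
    """Derive a few readable figures; hit ratio only if the counters are present."""
    summary = {"uptime_days": int(stats["uptime"]) / 86400}
    if "get_hits" in stats and "get_misses" in stats:
        hits, misses = int(stats["get_hits"]), int(stats["get_misses"])
        if hits + misses:
            summary["get_hit_ratio"] = hits / (hits + misses)
    return summary


# A few of the lines shown in the question.
output = """STAT pid 2312
STAT uptime 5292037
STAT curr_items 1153303
STAT total_items 139795434"""
print(summarize(parse_stats(output)))
```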

Good ways to code complex tabulations in R?

Hi R-ers, does anyone have any good thoughts on how to code complex tabulations in R? I am afraid I might be a little vague on this, but I want to set up a script to create a bunch of tables of a complexity analogous to the Statistical Abstract of the United States (e.g. http://www.census.gov/compendia/statab/tables/09s0015.pdf). And I wou...