statistics

A/B Test statistics

I am trying to do some statistical analysis of different A/B tests to see which alternative is better, and have found conflicting information about this. First, I am interested in a couple of different things: tests that measure success by counting events, such as conversions or emails sent; tests that measure success by counting revenue; T...
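For the event-counting case (conversions out of visitors), one common starting point is a two-proportion z-test. The sketch below is only an illustration under stated assumptions: the function name and the counts in the usage note are hypothetical, the normal approximation assumes reasonably large samples, and it does not cover revenue-based tests.

```python
from math import erfc, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b are conversion counts, n_a / n_b are visitor counts.
    Uses the pooled standard error, i.e. assumes equal rates under the null.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero (no or all conversions)")
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value
```

For example, `two_proportion_z_test(100, 1000, 150, 1000)` compares a hypothetical 10% conversion rate against a 15% one; a small p-value suggests the difference is unlikely to be due to chance alone.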

What is a good solution for calculating an average where the sum of all values exceeds a double's limits?

I have a requirement to calculate the average of a very large set of doubles (10^9 values). The sum of the values exceeds the upper bound of a double, so does anyone know any neat little tricks for calculating an average that doesn't require also calculating the sum? I am using Java 1.5. ...
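One way to avoid the sum entirely is an incremental (running) mean, which updates the average one value at a time. The sketch below is in Python rather than the asker's Java, and the function name is hypothetical; the same update rule carries over directly.

```python
def running_mean(values):
    """Compute the mean of an iterable without ever forming the full sum.

    Update rule: mean_k = mean_(k-1) + (x_k - mean_(k-1)) / k.
    Caveat: (x - mean) can itself overflow if values of opposite sign
    both sit near the limits of a double.
    """
    mean = 0.0
    count = 0
    for count, x in enumerate(values, start=1):
        mean += (x - mean) / count
    if count == 0:
        raise ValueError("mean of an empty sequence is undefined")
    return mean
```

With three values of `1e308` the plain sum overflows to infinity, while `running_mean([1e308, 1e308, 1e308])` stays finite. Because the input is consumed as an iterable, the 10^9 values never need to be held in memory at once.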

Are statistics automatically updated when a new index is created?

Is there any benefit to running UPDATE STATISTICS after you create an index, or is it done automatically for you? ...

When to choose R vs. SciPy?

What are some advantages and disadvantages to doing statistical analyses in SciPy vs. R? They seem to have been designed with opposite philosophies (plain old library vs. DSL). What are some rules of thumb about which is the right tool for the job? ...

SQL Server 2008: recreating the same indexes and statistics across different databases

I've recently been assigned the task of updating a "legacy" database (previously managed by an ex-coworker who left the company a while ago). The new database has the exact same structure as the previous one; the only difference is in the content itself, as this database now has more recent data. The problem is that the old database ha...

How to measure lock contention?

I'm reading http://lse.sourceforge.net/locking/dcache/dcache%5Flock.html, in which spinlock time for each function is measured:

SPINLOCKS         HOLD            WAIT
  UTIL   CON    MEAN(  MAX )   MEAN(  MAX )(% CPU)     TOTAL NOWAIT  SPIN RJECT  NAME
  5.3%  16.5%  0.6us(2787us)  5.0us(3094us)(0.89%)  15069563  83.5% 16.5%    0%  dcache_...

Page views in Rails

Hi, happy holidays. I am not sure what the best way to deal with this is. I want to display page views and user views (how many unique users viewed a page). Is there a plugin for this? login_count is of course easy to check, though. I'm just not sure about views. Google Analytics does the job well but I don't know whether it's to good to go...

MCA or various CA (multivariate analysis)

Hello. I am going to run an analysis on some of my company's data. I thought of using a CA to represent the association between two variables. I have 3 variables: Category, Tag, Valoration. My idea is to run 2 analyses: one to view the association between Category and Valoration, and a second one between Tag and Valoration. But I th...

Determining the number of possible combinations

I'm trying to figure out how many possible ways there are to combine various elements from this string. "{Hello|Hi|Hey} {world|earth}{!|.|?}" Where one item (separated by a pipe/|) is selected at random from each group ({}) and combined into a single string. So the above "template" could produce: Hello world. Hi earth? Hey world. Hi...
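Since one item is picked independently from each group, the number of possible strings is the product of the group sizes. A minimal sketch, assuming groups are never nested and every `{...}` contains at least one item (the function name is hypothetical; `math.prod` needs Python 3.8+):

```python
import re
from math import prod


def count_combinations(template):
    """Count the strings a template can produce.

    Each {a|b|c} group contributes as many choices as it has
    pipe-separated items; the total is the product over all groups.
    Assumes no nested braces.
    """
    groups = re.findall(r"\{([^{}]*)\}", template)
    return prod(len(group.split("|")) for group in groups)


print(count_combinations("{Hello|Hi|Hey} {world|earth}{!|.|?}"))  # 3 * 2 * 3 = 18
```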

iPhone development - How to get some device info

So, I know that it is possible to retrieve some info, like device name, unique ID, etc., all of it provided by the UIDevice class. I would like to know if there is a way to get information related to wireless usage (download and upload), number of SMS sent, minutes of talking, and any other statistics. Does anyone have any idea on how to ge...

R Language: Import multiline SQL query to single string

In R, how can I import the contents of a multiline text file (containing SQL) to a single string? The sql.txt file looks like this: SELECT TOP 100 setpoint, tph FROM rates I need to import that text file into an R string such that it looks like this: > sqlString [1] "SELECT TOP 100 setpoint, tph FROM rates" That's so that I ...

Subversion statistics list

Hi all. I'm going to write a little library, and afterwards a UI, for aggregating and visualizing statistics from a specified Subversion repository. My question is: what do you, as developers/leads/managers, need to see in the statistics? Here are some initial ideas: 1. Commits by author(s) 2. Files that were changed by the...

Ehcache Statistics by key

Hello all! I am interested in getting statistics on the Ehcache I have running. I would like to see the number of hits/misses for a given key over a period of time, perhaps in the form of a map. For example, for the past hour (or however long it has been running): Key A had 30 hits and 2 misses, Key B had 400 hits and 100 misses...

How to obtain server side stats using VSTS2008-Test edition?

How to obtain server side stats using VSTS2008-Test edition? ...

Error for Minitab program (statistics)

I have a problem with Minitab (a statistics program). When I enter my general model and want to see the results, the program warns me: too few arguments or space missing. Everything seems correct, and I can't determine where the problem is. Could anyone help me with this? ...

Too many data points in set. Looking for ways to prune.

I am gathering data from a website. I expect to get 10,000 data points (time, value), multiplied by seven, over time. That is way too much, both for storing and for plotting in a real-time-like graph (through jQuery Flot). I'm looking for a text dealing with this sort of problem. To be more precise: algorithms, statistical math for find...

How to find center of clusters of numbers? statistics problem?

Hi, I have a problem where I have a set of numbers, e.g. 5, 7, 7, 8, 8, 8, 7, 20, 23, 23, 24, 24, 24, 25. In the above set, there are two "clusters" of numbers, and I want to write a program to find the centers of these clusters. Could you call them attractors, as in fractal theory? So the program would, I guess, find that the set can be divid...
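For one-dimensional data like this, a simple heuristic is to sort the values, split at the widest gaps, and average each piece. The sketch below assumes the number of clusters `k` is known in advance (2 here); the function name is hypothetical, and this is a stand-in for more general methods such as k-means, not a replacement for them.

```python
from statistics import mean


def cluster_centers(values, k=2):
    """Split sorted 1-D data at its k - 1 largest gaps and return the
    mean of each resulting group, in ascending order."""
    data = sorted(values)
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of values")
    # Index i stands for the gap between data[i] and data[i + 1].
    gap_indices = sorted(
        range(len(data) - 1),
        key=lambda i: data[i + 1] - data[i],
        reverse=True,
    )
    cuts = sorted(gap_indices[: k - 1])
    centers = []
    start = 0
    for cut in cuts:
        centers.append(mean(data[start : cut + 1]))
        start = cut + 1
    centers.append(mean(data[start:]))
    return centers


numbers = [5, 7, 7, 8, 8, 8, 7, 20, 23, 23, 24, 24, 24, 25]
print(cluster_centers(numbers))  # one center near 7, one near 23
```

On the example set, the widest gap is between 8 and 20, so the two groups are the values up to 8 and the values from 20 upward.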

What statistics should a programmer (or computer scientist) know?

I'm a programmer with a decent background in math and computer science. I've studied computability, graph theory, linear algebra, abstract algebra, algorithms, and a little probability and statistics (through a few CS classes) at an undergraduate level. I feel, however, that I don't know enough about statistics. Statistics are increasin...

Pseudocode implementation of Excel's TDIST function

I've been doing some research on statistical significance, and I've learned a lot but seem to have hit a wall when it comes to calculating P values. I feel like I'm about 95% of the way there; it's just that everything I read on calculating P values references a table rather than offering a programmatic solution. It seems that Excel's ...
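One programmatic route that avoids lookup tables is to integrate the Student's t density numerically. The sketch below assumes the usual `TDIST(x, degrees_of_freedom, tails)` argument order with `x >= 0`; it uses Simpson's rule with a fixed step count, so it is an approximation whose accuracy degrades for very large `x`, and it makes no claim to reproduce Excel's internal method.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


def tdist(x, df, tails, steps=10_000):
    """Approximate the upper-tail probability P(T > x), times `tails`.

    Integrates the density from 0 to x with Simpson's rule (steps must
    be even) and subtracts the area from 0.5, the mass above zero.
    """
    if x < 0 or df < 1 or tails not in (1, 2) or steps % 2:
        raise ValueError("need x >= 0, df >= 1, tails in (1, 2), even steps")
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return tails * (0.5 - area)
```

A quick sanity check: with 1 degree of freedom the t distribution is the Cauchy distribution, for which P(T > 1) is exactly 0.25, so `tdist(1.0, 1, 1)` should come out very close to that.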

Group detection in data sets

Assume a group of data points, such as one plotted here (this graph isn't specific to my problem, but just used as a suitable example): Inspecting the scatter graph visually, it's fairly obvious the data points form two 'groups', with some random points that do not obviously belong to either. I'm looking for an algorithm, that would ...