questions about statistics | ansaurus

statistics

Rolling median algorithm in C

I am currently working on an algorithm to implement a rolling median filter (analogous to a rolling mean filter) in C. From my search of the literature, there appear to be two reasonably efficient ways to do it. The first is to sort the initial window of values, then perform a binary search to insert the new value and remove the exiting ...
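The first approach described in the excerpt above (keep the window sorted, use binary search to insert the incoming value and remove the outgoing one) might be sketched as follows. This is an illustration in Python rather than C, and the names `rolling_median`, `recent` and `ordered` are hypothetical, not taken from the question. The binary search is O(log n), but inserting into or deleting from a contiguous list still shifts up to n elements.

```python
from bisect import bisect_left, insort
from collections import deque


def rolling_median(values, window):
    """Yield the median of every full window of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque()   # values in arrival order, so we know which one exits
    ordered = []       # the same values, kept sorted
    for v in values:
        recent.append(v)
        insort(ordered, v)                          # binary search, then insert
        if len(recent) > window:
            old = recent.popleft()
            del ordered[bisect_left(ordered, old)]  # binary search, then remove
        if len(recent) == window:
            mid = window // 2
            if window % 2:
                yield ordered[mid]
            else:
                # Even window: average the two middle values.
                yield (ordered[mid - 1] + ordered[mid]) / 2
```

For example, `list(rolling_median([1, 5, 2, 8, 3], 3))` walks the windows `[1, 5, 2]`, `[5, 2, 8]` and `[2, 8, 3]`.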

Use of Different .Net Languages?

Is there a breakdown of the popularity of the different .Net languages available? Does anyone know of any surveys that give this information, or even if it is possible to determine this? Update: The answer is not a list of the different .Net languages. I would like to see statistics showing the relative usage/popularity of each .Net...

programming-languages

How do you measure the popularity of a programming language?

Hi, following on from this question, I am interested in finding out how you could measure the popularity of any and all programming languages. As professional developers, we need to be aware of the trends in the software industry: which languages employers will be looking for in the coming few years, and which ones we should be proficient in. Al...

programming-languages

Is there an easy way to calculate quantiles with bash?

Let's say I have a log file from a web server with response times per request: _1st_request 1334 _2nd_request 345 _3rd_request 244 _4th_request 648 ......... etc Is there an easy way with bash scripting to find the top decile (10-quantile)? In other words, to answer the question: How slow was the slowest request if I exclude the slowest...

shell-scripting
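To make the quantile question above concrete, here is a minimal sketch of the nearest-rank percentile, written in Python rather than bash. It assumes each log line ends with an integer response time, as in the sample lines. The helper names `response_times` and `nearest_rank_percentile` are hypothetical, and nearest-rank is only one of several common percentile definitions.

```python
import math


def response_times(lines):
    """Extract the trailing integer from lines like '_1st_request 1334'."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield int(fields[-1])


def nearest_rank_percentile(samples, pct):
    """Return the smallest sample with at least pct percent of the data at or below it."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

Calling `nearest_rank_percentile(response_times(open_log), 90)` on an open log file (here a hypothetical `open_log`) gives the slowest response time once the slowest 10% of requests are excluded.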

What is the best open source solution for storing time series data?

I am interested in monitoring some objects. I expect to get about 10000 data points every 15 minutes. (Maybe not at first, but this is the 'general ballpark'). I would also like to be able to get daily, weekly, monthly and yearly statistics. It is not critical to keep the data in the highest resolution (15 minutes) for more than two mont...

How can I measure volatility?

I am trying to determine the volatility of a rank. More specifically, the rank can be from 1 to 16 over X data points (the number of data points varies with a maximum of 30). I'd like to be able to measure this volatility and then map it to a percentage somehow. I'm not a math geek so please don't spit out complex formulas at me :) I...

Message logging, find out where the messages come from?

At this point, we have three websites, an open api, some ten services, and numerous other parts of our infrastructure; and they all can send statistic messages into the queue. But, there is a problem, we would really like to know where the messages come from, as we had some issues in the past, where a statistic was logged when that shou...

Statistics service recommendation

Does anyone know of a good statistics service for a widget I'm developing? My requirements are: 1. The ability to handle hundreds of thousands of events per day. 2. An API for registering events and getting results. 3. Near-real-time results. Thanks, Michael ...

Source code statistics

Is there some free tool (preferably command line based) that you can give your root source directory and it will inspect all files and sub-folders and generate a set of nice "statistics"? Like... lines of code, number of classes, etc? I just thought it would be quite a nice and interesting way for us to keep track of the project's growt...

Calculating simple statistics on a single mysql table

I'd like to calculate some simple statistics (percentages) from a table in mysql. The table in question has the following pseudo-schema: TABLE: coupons couponId (int) couponType (int) customerId (int) sentOn (datetime) visitedOn (datetime) purchasedOn (datetime) This table tracks unique coupons sent to customers, when they were sent, ...

Howto Superimpose Multiple Density Curves Into One Plot in R

I have data that looks like this, and I want to draw multiple density curves in one plot, where each curve corresponds to a unique ID. I tried the "sm" package with this code, but without success. library(sm) dat <- read.table("mydat.txt"); plotfn <- ("~/Desktop/flowgram_superimposed.pdf"); pdf(plotfn); sm.density.compare...

How do I apply underlying decision rules created from the R package randomForest onto a NEW Out of Bag test set?

Is this even possible? I had a dataset for training that included about 1500 entries. The randomForest created its decision rules and applied them to the randomly chosen (from the original dataset) Out of Bag training sample (bootstrapped 10,000 times). I have a separate (unclassified) dataset that I would like to apply the 10,000 cre...

How can I compute the probability at a point given a normal distribution in Perl?

Is there a package in Perl that allows you to compute the height of a probability distribution at a given point? For example, this can be done in R this way: > dnorm(0, mean=4,sd=10) > 0.03682701 That is, the density at x=0 of a normal distribution with mean=4 and sd=10 is 0.0368. I looked at Statistics::Distribution...
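The value that `dnorm` returns in the excerpt above is the height of the normal density curve, which can be computed directly from the closed-form formula. The sketch below is in Python rather than Perl, and `normal_pdf` is a hypothetical helper name, not a function from any Perl module.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Height of the normal density with the given mean and sd at point x."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
```

`normal_pdf(0, mean=4, sd=10)` reproduces the R result quoted in the question to the digits shown. Note that this is a density, not a probability, so it can exceed 1 for a small `sd`.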

How to fit a random effects model with Subject as random in R?

Given data of the following form myDat = structure(list(Score = c(1.84, 2.24, 3.8, 2.3, 3.8, 4.55, 1.13, 2.49, 3.74, 2.84, 3.3, 4.82, 1.74, 2.89, 3.39, 2.08, 3.99, 4.07, 1.93, 2.39, 3.63, 2.55, 3.09, 4.76), Subject = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L), Condition = c(0L, ...

Excel with a SAS engine

Is it possible to run Excel with a SAS engine and run SAS code on it? I want to learn SAS a little and I don't want to buy it, so maybe I could use it through Excel? ...

Benchmarking: When can I stop making measurements?

I have a series of functions that are all designed to do the same thing. The same inputs produce the same outputs, but the time that it takes to do them varies by function. I want to determine which one is 'fastest', and I want to have some confidence that my measurement is 'statistically significant'. Perusing Wikipedia and the inte...

language-agnostic

gls() vs. lme() in the nlme package

In the nlme package there are two functions for fitting linear models (lme and gls). What are the differences between them in terms of the types of models that can be fit, and the fitting process? What is the design rationale for having two functions to fit linear mixed models where most other systems (e.g. SAS, SPSS) only have one? ...

3D Least Squares Plane

What's the algorithm for computing a least squares plane in (x, y, z) space, given a set of 3D data points? In other words, if I had a bunch of points like (1, 2, 3), (4, 5, 6), (7, 8, 9), etc., how would one go about calculating the best fit plane f(x, y) = ax + by + c? What's the algorithm for getting a, b, and c out of a set of 3D poi...
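One common way to answer the question above is ordinary least squares on the vertical (z) residuals. Minimising the sum of (a·x + b·y + c − z)² leads to a 3×3 system of normal equations. The sketch below solves that system with Cramer's rule in plain Python. The names `fit_plane` and `_det3` are hypothetical, and the fixed singularity tolerance is an assumption that would need scaling for real data. Note that the sample points in the question, (1, 2, 3), (4, 5, 6), (7, 8, 9), are collinear, so they do not determine a unique plane.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_plane(points):
    """Return (a, b, c) minimising sum((a*x + b*y + c - z)**2) over the points."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    # Normal equations M @ (a, b, c) = rhs
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    det = _det3(m)
    if abs(det) < 1e-12:  # assumed tolerance; the (x, y) points are collinear
        raise ValueError("points do not determine a unique plane")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column of M with the right-hand side.
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(_det3(mc) / det)
    return tuple(coeffs)
```

For larger or badly conditioned data sets, a library least-squares solver would be a more robust choice than Cramer's rule.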

Calculate within and between variances and confidence intervals in R

Hello, I need to calculate the within-run and between-run variances from some data as part of developing a new analytical chemistry method. I also need confidence intervals from this data, using the R language. I assume I need to use anova or something? My data looks like this: > variance Run Rep Value 1 1 1 9.85 2 1 2 9.95 3 1 ...

3-way CROSSTABS in SPSS

I have some data in SPSS that I would like to format in a particular way, but I can't seem to find a way to do it in the documentation. I have data that consists of 10 question responses, Q1 to Q10, with a Q1 to Q10 for each value of a variable SPEAKER within a variable SESSION. For example, each session can have up to five speakers, f...
