statistics

Root mean square deviation on binned GAM results using R

Background: A PostgreSQL database uses PL/R to call R functions. An R call to calculate Spearman's correlation looks as follows: cor( rank(x), rank(y) ). Also in R, a naïve calculation of a fitted generalized additive model (GAM): data.frame( x, fitted( gam( y ~ s(x) ) ) ). Here x represents the years from 1900 to 2009 and y is the a...

Eliminating outliers by standard deviation in SQL Server

I am trying to eliminate outliers in SQL Server 2008 by standard deviation. I would like only records that contain a value in a specific column within +/- 1 standard deviation of that column's mean. How can I accomplish this? ...
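
A minimal sketch of the filtering logic, written in Python rather than T-SQL so the arithmetic is easy to follow. The function name `within_one_sd` and the sample readings are made up for illustration, and the population standard deviation is an assumption (the question does not say which variant is wanted).

```python
from statistics import mean, pstdev


def within_one_sd(values):
    """Keep only the values within one standard deviation of the mean."""
    mu = mean(values)
    # Assumption: population standard deviation. Use statistics.stdev
    # instead if the sample standard deviation is wanted.
    sigma = pstdev(values)
    return [v for v in values if abs(v - mu) <= sigma]


# Hypothetical data: five ordinary readings and one obvious outlier.
readings = [10, 12, 11, 13, 12, 50]
print(within_one_sd(readings))
```

The same idea should carry over to SQL by computing the mean and standard deviation once with aggregate functions and then filtering rows against those two numbers.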

How to track sales made by visitors referred by another website?

Hello, on my future website I'll have partners, and I want them to receive a certain percentage of money according to how much the visitors they sent me spend. So this is a simple question: how do I know where a visitor came from when he decides to purchase something on my website, so that I give the correct amount to the partner wh...

Need some help calculating percentile

An RPC server is given which receives millions of requests a day. Each request i takes processing time Ti to get processed. We want to find the 65th percentile processing time (when processing times are sorted in increasing order) at any moment. We cannot store processing times of all the requests of the past as...
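
One possible approach, sketched under assumptions: instead of storing every processing time, keep a fixed number of bin counters and read the percentile off the running counts. This gives an approximation whose error is at most one bin width. The class name `HistogramPercentile`, the upper bound `max_value`, and the bin count are all hypothetical; the sketch also assumes processing times are non-negative.

```python
class HistogramPercentile:
    """Approximate a running percentile using fixed-width bins."""

    def __init__(self, max_value, num_bins):
        self.bin_width = max_value / num_bins
        self.counts = [0] * num_bins
        self.total = 0

    def add(self, t):
        # Clamp to the last bin so values above max_value are still counted.
        index = min(int(t / self.bin_width), len(self.counts) - 1)
        self.counts[index] += 1
        self.total += 1

    def percentile(self, p):
        """Return the upper edge of the bin that contains the p-th percentile."""
        if self.total == 0:
            raise ValueError("no data recorded yet")
        target = p * self.total / 100
        running = 0
        for index, count in enumerate(self.counts):
            running += count
            if running >= target:
                return (index + 1) * self.bin_width
        return len(self.counts) * self.bin_width


# Hypothetical usage: times in milliseconds, assumed to stay below 100 ms.
tracker = HistogramPercentile(max_value=100, num_bins=100)
for t in range(100):
    tracker.add(t + 0.5)
print(tracker.percentile(65))
```

Memory use is fixed by the number of bins, regardless of how many requests arrive. The trade-off is that the answer is only as precise as the bin width.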

Excel Formula: Calculate Survey Statistics

I have a survey from 100 users and I'm trying to calculate some statistics. The relevant fields in my survey look something like this: Gender Interests B1: Male D1: Running, Snowboarding, Mountain Bikes B2: Male D2: Programming, Running, Paintball B3: Female D3: Bowling, Gymnastics B4: Male D...

how to integrate / link R and Computer Algebra Systems (CAS)

I'm looking for a possibility to use different 'higher' math operations in combination with R. A link or integration between R and a CAS would be the perfect solution. Which integrations of R with other (math and statistics related) systems, or vice versa, are out there? How well do they work? What would you suggest? How expensive (in time, mo...

Am I looking for an average here?

Hi all, I have a note field for which I'm trying to determine a cutoff length to display. I have some numbers: the note length and the number of notes with that length. How do I come up with a good average? Do I need more information? Thanks, rod. ...

Following a Dynamic Score

I have little to no formal discrete math training, and have run into a wee bit of an issue. I am trying to write an agent which reads in a human player's (arbitrary) score and scores a point every so often. The agent needs to "lag behind" and "catch up" every so often, so that the human player believes there is some competition going on....

R: Forecast package: Automatic algorithm for composite model involving ETS and AR

Hey, I would like to write code that automatically selects the best composite model using ETS as well as autoregressive models. What criterion should I base my selection on? Also, if I'm using the auto.arima function from the forecast package in R to deduce the number of AR terms and the corresponding coefficients, does my input ...

C# standard deviation of a generic list?

Hello. I need to calculate the standard deviation of a generic list. I will try to include my code. It's a generic list with data in it. The data is mostly floats and ints. Here is the code that is relevant to it, without getting into too much detail: namespace ValveTesterInterface { public class ValveDataResults { private ...

calculate the standard deviation of a generic list of objects

I'm a C# noob but I really need a professional's help. I am using Visual Studio 2005 for a project so I don't have math.linq. I need to calculate the standard deviation of a generic list of objects. The list contains just a list of float numbers, nothing too complicated. However, I have never done this before, so I need someone to show me t...
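
A minimal sketch of the calculation itself, shown in Python rather than C#: take the mean, sum the squared deviations, divide, and take the square root. The function name `standard_deviation` and its `sample` flag are made up for illustration. The steps need nothing beyond loops and a square root, so they should translate to a plain `foreach` over the list without any LINQ support.

```python
import math


def standard_deviation(values, sample=True):
    """Standard deviation of a list of numbers.

    sample=True divides by n - 1 (sample SD);
    sample=False divides by n (population SD).
    """
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values")
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_deviations / divisor)


# Hypothetical data with mean 5 and population SD exactly 2.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(standard_deviation(data, sample=False))
```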

Trying to use Cumulative Distribution Function in GSL

Hey guys, I'm trying to compute the cumulative distribution function of the standard normal distribution for a formula in C using the GSL (GNU Scientific Library). I've installed and included GSL but am having trouble understanding how to use it. I think the function I need is: double gsl_ran_lognormal (const gsl_rng * r, double zeta, ...
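
A minimal sketch of the underlying formula, written in Python rather than C: the standard normal CDF can be expressed through the error function as Φ(x) = (1 + erf(x / √2)) / 2. The helper name `standard_normal_cdf` is made up for illustration. Within GSL itself, `gsl_cdf_ugaussian_P` from `gsl/gsl_cdf.h` appears to be the matching call, whereas `gsl_ran_lognormal` draws random variates rather than evaluating a CDF.

```python
import math


def standard_normal_cdf(x):
    """P(Z <= x) for a standard normal variable Z, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


print(standard_normal_cdf(0.0))   # the midpoint of the distribution
print(standard_normal_cdf(1.96))  # close to 0.975
```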

Create a summary description of a schedule given a list of shifts

Assuming I have a list of shifts for an event (in the format start date/time, end date/time), is there some sort of algorithm I could use to create a generalized summary of the schedule? It is quite common for most of the shifts to fall into some sort of common recurrence pattern (i.e. Mondays from 9:00 am to 1:00 pm, Tuesdays from 10:00...

numpy convert categorical string arrays to an integer array

I'm trying to convert a string array of categorical variables to an integer array of categorical variables. Ex. import numpy as np a = np.array( ['a', 'b', 'c', 'a', 'b', 'c']) print a.dtype >>> |S1 b = np.unique(a) print b >>> ['a' 'b' 'c'] c = a.desired_function(b) print c, c.dtype >>> [1,2,3,1,2,3] int32 I realize this can be d...
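
One way this might be done, as a sketch: `np.unique` accepts `return_inverse=True`, which returns, for each element of the input, its index into the sorted array of unique values. Adding 1 and casting to `int32` are choices made here only to match the output shown in the question.

```python
import numpy as np

a = np.array(['a', 'b', 'c', 'a', 'b', 'c'])

# labels holds the sorted unique strings; codes holds, for every
# element of a, the position of that element within labels.
labels, codes = np.unique(a, return_inverse=True)

# Shift from 0-based to 1-based codes and cast to int32 to mirror
# the desired output in the question.
codes = (codes + 1).astype(np.int32)

print(labels)
print(codes, codes.dtype)
```

Indexing `labels[codes - 1]` recovers the original strings, so the mapping is reversible.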

Fair matchmaking for online games.

Most online games arbitrarily form teams. Oftentimes it's up to the user, and they'll choose a fast server with a free slot. This behavior produces unfair teams and people rage quit. By tracking a player's statistics (or any statistics that can be gathered), how can you choose teams that are as fair as possible? ...

blocking login after X failed attempts

I'm trying to block login for x minutes after y failed attempts. I'm already planning to log user logins, so I guess I could use the same database to calculate if blocking needs to happen. My questions: does it make sense to use the same logs table to run the logic of the y failed attempts blocking? Some people have a table just for t...

Looking for an estimation method (data analysis)

Hi! Since I have no idea about what I am doing right now, my wording may sound funny. But seriously, I need to learn. The problem I'm facing is to come up with a method (model) to estimate how a software program works: namely running time and maximal memory usage. What I already have is a large amount of data. This data set gives an o...

Can I log user logins when using OpenID?

I'm setting up a login system for a site and someone suggested using OpenID instead. In my current setup, I log users' login attempts into a db table. When using OpenID, would I still be able to have that fine-grained control or not? ...

How to create a vector of vectors in R

I have input data that contain lines like this: -0.438185 -0.766791 0.695282 0.759100 0.034400 0.524807 How can I create a data structure in R that looks like this: [[1]] [1] -0.438185 -0.766791 0.695282 [[2]] [1] 0.759100 0.034400 0.524807 ...

How can I count the number of times a value occurs in a column of a dataframe?

Is there a simple way of identifying the number of times a value is in a vector or column of dataframe? I essentially want the numerical values of a histogram but I do not know how to access it. # sample vector a <- c(1,2,1,1,1,3,1,2,3,3) #hist hist(a) Thank you. UPDATE: On Dirk's suggestion I am using hist. Is there a better way t...
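
A minimal sketch of the counting step, shown in Python rather than R and using the sample vector from the question: a frequency table maps each distinct value to the number of times it occurs, which is the information a histogram of discrete values displays. In R, `table(a)` is the usual way to get the same counts.

```python
from collections import Counter

# The sample vector from the question.
a = [1, 2, 1, 1, 1, 3, 1, 2, 3, 3]

# Counter maps each distinct value to its number of occurrences.
counts = Counter(a)

for value in sorted(counts):
    print(value, counts[value])
```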