statistics

Geometric Mean: is there a built-in?

i tried to find a built-in for geometric mean but couldn't. (Obviously a built-in isn't going to save me any time while working in the shell, nor do i suspect there's any difference in accuracy; for scripts i try to use built-ins as often as possible, where the (cumulative) performance gain is often noticeable. In case there isn't one ...

Privileges for a Oracle 9i statistics job

We want to move our automated statistics gathering from an external script into Oracle 9i's job scheduler. It's a very simple job, and the code basically looks like this: DBMS_JOB.SUBMIT( JOB => <output variable>, WHAT => 'DBMS_STATS.GATHER_DATABASE_STATS( cascade => TRUE, options => ''GATHER AUTO'');', NEXT_DA...

mean and variance of image in single pass

hi everyone,am trying to calculate mean and variance using 3X3 window over image(hXw) in opencv...here is my code...is there any accuracy issues with this??or is there any other efficient method to do it in one pass.? int pi,a,b; for(i=1;i<h-1;i++) { for(j=1;j<w-1;j++) { int sq=0,sum=0; double mean=0; double...

R: How to remove outliers from a smoother in ggplot2?

I have the following data set that I am trying to plot with ggplot2, it is a time series of three experiments A1, B1 and C1 and each experiment had three replicates. I am trying to add a stat which detects and removes outliers before returning a smoother (mean and variance?). I have written my own outlier function (not shown) but I e...

How to export Instrument's CPU Monitor's statistics to be usable in Excel or Numbers?

I require statistics of CPU Load of my iPhone apps. I am trying to use Instrument to see the CPU Load, but all I see in Instrument program is rendered graphs. I need these statistics data in raw numbers so that I can put all of them on graphs using Excel or Numbers. Is there a way to export these data in such a way? Or do I need other pr...

naive bayesian spam filter question

Hi guys, I am planning to implement spam filter using Naive Bayesian classification model. Online I see a lot of info on Naive Bayesian classification, but the problem is its a lot of mathematical stuff, than clearly stating how its done. And the problem is I am more of a programmer than a mathematician (yes I had learnt Probability a...

How can I make the output from tapply() into a data.frame

I have a data.frame in R that looks like this: score rms template aln_id description 1 -261.410 4.951 2f22A.pdb 2F22A_1 S_00001_0000002_0 2 -231.987 21.813 1wb9A.pdb 1WB9A_4 S_00002_0000002_0 3 -263.722 4.903 2f22A.pdb 2F22A_3 S_00003_0000002_0 4 -269.681 17.732 1wbbA.pdb 1WBBA_6 S_00004_0000002_0 5 -258.621...

Simple Java library for storing statistical observations and calculating statistics such as stddev, etc.

For logging purposes I want to collect the response times of an external system, and periodically fetch various statistics (such as min/max/stddev) of the response times. I'm looking for a pure in-memory solution. What Java library can help me with this simple task? I'm looking for an API that would ideally look something along the lin...

Graphing perpendicular offsets in a least squares regression plot in R

I'm interested in making a plot with a least squares regression line and line segments connecting the datapoints to the regression line as illustrated here in the graphic called perpendicular offsets: http://mathworld.wolfram.com/LeastSquaresFitting.html I have the plot and regression line done here: ## Dataset from http://www.apsnet....

Summarising grouped records in a dataframe in R

Hello all, I have a data frame in R that looks like this: > TimeOffset, Source, Length > 0 1 1500 > 0.1 1 1000 > 0.2 1 50 > 0.4 2 25 > 0.6 2 3 > 1.1 1 1500 > 1.4 1 18 > 1.6 2 2500 > 1.9 2 ...

Summarising grouped records in a dataframe in R (...again)

Hello all, (I tried to ask this question earlier today, but later realised I over-simplified the question; the answers I received were correct, but I couldn't use them because of my over-simplification of the problem in the original question. Here's my 2nd attempt...) I have a data frame in R that looks like: "Timestamp", "Source", "...

Long running stats process - thoughts on language choice?

I am on a LAMP stack for a website I am managing. There is a need to roll up usage statistics (a variety of things related to our desktop product), and I initially tackled the problem with PHP (being that I had a bunch of classes to work with the data already). All worked well on my dev box which was using 5.3 Long story short, 5.1 memo...

R: Given a set of random numbers drawn from a continuous univariate distribution, find the distribution

Given a set of real numbers drawn from a unknown continuous univariate distribution (let's say is is one of beta, Cauchy, chi-square, exponential, F, gamma, Laplace, log-normal, normal, Pareto, Student's t, uniform and Weibull) .. x <- c(7.7495976,12.1007857,5.8663491,9.9137894,11.3822335,7.4406175,8.6997212,9.4456074,11.8370711,6.42514...

Is there a good R API for accessing Google Docs?

I'm using R for data analysis, and I'm sharing some data with collaborators via Google docs. Is there a simple interface that I can use to access a R data.frame object to and from a Google Docs spreadsheet? If not, is there a similar API in other languages? ...

Statistical analysis on large data set to be published on the web

I have a non-computer related data logger, that collects data from the field. This data is stored as text files, and I manually lump the files together and organize them. The current format is through a csv file per year per logger. Each file is around 4,000,000 lines x 7 loggers x 5 years = a lot of data. some of the data is organized a...

In R: How do I choose the first 4 Rows of a Data Frame

How Do I go about getting the first 4 rows of my Dataframe: Weight Response 1 Control 59 0.0 2 Treatment 90 0.8 3 Treatment 47 0.1 4 Treamment 106 0.1 5 Control 85 0.7 6 Treatment 73 0.6 7 Control 61 0.2 In 'R'? ...

Determining smallest number of samples for 99% accuracy

I'm trying to compare 100,000 records on a local database (L) with 100,000 records on a remote database (R). Basically I want to know if an element in L exists in R. To determine that, I have to make a request against the R for each L, which takes a long time (I know, there should be a better way, there isn't, that's the API I've got)...

In R, How to add Row for Information:

Hi, I'm trying to add a Row to my data.frame to spit out the average of the column. This is my data frame: Weight Response 1 Control 59 0.0 2 Treatment 90 0.8 3 Treatment 47 0.1 4 Treamment 106 0.1 5 Control 85 0.7 6 Treatment 73 0.6 7 Control 61 0.2 I...

need more complex index !

I'm sorry if it's not an appropriate question for this site, and if it's necesary I'll close this question. But maybe someone could give me an ideea: I'm trying to find a more complex index to make an hierarchy. For example: 5 votes from 6 = 83% AND 500 votes from 600 = 83%; 10 votes from 600 = 1.66% If I make a hierarchy with the ...

Fitting Gaussian KDE in numpy/scipy in Python

I am fitting a Gaussian kernel density estimator to a variable that is the difference of two vectors, called "diff", as follows: gaussian_kde_covfact(diff, smoothing_param) -- where gaussian_kde_covfact is defined as: class gaussian_kde_covfact(stats.gaussian_kde): def __init__(self, dataset, covfact = 'scotts'): self.covfac...