statistics

How do I measure variability of a benchmark comprised of many sub-benchmarks?

(Not strictly programming, but a question that programmers need answered.) I have a benchmark, X, which is made up of a lot of sub-benchmarks x1..xn. It's quite a noisy test, with the results being quite variable. To accurately benchmark, I must reduce that "variability", which requires that I first measure the variability. I can easily...

Django template generation time

How can I measure the time taken for template generation? I know I can collect the time Context.render() takes, but can I do it in an unobtrusive way? Something like what Page Stats Middleware does for Python and DB time, but with Python time split into code/view time and template time? ...

How to combine false positives and false negatives into a single measure

I'm trying to measure the performance of a computer vision program that tries to detect objects in video. I have 3 different versions of the program which have different parameters. I've benchmarked each of these versions and got 3 pairs of (false positive percent, false negative percent). Now I want to compare the versions with each...

Difference between Ad company statistics, Google Analytics and Awstats on adult sites

Hi, I have this problem. I have a web page with adult content, and for the past several months I had PPC advertisements on it. I've noticed a big difference between the ad company's statistics for my page, the Google Analytics data, and the Awstats data on my server. For example, the ad company tells me that I have 10K pageviews per day, Google Analytics tell...

Tracking Android Market statistics

Hi, The developer console provides basic install statistics but does not provide any historical data or graphs. Many sites (androidzoom.com and many more) provide a view of the Market but they seem to be very out of date and generally provide much less granular data. Is there any way to access the data Google already has on my applicat...

How do you separate visitors to two intro pages to compare the number of registrations per visitor?

Hi. I am making a site that depends on users registering to be able to play my online game. If they don't log in, they only get to see the intro page. If the user logs in to the site, I will save this information to a cookie so that the next time they visit they will be sent directly to login.php. If the user doesn't have this information in the c...

How to determine statistical significance using T-Test in Excel?

Hello, I have two groups of data sets, A and B. I would like to know whether the average value of A differs significantly from B's average. How do I do that in Excel 2007? (I know there's a TTEST formula in Excel, and I also know I don't need to use the paired version of it. What other parameters do I need to set, and how do I interpret the re...

How to approach this algorithm question?

A website has a database of n questions. You click a button and are shown one random question per click. The probability of a particular question showing up at the click event is 1/n. On average, how many clicks would be required to see all the questions in the database? What is the approach required for such questions? ...

Feature selection using Gram-Schmidt orthogonalization in R

Is there any package in R that contains an algorithm for feature selection using Gram-Schmidt orthogonalization? ...

Model Fit statistics for a Logistic Regression

I'm running a logistic regression model in R. I've used both the Zelig and Car packages. However, I'm wondering if there is a simple way to get the model fit statistics for the model (pseudo R-square, chi-square, log likelihood, etc.) ...

How do you measure the level of improvements your website and dev team are doing?

I am looking for ways to measure and prove that our team is improving, but I can't just state that; I need ways to prove it. For example, we are using ColdFusion 8 and SQL Server 2005, and I can easily prove that the number of errors each day or week is getting smaller and smaller. But what other figures can I use to show what areas a...

Setting up a CSV file for R to display histograms

Greetings, Basically, I have two vectors of data (let's call it experimental and baseline). I want to use the lattice library and histogram functions of R to plot the two histograms side-by-side, just as seen at the end of this page. I have my data in a CSV file like this: Label1,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18 Label2,1,2...

Disable GUI, graphics devices in R

Is there an easy way to turn off all GUI elements in R and run it solely from the command line on OSX? I'm trying to replicate the behavior of a remote Linux terminal on my OSX machine. Thus plot() should just save a file, and things like CRAN mirror selection should be text, not a Tk interface. I'm having trouble finding where to set...

Statistical usage of WPF / Winforms / ClickOnce / ...

I'm after some sort of usage statistics for WPF in application programming. Is there any online reference or set of stats, like the browser usage stats, for WPF vs Winforms, ClickOnce vs MSI, C# vs. VB, etc.? Also, is there such a thing per year (i.e. tracking the evolution in the usage of WPF)? I've googled but no luck so far. Thanks. ...

How many requests per second should my ASP (classic) app handle?

I'm profiling an ASP (classic) web service. The web service makes database calls, reads/writes to files, and processes XML. On a Windows Server 2003 box (2.7 GHz, 4 cores, 4 GB RAM), how many requests per second should I be able to handle before things start to fail? I'm building a tool to test this, but I'm looking for a number of requests pe...

Matlab test of independence

For 1,000,000 observations, I observed a discrete event, X, 3 times for the control group and 10 times for the test group. I need to perform a chi-square test of independence in Matlab. This is how you would do it in R: m <- rbind(c(3, 1000000-3), c(10, 1000000-10)) # [,1] [,2] # [1,] 3 999997 # [2,] 10 999990 chisq.test(...

How can I get a complete vector of residuals from an ARX model?

I used the ARX function and then the RESID function from the System Identification Toolbox, but the resulting residuals are: 0 0 0 5 6 8 7 8. The number of zeros equals the number of lags. I need a complete vector of residuals ...

Have you done freelance data analysis work?

As a recent graduate, I was considering doing some freelance data analysis work. While there are numerous firms offering similar services, I would direct my services towards small "mom and pop" businesses and the self-employed. Does anyone have experience with this? What's been your experience? ...

Best wavelet library for R

What is a good library for wavelets in R? ...

Using R to draw a time series with discrete data

Greetings, I have a table that looks like the following: date value 2007-11-05 134 2007-12-08 234 2008-03-10 322 2008-03-11 123 ... In summary, it has daily values for three years, but it doesn't have values for every day. What I need is to draw a line chart plot(data$date, data$value) for the whole time span, but considering t...