questions about statistics | ansaurus

statistics

How to Clear down Query Execution Statistics in SQL Server 2005/2008

Based on getting Query Execution Statistics using this extremely useful piece of SQL obtained from this post Most Executed Stored Procedure - Stack Overflow: SELECT TOP 100 qt.TEXT AS 'SP Name', SUBSTRING(qt.text, qs.statement_start_offset/2, CASE WHEN (qs.statement_end_offset = -1) THEN LEN(qt.text) ELSE (qs.statement_end_offset ...

query-execution-plans

Evaluating variable within R loop

I'm trying to iteratively generate some functions using a For Loop: # Create a list to hold the functions funcs <- list() funcs[] # loop through to define functions for(i in 1:21){ # Make function name funcName <- paste( 'func', i, sep = '' ) # make function func = function(x){x * i} funcs[[funcName]] = func ...
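
The excerpt is R, but the same pitfall can be sketched in Python, where functions defined in a loop also look up the loop variable when they are called rather than when they are defined. This is only an analogue under that assumption, not the R answer; `make_multiplier` and `late` are hypothetical names introduced for illustration.

```python
def make_multiplier(i):
    # Each call creates a new scope, so the returned function keeps its own i.
    return lambda x: x * i


# Build func1 .. func21, mirroring the names used in the excerpt.
funcs = {}
for i in range(1, 22):
    funcs["func{}".format(i)] = make_multiplier(i)

# Late-binding version for comparison: every lambda reads the shared
# loop variable j at call time, after the loop has finished.
late = [lambda x: x * j for j in range(1, 22)]
```

With the factory, `funcs["func3"](2)` multiplies by 3; in the late-binding list every function multiplies by the final loop value.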

Is there a way to remove the border of the legend in ggplot2?

I'm using qplot to plot a function and I want to position the legend within the plot. I've used opts( legend.position = c(0.7,0.7) ) to move the legend where I want it to be. However there is a white border around the legend and that shows up on the gray background. For example: library(ggplot2) x = c(1:20) y = c(1:20) p <- qplo...

Tracking two different user types with Google Analytics?

We've got a site with two types of users: guests and registered users. What we are looking for is a method to track both types of users within just one Google Analytics profile. We believe a registered user stays longer on the site and has a higher page view count than a guest. Could this be possible within just one profile? Could there be...

google-analytics

Datasets for Running Statistical Analysis on

What datasets exist out on the internet that I can run statistical analysis on? ...

Using ggplot2 how can I represent a dot and a line in the legend

Using ggplot2 I am plotting several functions and a series of points. I cannot figure out how to represent the points on the legend. I realize I need to use an aes() function, but I don't fully understand how to do this. I apologize that the example is so long, but I don't know how else to illustrate it. ## add ggplot2 library(ggplot2) ...

MySQL, reshape data from long / tall to wide

I have data in a MySQL table in long / tall format (described below) and want to convert it to wide format. Can I do this using just SQL? It is easiest to explain with an example. Suppose you have information on (country, key, value) for M countries and N keys (e.g. keys can be income, political leader, area, continent, etc.). Long format has 3 ...
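
One common way to pivot long data with plain SQL is conditional aggregation: one `MAX(CASE WHEN ...)` column per key, grouped by country. The sketch below runs it through Python's built-in `sqlite3` module rather than MySQL, so it is an illustration under that assumption. The table name `long_data`, the column names `attr` and `val` (standing in for key and value, since `KEY` is a reserved word in MySQL), and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE long_data (country TEXT, attr TEXT, val TEXT)")
conn.executemany(
    "INSERT INTO long_data VALUES (?, ?, ?)",
    [  # hypothetical rows, for illustration only
        ("A", "income", "100"),
        ("A", "continent", "Europe"),
        ("B", "income", "200"),
        ("B", "continent", "Asia"),
    ],
)

# Collect the distinct keys; each one becomes a column of the wide table.
keys = [row[0] for row in conn.execute(
    "SELECT DISTINCT attr FROM long_data ORDER BY attr")]

# One MAX(CASE ...) expression per key. The key names are interpolated
# directly into the SQL text, which is acceptable only for trusted data.
columns = ", ".join(
    "MAX(CASE WHEN attr = '{0}' THEN val END) AS \"{0}\"".format(k)
    for k in keys
)
query = ("SELECT country, {} FROM long_data "
         "GROUP BY country ORDER BY country").format(columns)
wide = conn.execute(query).fetchall()
```

The column list is built in application code because the set of keys is not known in advance; with a fixed, known set of keys the `CASE` expressions could be written out by hand in a single static query.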

Need a fast Java beta distribution random number generator

I need to generate random numbers that have a beta distribution in some speed critical code. Currently I'm using the BetaRandomVariable() class from the numerics4j library - but it currently accounts for about 95% of my code's CPU usage! Can anyone recommend a faster way to generate these random numbers? ...

random-number-generator

beta-distribution

What's the typical size of a SaaS or Web2.0 DB?

I would be grateful if you can estimate (based on your experience/knowledge) the typical size of the main database of: a SaaS web site; a Web2.0 web site. Of course this varies by application type, architecture & user-base, but any average estimation would be very helpful. Thanks! ...

Ready implementation of multivariate Spearman rank correlation

I'm looking for a way to calculate a multivariate version of the Spearman rank correlation $\rho$. Is there a ready-to-use Python implementation I can use? ...

How should I analyze web traffic in a statistically correct way?

I have a file with a sequence of event timestamps corresponding to the times at which someone visits a website: 02.02.2010 09:00:00 02.02.2010 09:00:00 02.02.2010 09:00:00 02.02.2010 09:00:01 02.02.2010 09:00:03 02.02.2010 09:00:05 02.02.2010 09:00:06 02.02.2010 09:00:06 02.02.2010 09:00:09 02.02.2010 09:00:11 02.02.2010 09:00:11 02.02....

Random sample from given bivariate discrete distribution

Suppose I have a bivariate discrete distribution, i.e. a table of probability values P(X=i,Y=j), for i=1,...,n and j=1,...,m. How do I generate a random sample (X_k,Y_k), k=1,...,N from such a distribution? Maybe there is a ready R function like: sample(100,prob=biprob) where biprob is a 2-dimensional matrix? One intuitive way to sample is...
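
One possible approach, sketched here in Python rather than R, is to flatten the probability table into a single weight vector, sample cell indices from it, and map each index back to its (i, j) pair. The function name `sample_bivariate` and the example table are assumptions made for illustration; only the standard-library `random.choices` call is used.

```python
import random


def sample_bivariate(prob, n_samples, rng=None):
    """Draw 1-based (i, j) pairs from a table with prob[i-1][j-1] = P(X=i, Y=j)."""
    rng = rng or random.Random()
    n_cols = len(prob[0])
    # Flatten the table row by row so each cell becomes one outcome.
    weights = [p for row in prob for p in row]
    cells = rng.choices(range(len(weights)), weights=weights, k=n_samples)
    # Map each flat index back to its (row, column) pair, 1-based.
    return [(c // n_cols + 1, c % n_cols + 1) for c in cells]


# Hypothetical 2 x 3 table for illustration; the entries sum to 1.
biprob = [[0.10, 0.20, 0.10],
          [0.30, 0.00, 0.30]]
draws = sample_bivariate(biprob, 1000, random.Random(0))
```

Because the joint table is sampled directly, any dependence between X and Y is preserved; sampling X and Y separately from their marginals would lose it.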

How do you combine "Revision Control" with "WorkFlow" for R?

Hello all, I remember coming across R users writing that they use "Revision control" (e.g. "Source control"), and I am curious to know: How do you combine "Revision control" with your statistical analysis WorkFlow? Two (very) interesting discussions talk about how to deal with the WorkFlow. But neither of them refers to the revision con...

version-control

How does software development compare with statistical programming/analysis?

Statistical analysis/programming is writing code. Whether the analysis is descriptive or inferential, you write code to import data, clean it, analyse it, and compile a report. Analyzing the data can involve many twists and turns of statistical procedures, and angles from which you look at your data. At the end, you have many files, with ...

software-engineering

software-development

Plotting axes with different scales for one data set in R

I have a large data set I am plotting in R, and I'd like to have an axis on each side of the graph show the data in two different scales. So for example, on the left vertical axis I'd like to plot the data directly (e.g. plot(y ~ x) ) and on the right axis, I'd like to have a linear scaling of the left axis (e.g. plot( y*20 ~ x) ). So t...

How to collect statistics from a bittorrent swarm?

I want to collect statistics on how a file spreads through a new bittorrent swarm without actually downloading anything (or as little as possible). I need to know which peer has which pieces (to make file-based statistics); knowing the number of seeders and leechers or percentages is not enough. Later when there are many peers I need to...

the best way to store my statistics (ruby)

I want to display some statistics of data stored in an array of arrays. I have three categories (video, article, webinar) but it could expand later on. The structure of statistics per category will be almost the same, for example: total number of videos, the date the last record was added to the category, etc. So far I can think of a hash of an array to s...

data-structures

How can I generate conditional distributions of data by taking slices of scatterplots?

I'm taking my first course in multiple linear regression, so I'm still a beginner in R. We've recently learned a bit about taking slices of bivariate scatterplot data, both horizontally and vertically. What I'd like to know is how to go beyond a basic scatterplot, and take advantage of conditionally grouping data by slices to examin...

SQL statistics system table

I have created statistics on a table using SQL Server 2008. In which system table is this information stored (just as all the table info is stored in sys.tables)? ...

sql-server-2008

How to calculate the statistics "t-test" with numpy

I'm looking to generate some statistics about a model I created in python. I'd like to generate the t-test on it, but was wondering if there was an easy way to do this with numpy/scipy. Are there any good explanations around? For example, I have three related datasets that look like this: [55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 6...
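
For related (paired) samples, the t statistic is the mean of the pairwise differences divided by its standard error. The sketch below computes it with the Python standard library only; with SciPy installed, `scipy.stats.ttest_rel` performs the paired test and also returns a p-value. The helper name `paired_t` and the two short lists are hypothetical illustration data, not the datasets from the question.

```python
import math
import statistics


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical paired measurements, for illustration only.
before = [55.0, 47.0, 55.0, 60.0, 52.0]
after = [53.0, 46.0, 50.0, 58.0, 52.0]
t_stat, df = paired_t(before, after)
```

The statistic is then compared against a t distribution with `df` degrees of freedom. With three related datasets, the pairwise tests would be run separately, and the multiple comparisons should be taken into account.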
