statistics

When calculating trends, how do you account for low sample size?

I'm processing statistics for home approvals in a given month. I'd like to be able to show trends, that is, which areas have seen a large relative increase or decrease since the previous month(s). My first naive approach was to just calculate the percentage change between two months, but that has problems when the data...

Visual Studio Web Project Statistics

I'm looking for something that can spit out statistics on Visual Studio 2008 web projects, both Forms and MVC. Things like: number of pages per project; number of user controls; number of classes; number of methods; number of images/CSS; file creation dates. If the information exists already in Visual Studio, I can't find it. I've also tr...

Enterprise Platform Statistics

Does anybody know where one might find the relative levels of use of different enterprise platforms? E.g. the percentage of J2EE vs. Spring vs. .NET vs. the various other (sometimes more obscure) platforms. I have seen lots of comparisons of Java vs. C# and so on, but I am not interested in the desktop or web side, I am talki...

How to use predict.lm in R to reverse the regression

Hello. I have some data in a dataframe calvarbyruno.1 with variables Nominal and PAR that represent the Peak Area Ratio (PAR) found from analysis of a set of standards using a particular analytical technique, and two lm models of that data (linear and quadratic) for the relationship PAR ~ Nominal. I'm trying to use the predict.lm funct...

I can't see statistics information for my auto-generated stats - is this normal? (SQL2005)

I'm trying to diagnose a slow stored procedure (see this question) and I've noticed that for my auto-generated stats (the ones named things like _WA_Sys_0000000A_0D0FEE32) I can't view the detailed histogram. If I click on the "Details" tab I just get the message: No statistics information available. If I click on the details tab for a...

Merging two statistical result sets

I have two sets of statistics generated from processing. The processing can produce a large number of results, so I would rather not have to store all of the data to recalculate the additional data later on. Say I have two sets of statistics that describe two different sessions of runs over a process. Each set contains Stati...
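For what it's worth, here is a minimal sketch of how two summaries can be combined without keeping the raw data. The excerpt is cut off before it lists what each set contains, so this assumes (my assumption, not the asker's) that each set stores a count, a mean, and the sum of squared deviations from the mean. The names `Summary`, `summarize`, and `merge` are hypothetical, and the combination step is the standard pairwise update for mean and variance.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Summary:
    """Hypothetical per-session summary: count, mean, and sum of squared deviations."""
    count: int
    mean: float
    m2: float  # sum of (x - mean)^2 over the session

    @property
    def variance(self) -> float:
        # Population variance; 0.0 for an empty summary.
        return self.m2 / self.count if self.count else 0.0


def summarize(values: Iterable[float]) -> Summary:
    """Build a summary from raw values (only needed once per session)."""
    data = list(values)
    if not data:
        return Summary(0, 0.0, 0.0)
    mean = sum(data) / len(data)
    m2 = sum((v - mean) ** 2 for v in data)
    return Summary(len(data), mean, m2)


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two summaries without access to the underlying data."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    # The cross term accounts for the gap between the two session means.
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
    return Summary(n, mean, m2)


if __name__ == "__main__":
    first = summarize([1, 2, 3])
    second = summarize([4, 5, 6, 7])
    combined = merge(first, second)
    print(combined.count, combined.mean, combined.variance)
```

Minimum and maximum merge trivially (take the smaller and the larger of the two), but a median cannot be recovered exactly from two summaries like these.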

label of log y-axis: 1000 instead of 1e+03?

I have a problem constructing a log y-axis in a graphic. How can I make the tick labels of my log y-axis show up not as 1e+03, 1e+04, 1e+05 etc., but as regular Arabic numbers (1000, 10000, 100000)? Thanks. ...

Which Git commit stats are easy to pull

Previously I have enjoyed TortoiseSvn's ability to generate simple commit stats for a given SVN repository. I wonder what is available in Git and am particularly interested in: number of commits per user; number of lines changed per user; activity over time (for instance aggregated weekly changes). Any ideas? ...

How can I take multiple vectors and recode their datatypes in R?

I'm looking for an elegant way to change multiple vectors' datatypes in R. I'm working with an educational dataset: 426 students' answers to eight multiple choice questions (1 = correct, 0 = incorrect), plus a column indicating which instructor (1, 2, or 3) taught their course. As it stands, my data is sitting pretty in data.df, like th...

How can I calculate the median and standard deviation of a stream of numbers in Perl?

Hi everyone, In our logfiles we store response times for the requests. What's the most efficient way to calculate the median response time, the "75/90/95% of requests were served in less than N time" numbers etc? (I guess a variation of my question is: What's the best way to calculate the median and standard deviation of a bunch stre...

Finding similarities in a multidimensional array

Consider a sales department that sets a sales goal for each day. The total goal isn't important, but the overage or underage is. For example, if Monday of week 1 has a goal of 50 and we sell 60, that day gets a score of +10. On Tuesday, our goal is 48 and we sell 46 for a score of -2. At the end of the week, we score the week like this: ...

How to count the number of numeric values in a column

I have a dataframe, and I want to produce a table of summary statistics including number of valid numeric values, mean and sd by group for each of three columns. I can't seem to find any function to count the number of numeric values in R. I can use length() which tells me how many values there are, and I can use colSums(is.na(x)) to c...

Help me make conditionally grouped histograms from my dataset

My current dataset data.df comes from about 420 students who took an 8-question survey under one of 3 instructors. escore is my outcome variable of interest.
    'data.frame': 426 obs. of 10 variables:
     $ ques01: int 1 1 1 1 1 1 0 0 0 1 ...
     $ ques02: int 0 0 1 1 1 1 1 1 1 1 ...
     $ ques03: int 0 0 1 1 0 0 1 1 0 1 ...
    ...

Non-Uniform Random Number Generator Implementation?

I need a random number generator that picks numbers over a specified range with a programmable mean. For example, I need to pick numbers between 2 and 14 and I need the average of the random numbers to be 5. I use random number generators a lot. Usually I just need a uniform distribution. I don't even know what to call this type of di...

Clever way to estimate URL clicks per hour without logging every click ?

I have a site with millions of URLs. Each time a URL is clicked, a database row corresponding to that URL is updated indicating the timestamp of that click. I would like to, using additional columns for sure, but without the need to insert distinct rows for every click, estimate the number of clicks per hour this URL receives. Some...

Taking a Conditional Mean in STATA

Let's say I have a Stata dataset that has two variables: type and price. The type value for each observation is a number between 1 and 10. I want to add a third variable that is the average price of all observations of that type. So, for example, if the first observation had a type of 3 and a price of 10, then I'd like to add a third valu...

Calculating Moving Range in SQL Server (without arrays)

Hi, I have a requirement to calculate the Moving Range of a load of data (at least I think this is what it is called) in SQL Server. This would be easy if I could use arrays, but I understand this is not possible in MS SQL, so I wonder if anyone has a suggestion. To give you an idea of what I need: Let's say I have the following in a sql...

What is a good statistical math package for .Net?

I am looking for a library that does advanced math, statistics, statistical distributions, etc. Currently I am looking for something that does binomial and Poisson distributions. ...

Programming Ratios

Hi. Has anybody seen studies of the ratio of maintenance programming to new development? Thanks. ...

What statistics can be maintained for a set of numerical data without iterating?

Update: Just for future reference, I'm going to list all of the statistics that I'm aware of that can be maintained in a rolling collection, recalculated as an O(1) operation on every addition/removal (this is really how I should've worded the question from the beginning). Obvious: Count, Sum, Mean, Max*, Min*, Median**. Less obvious: Var...
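As an illustration of the O(1)-per-update idea, here is a rough sketch that keeps count, sum, and mean current on every addition and removal. It also tracks variance using Welford-style updates; that is my own choice of example, since the list above is cut off before it names the less obvious statistics. The class name `RunningStats` and its methods are hypothetical. Max, min, and median are left out on purpose because they carry caveats in the list above.

```python
class RunningStats:
    """Hypothetical rolling summary updated in O(1) per add/remove."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, x: float) -> None:
        self.count += 1
        self.total += x
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    def remove(self, x: float) -> None:
        """Undo a previous add(x); the caller must only remove values it added."""
        if self.count == 0:
            raise ValueError("cannot remove from an empty collection")
        if self.count == 1:
            # Removing the last value resets everything.
            self.count, self.total, self.mean, self._m2 = 0, 0.0, 0.0, 0.0
            return
        self.count -= 1
        self.total -= x
        delta = x - self.mean                # deviation from the old mean
        new_mean = self.mean - delta / self.count
        self._m2 -= delta * (x - new_mean)   # exact inverse of the add step
        self._m2 = max(self._m2, 0.0)        # guard against rounding drift
        self.mean = new_mean

    @property
    def variance(self) -> float:
        # Population variance; 0.0 when the collection is empty.
        return self._m2 / self.count if self.count else 0.0


if __name__ == "__main__":
    stats = RunningStats()
    for value in (1, 2, 3, 4):
        stats.add(value)
    stats.remove(1)  # the window now holds 2, 3, 4
    print(stats.count, stats.total, stats.mean, stats.variance)
```

Repeated removals can accumulate floating-point error, so a long-lived collection may want to recompute from scratch occasionally.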