statistics

Software industry sector sizes

Are there statistics or even estimates anywhere regarding the sizes of the various sectors of the software industry (e.g. desktop, systems, embedded, business, games, etc.)? ...

How do I generate random numbers from statistical distributions in C++?

I want some help programming random number generators in C++ for the following distributions: geometric, hypergeometric, Weibull, Rayleigh, Erlang, gamma, and Poisson. Thanks. ...
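
For reference, C++11's `<random>` header already provides `std::geometric_distribution`, `std::gamma_distribution`, `std::weibull_distribution` and `std::poisson_distribution` (note that `std::geometric_distribution` counts failures before the first success), but it has nothing for hypergeometric, Rayleigh or Erlang. The sketch below shows, in Python for brevity, how each distribution can be built from simpler draws; the function names and parameterizations are illustrative choices, not a library API. Erlang is a gamma with integer shape, Rayleigh and geometric use inverse-CDF sampling, and Poisson uses Knuth's multiplication method.

```python
import math
import random


def geometric(rng, p):
    """Trials up to and including the first success (support 1, 2, ...), 0 < p < 1."""
    u = 1.0 - rng.random()  # u in (0, 1], so log(u) is defined
    return max(1, math.ceil(math.log(u) / math.log1p(-p)))


def hypergeometric(rng, population, successes, draws):
    """Successes among `draws` items taken without replacement (fine for small populations)."""
    items = [1] * successes + [0] * (population - successes)
    return sum(rng.sample(items, draws))


def weibull(rng, scale, shape):
    return rng.weibullvariate(scale, shape)  # stdlib order: alpha = scale, beta = shape


def rayleigh(rng, sigma):
    # Inverse-CDF sampling for F(x) = 1 - exp(-x^2 / (2 sigma^2)).
    u = 1.0 - rng.random()
    return sigma * math.sqrt(-2.0 * math.log(u))


def gamma(rng, shape, scale):
    return rng.gammavariate(shape, scale)  # mean = shape * scale


def erlang(rng, k, rate):
    """Erlang(k, rate) is a gamma with integer shape k: the sum of k exponential waits."""
    return sum(rng.expovariate(rate) for _ in range(k))


def poisson(rng, lam):
    """Knuth's method: simple, but slow and prone to underflow for large lam."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


if __name__ == "__main__":
    rng = random.Random(42)
    print(geometric(rng, 0.25), hypergeometric(rng, 50, 10, 5), poisson(rng, 4.0))
```

The same constructions port directly to C++ using `std::mt19937` with `std::uniform_real_distribution` and `std::exponential_distribution` as the building blocks.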

What are some good ways to store performance statistics in a database for querying later?

Goal: Store arbitrary performance statistics of stuff that you care about (how many customers are currently logged on, how many widgets are being processed, etc.) in a database so that you can understand how your servers are doing over time. Assumptions: A database is already available, and you already know how to gather the inform...
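
One common layout, sketched here with SQLite and assuming every sample is a single number: a narrow table with one row per (metric, timestamp, value), indexed on (metric, timestamp) so that time-range queries per metric stay cheap. The table, column and function names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS metric_samples (
    metric      TEXT    NOT NULL,  -- e.g. 'customers_logged_on'
    recorded_at INTEGER NOT NULL,  -- Unix time in seconds, UTC
    value       REAL    NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_metric_time ON metric_samples (metric, recorded_at);
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record(conn, metric, recorded_at, value):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO metric_samples VALUES (?, ?, ?)", (metric, recorded_at, value)
        )


def hourly_stats(conn, metric, start, end):
    """Average, min and max per hour for one metric over [start, end)."""
    return conn.execute(
        """
        SELECT (recorded_at / 3600) * 3600 AS hour_start,
               AVG(value), MIN(value), MAX(value)
        FROM metric_samples
        WHERE metric = ? AND recorded_at >= ? AND recorded_at < ?
        GROUP BY hour_start
        ORDER BY hour_start
        """,
        (metric, start, end),
    ).fetchall()
```

Adding a new metric needs no schema change, which is the main reason to prefer this narrow shape over one column per metric.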

Is NoSQL ideal to store stats?

I'm not terribly familiar with NoSQL systems, but I remember reading a while back that they are ideal for handling statistical data. Since I'm about to start writing code that will record data like "how many users were registered on each day", I was thinking I could use this as an opportunity to learn more about NoSQL if it fits the bill. ...

Regressing panel data in SAS.

Hey guys, thanks to your help I successfully managed all my databases! I am now looking at a panel data set on which I have to run a regression. Since I only started my PhD this semester, together with the econometrics courses, I am still new to many statistical applications and regression methods. I want to do a simple regression as in Y = x1 x2 x3...

PHP/MySQL database connection priority?

Hello, I have a production database where usage statistics reside. This database is responsible for many other things (not just statistics calcs). I use PHP to periodically roll up different resolutions (day, week, month, year) of interesting statistics in buckets dictated by the resolution. The PHP application I've written "completes"...

Generating lognormally distributed random numbers from the mean and coefficient of variation

Most functions for generating lognormally distributed random numbers take the mean and standard deviation of the associated normal distribution as parameters. My problem is that I only know the mean and the coefficient of variation of the lognormal distribution. It is reasonably straightforward to derive the parameters I need for the ...
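
The derivation follows from two standard lognormal identities: if X = exp(N) with N ~ Normal(μ, σ²), then E[X] = exp(μ + σ²/2) and CV = sqrt(exp(σ²) − 1). Solving gives σ² = ln(1 + CV²) and μ = ln(mean) − σ²/2. A minimal Python sketch, with hypothetical function names, that converts the parameters and then calls the standard library's `random.lognormvariate(mu, sigma)`:

```python
import math
import random


def lognormal_params(mean, cv):
    """(mu, sigma) of the underlying normal for a lognormal with this mean and CV."""
    sigma2 = math.log1p(cv * cv)        # sigma^2 = ln(1 + CV^2)
    mu = math.log(mean) - sigma2 / 2.0  # mu = ln(mean) - sigma^2 / 2
    return mu, math.sqrt(sigma2)


def lognormal_from_mean_cv(rng, mean, cv):
    mu, sigma = lognormal_params(mean, cv)
    return rng.lognormvariate(mu, sigma)


if __name__ == "__main__":
    rng = random.Random(7)
    print(lognormal_params(10.0, 0.5))
    print([round(lognormal_from_mean_cv(rng, 10.0, 0.5), 3) for _ in range(5)])
```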

How can I superimpose modified loess lines on a ggplot2 qplot?

Background Right now, I'm creating a multiple-predictor linear model and generating diagnostic plots to assess regression assumptions. (It's for a multiple regression analysis stats class that I'm loving at the moment :-) My textbook (Cohen, Cohen, West, and Aiken 2003) recommends plotting each predictor against the residuals to make s...

Generating an array of random numbers with a defined min, max, mean, and standard deviation for a given number of elements and error level

I'd like to generate an array of random numbers with a defined min, max, mean, and standard deviation for a given number of elements and error level. Is there such a library in C, C++, PHP, or Python? Please kindly advise. Thanks! ...
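
One simple approach, sketched in Python under several assumptions: min and max are treated as hard bounds rather than required extreme values, the underlying shape is normal, "standard deviation" means the population standard deviation, and the error level is not modelled. Draw normal values, rescale them affinely so the mean and standard deviation match exactly, and retry if any value falls outside the bounds. The function name and `max_tries` parameter are hypothetical.

```python
import random
import statistics


def constrained_sample(rng, n, lo, hi, mean, sd, max_tries=1000):
    """n values (n >= 2) with exactly this mean and population SD, all within [lo, hi]."""
    for _ in range(max_tries):
        raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = statistics.fmean(raw)
        s = statistics.pstdev(raw)
        # The affine rescale fixes the mean and SD exactly; only the bounds can fail.
        values = [mean + sd * (x - m) / s for x in raw]
        if min(values) >= lo and max(values) <= hi:
            return values
    raise ValueError("no sample within bounds; the constraints may be infeasible")


if __name__ == "__main__":
    vals = constrained_sample(random.Random(1), 100, 0, 100, 50, 10)
    print(min(vals), max(vals), statistics.fmean(vals), statistics.pstdev(vals))
```

Rejection is only practical when the bounds sit several standard deviations from the mean; tight bounds call for a different shape (for example a scaled beta distribution).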

Is there a linear regression function in SQL Server?

Are there any linear regression functions in SQL Server 2005/2008, similar to the linear regression functions in Oracle? ...
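
Oracle's `REGR_SLOPE` and `REGR_INTERCEPT` aggregates compute an ordinary least-squares fit. Where no such built-in is available, the same fit can be computed from plain aggregates using slope = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and intercept = (Σy − slope·Σx) / n. The query below uses only `COUNT`, `SUM` and a CTE, which SQL Server 2005 and later also support; it is run here against SQLite from Python just to demonstrate it. The `points(x, y)` table is hypothetical.

```python
import sqlite3

# Ordinary least squares y = slope * x + intercept from plain SQL aggregates.
FIT_SQL = """
WITH s AS (
    SELECT COUNT(*) AS n, SUM(x) AS sx, SUM(y) AS sy,
           SUM(x * y) AS sxy, SUM(x * x) AS sxx
    FROM points
)
SELECT (n * sxy - sx * sy) / (n * sxx - sx * sx) AS slope,
       (sy - (n * sxy - sx * sy) / (n * sxx - sx * sx) * sx) / n AS intercept
FROM s
"""


def fit_line(conn):
    """Return (slope, intercept) for the rows in the hypothetical `points` table."""
    return conn.execute(FIT_SQL).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # REAL columns avoid integer division in the formulas.
    conn.execute("CREATE TABLE points (x REAL, y REAL)")
    conn.executemany("INSERT INTO points VALUES (?, ?)", [(0, 1), (1, 3), (2, 5), (3, 7)])
    print(fit_line(conn))  # (2.0, 1.0)
```

In SQL Server, if `x` and `y` are integer columns, cast them to `FLOAT` inside the CTE to avoid integer division.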

Creating a spreadsheet from an XML file

I am trying to convert a 120 MB XML database of terrorist incidents (the first file for download available here http://wits.nctc.gov/Export.do) to spreadsheet form so I can merge it with other data and do statistical analysis. So far I have worked with Stata, which is useless now because it won't read XML. The site offers smaller files by...
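
A minimal Python sketch that streams the XML with `xml.etree.ElementTree.iterparse`, so the whole 120 MB file never has to sit in memory, and writes a CSV file that Stata can import. It assumes each record is a flat element (here the hypothetical tag `Incident`) whose child elements become columns; check the real element names in the export before using it. A first pass collects the union of column names, and a second pass writes the rows, leaving missing fields empty.

```python
import csv
import xml.etree.ElementTree as ET


def _records(xml_path, record_tag):
    """Yield one dict per <record_tag> element, streaming through the file."""
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == record_tag:
            yield {child.tag: (child.text or "").strip() for child in elem}
            elem.clear()  # free the finished record's children


def xml_to_csv(xml_path, csv_path, record_tag="Incident"):
    # Pass 1: union of column names, in first-seen order (dicts keep insertion order).
    columns = {}
    for rec in _records(xml_path, record_tag):
        for key in rec:
            columns.setdefault(key, None)
    # Pass 2: write rows; fields a record lacks become empty cells.
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=list(columns), restval="")
        writer.writeheader()
        for rec in _records(xml_path, record_tag):
            writer.writerow(rec)
```

Nested sub-elements are not flattened by this sketch; only each child's direct text is kept.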

Examples of simple stats calculation with Hadoop

I want to extend an existing clustering algorithm to cope with very large data sets and have redesigned it in such a way that it is now computable with partitions of data, which opens the door to parallel processing. I have been looking at Hadoop and Pig and I figured that a good practical place to start was to compute basic stats on my...
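
Basic statistics parallelize well when each partition is reduced to a small, mergeable summary (count, mean, sum of squared deviations, min, max) that can be combined in any order. The Python sketch below is not Hadoop or Pig code; it only shows the map/reduce shape: `summarize` would run once per partition (a mapper or combiner), and `merge` is the associative reduce step, using the pairwise variance update of Chan et al. All names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    n: int
    mean: float
    m2: float  # sum of squared deviations from the mean
    lo: float
    hi: float


def summarize(values):
    """Map side: one pass over a partition (Welford's method)."""
    n, mean, m2 = 0, 0.0, 0.0
    lo, hi = math.inf, -math.inf
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        lo, hi = min(lo, x), max(hi, x)
    return Summary(n, mean, m2, lo, hi)


def merge(a, b):
    """Reduce side: combine two partition summaries."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Summary(n, mean, m2, min(a.lo, b.lo), max(a.hi, b.hi))


def sample_variance(s):
    return s.m2 / (s.n - 1)
```

Merging summaries instead of raw sums of squares avoids the catastrophic cancellation that the naive Σx² − n·mean² formula suffers on large data sets.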

Statistics regarding MonoTouch usage?

Just wondered how many applications are written with MonoTouch and published in the App Store? Is MonoTouch ready to be used in production? What other statistics do you know regarding this tool? ...

Best Website Statistics tool for Drupal

What is the best free Website statistics setup I can have for Drupal 6 on Apache? Particularities: 1. Multisite install. Might want to look over several sites. Should be able to restrict view for clients to their own site. Some hits are bypassing Drupal. Some urls are not public. Some sites have little traffic, it would be nice to b...

R selecting duplicate rows

Okay, I'm fairly new to R and I've tried to search the documentation for what I need to do, but here is the problem. I have a data.frame called heeds.data in the following form (some columns omitted for simplicity) eval.num, eval.count, ... fitness, fitness.mean, green.h.0, green.v.0, offset.0, green.h.1, green.v.1,...green.h.7, green.v....

Weighted Average and Ratings

Maths isn't my strong point and I'm at a loss here. Basically, all I need is a simple formula that will give a weighted rating on a scale of 1 to 5. If there are very few votes, they should carry less influence and the rating should be pulled more towards the average (in this case I want it to be 3, not the average of all other ratings). I've tried a...
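
A common formula for this is a Bayesian (shrunk) average: pretend every item starts with C phantom votes at the prior value m, so rating = (C·m + Σ votes) / (C + n). With m = 3 and, say, C = 5, a single 5-star vote gives (15 + 5) / 6 ≈ 3.33, while 100 five-star votes give (15 + 500) / 105 ≈ 4.90. The weight C = 5 is an arbitrary example; larger values need more votes before the rating moves away from 3.

```python
def weighted_rating(ratings, prior=3.0, prior_weight=5):
    """Mean of `ratings`, shrunk toward `prior` as if `prior_weight` extra votes equal to `prior` were cast."""
    return (prior * prior_weight + sum(ratings)) / (prior_weight + len(ratings))


if __name__ == "__main__":
    print(weighted_rating([]))         # 3.0
    print(weighted_rating([5]))        # about 3.33
    print(weighted_rating([5] * 100))  # about 4.90
```

Because the result is a weighted mean of values in [1, 5], it always stays on the 1 to 5 scale.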

Cosmic Rays: what is the probability they will affect a program?

Once again I was in a design review, and encountered the claim that the probability of a particular scenario was "less than the risk of cosmic rays" affecting the program, and it occurred to me that I didn't have the faintest idea what that probability is. "Since 1/2^128 is 1 out of 340282366920938463463374607431768211456, I think we...

Smoothing Small Data Set With Second Order Quadratic Curve

I'm doing some specific signal analysis, and I am in need of a method that would smooth out a given bell-shaped distribution curve. A running average approach isn't producing the results I desire. I want to keep the min/max, and general shape of my fitted curve intact, but resolve the inconsistencies in sampling. In short: if given a se...
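
One technique that matches "smoothing with a second-order curve" is the Savitzky-Golay filter, which the question does not name: each point is replaced by the value at the centre of a least-squares quadratic fitted to a small window around it. Unlike a running average, it reproduces any quadratic exactly, so peaks lose less height and width. A minimal Python sketch for a 5-point window, assuming evenly spaced samples; the weights (−3, 12, 17, 12, −3)/35 are the standard 5-point quadratic coefficients, and leaving the first and last two points unchanged is a simplification.

```python
# 5-point, second-order (quadratic) Savitzky-Golay smoothing weights.
SG5_WEIGHTS = (-3, 12, 17, 12, -3)
SG5_NORM = 35


def savgol5(ys):
    """Smooth evenly spaced samples with a 5-point quadratic Savitzky-Golay filter.

    Interior points become the centre value of a least-squares quadratic fitted
    to that point and two neighbours on each side; the two points at each end
    are returned unchanged.
    """
    out = list(ys)
    for i in range(2, len(ys) - 2):
        window = ys[i - 2:i + 3]
        out[i] = sum(w * y for w, y in zip(SG5_WEIGHTS, window)) / SG5_NORM
    return out
```

Wider windows (7, 9, ... points) smooth more strongly; their coefficients can be derived the same way or taken from published tables.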

Is it possible to do A/B testing by page rather than by individual?

Let's say I have a simple ecommerce site that sells 100 different t-shirt designs. I want to do some A/B testing to optimise my sales. Let's say I want to test two different "buy" buttons. Normally, I would use A/B testing to randomly assign each visitor to see button A or button B (and try to ensure that the user experience is con...
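
Assigning by page instead of by visitor mainly needs a stable mapping from page to variant. A minimal Python sketch, assuming each design has a string id (the ids and the experiment name below are hypothetical): hash the experiment name together with the page id, so each page always shows the same button and different experiments get independent splits.

```python
import hashlib


def variant_for_page(page_id, experiment="buy-button", variants=("A", "B")):
    """Deterministically map a page (e.g. a t-shirt design id) to one variant."""
    key = f"{experiment}:{page_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    for i in range(5):
        page = f"design-{i:03d}"
        print(page, variant_for_page(page))
```

Note that the unit of randomization becomes the page, so the effective sample size is about 100 designs rather than the number of visitors, and differences in how well individual designs sell add noise to the comparison.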

What's the most comprehensive and comprehensible overview of statistics for programmers?

I'm looking for a book (or other media) which provides an overview of statistics that is both comprehensive (covering all the basic/intermediate concepts) and comprehensible (which, for me, means not being weighed down with unnecessary and especially un-introduced mathematical symbology). Can anyone offer suggestions? ...