statistics

What Statistics Framework / Tools to use with data from different sources

I have many data sources (a Postgres database, log files) containing statistical data or data from which statistics can be calculated. I'm searching for an application where you can design new reports/outputs (graphs, tables, etc.) through a GUI instead of a programming language. You should be able to save these queries and r...

Using Awstats for generating usage statistics for a Liferay portal

Has anyone tried to use Awstats for generating usage statistics for a Liferay portal? Can you share your experience on how to do it? Aside from Awstats and Google Analytics, are there any other alternatives for generating statistics for a Liferay portal? (I can't use Google Analytics since it's a restricted internal portal.) ...

Handling Many Statistic DB Columns With Order By Requirements

For my current project we want to present statistical data and rank it. In my case I'm talking about "favouriting" an artist, counting the times an artist's track has been played, and displaying how many playlists an artist's track has been added to... These are all very domain-specific issues, but it's a concrete ex...

How popular is Groovy/Grails in the corporate world?

Are there any figures for its adoption in corporate environments? Does anyone know of large corporations that have adopted it for projects? ...

Simple multidimensional curve fitting

I have a bunch of data, generally in the form (a, b, c, ..., y), where y = f(a, b, c, ...). Most of the datasets have three or four variables and 10k to 10M records. My general assumption is that they are algebraic in nature, something like: y = P1 a^E1 + P2 b^E2 + P3 c^E3. Unfortunately, my last statistical analysis class was 20 years ago....
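A minimal sketch of one way to fit the assumed form y = P1 a^E1 + P2 b^E2 + ..., assuming all inputs are positive and the exponents can be restricted to a small candidate grid. For fixed exponents the model is linear in the P coefficients, so each candidate is solved by ordinary least squares (normal equations), and the exponent combination with the smallest sum of squared errors wins. The function names and the grid are hypothetical, and this is a brute-force search rather than a general nonlinear optimizer.

```python
import itertools

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_coefficients(X, y, exps):
    """For fixed exponents, fit P in y = sum_i P_i * x_i**E_i by least squares."""
    feats = [[x ** e for x, e in zip(row, exps)] for row in X]
    k = len(exps)
    AtA = [[sum(f[i] * f[j] for f in feats) for j in range(k)] for i in range(k)]
    Aty = [sum(f[i] * yi for f, yi in zip(feats, y)) for i in range(k)]
    P = solve_linear(AtA, Aty)
    sse = sum((sum(p * fi for p, fi in zip(P, f)) - yi) ** 2 for f, yi in zip(feats, y))
    return P, sse

def fit_power_sum(X, y, exponent_grid):
    """Try every exponent combination from the grid; keep the lowest-SSE fit."""
    best = None
    for exps in itertools.product(exponent_grid, repeat=len(X[0])):
        try:
            P, sse = fit_coefficients(X, y, exps)
        except ValueError:
            continue  # skip degenerate combinations
        if best is None or sse < best[2]:
            best = (P, exps, sse)
    return best  # (coefficients, exponents, sum of squared errors)
```

Usage: `fit_power_sum([(a, b) for a in range(1, 6) for b in range(1, 5)], ys, [0.5, 1, 1.5, 2, 3])`. Pure Python will be slow at 10M records; with NumPy/SciPy available, `scipy.optimize.curve_fit` can fit the exponents and coefficients jointly instead of using a grid.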

What's the probability that X *consecutive* bits in an array of N bits are set to 1?

I'm trying to code a simple, sufficiently accurate filter for validating a piece of hardware in an RTL simulation. We're simulating the randomness inherent in a chip's flip-flops, by randomly initializing all the flip-flops in the design to either 0 or 1. This corresponds to the chip's flip-flops getting some random value during power-...
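Assuming the question means "at least one run of X consecutive 1s somewhere in N independent random bits", the probability can be computed exactly with a small dynamic program over the length of the current trailing run of 1s. A run reaching X is absorbed, so one minus the remaining probability mass is the answer. The sketch below uses exact fractions and assumes each bit is 1 with probability p (1/2 by default); the function name is hypothetical.

```python
from fractions import Fraction

def prob_run_of_ones(n, x, p=Fraction(1, 2)):
    """P(at least one run of x consecutive 1s in n independent bits), x >= 1."""
    q = 1 - p
    # state[k] = P(no run of length x seen yet AND current trailing run of 1s has length k)
    state = [Fraction(0)] * x
    state[0] = Fraction(1)
    for _ in range(n):
        new = [Fraction(0)] * x
        for k, pr in enumerate(state):
            new[0] += pr * q              # a 0 bit resets the run
            if k + 1 < x:
                new[k + 1] += pr * p      # a 1 bit extends the run
            # if k + 1 == x the run is complete; that mass is absorbed (dropped)
        state = new
    return 1 - sum(state)
```

For example, `prob_run_of_ones(3, 2)` evaluates to `Fraction(3, 8)` (the patterns 011, 110 and 111). The DP costs O(N·X) operations, which stays cheap for realistic flip-flop counts; pass a float `p` instead of a `Fraction` if exact arithmetic gets too slow.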

What are some economically important applications of machine learning?

Apologies in advance if this is too vague. My list so far: statistical arbitrage, actuarial science, manufacturing process control, image processing (security, manufacturing, medical imaging), computational biology/drug design, sabermetrics, yield management, operations research/logistics (I'll include business intelligence wit...

Of all the commercial projects you've personally worked on, how many have been canned/succeeded?

I don't mean personal projects. I mean projects where there were a number of people involved and business dollars were spent. ...

Sample the average of values received in last X seconds

I have a class that dispatches a Success and a Failure event, and I need to maintain a statistic on the ratio of failures to the total number of events from that class in the last X seconds. I was thinking of using a circular linked list and appending a success or failure node for each event. Then count the numbe...
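A minimal sketch of the sliding-window idea, using a double-ended queue of timestamped events instead of a circular linked list. Old events are evicted from the front whenever the window is queried or updated, and a running failure count avoids re-scanning the window. The class and method names are hypothetical; timestamps default to `time.monotonic()` but can be passed explicitly (useful for testing).

```python
import time
from collections import deque

class FailureRateWindow:
    """Failure ratio over events recorded in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()   # (timestamp, is_failure), oldest first
        self.failures = 0       # number of failures currently in the deque

    def _evict(self, now):
        # Drop events at or before the start of the window, keeping the count in sync.
        while self.events and self.events[0][0] <= now - self.window:
            _, failed = self.events.popleft()
            if failed:
                self.failures -= 1

    def record(self, is_failure, now=None):
        now = time.monotonic() if now is None else now
        self.events.append((now, is_failure))
        if is_failure:
            self.failures += 1
        self._evict(now)

    def failure_rate(self, now=None):
        now = time.monotonic() if now is None else now
        self._evict(now)
        if not self.events:
            return 0.0
        return self.failures / len(self.events)
```

Each event is appended and removed once, so the amortized cost per event is O(1). If the event rate is very high, a fixed array of per-second buckets bounds memory at the cost of one-second granularity.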

SQL commands to get performance statistics

Are there SQL commands that I could use to extract performance monitoring data from MS SQL 2005, such as: transactions per second; page reads/writes; connections (@@CONNECTIONS gives the total, but what about current?); physical reads; locks and blocks; other counters that might be interesting? ...

How can I sort the X axis in a Barplot in R?

Hi, I have binned data that looks like this:
(8.048,18.05] (-21.95,-11.95] (-31.95,-21.95] (18.05,28.05] (-41.95,-31.95]
81 76 18 18 12
(-132,-122] (-122,-112] (-112,-102] (-162,-152] (-102,-91.95]
6 6 6 ...
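The bars come out in whatever order the bins are stored, so the underlying fix is to order the bins by the numeric lower bound of each interval before plotting (in R, by reordering the factor levels or the table). The idea is sketched in Python below, assuming labels in the `(lower,upper]` form shown above, as produced by R's `cut()`; the function name is hypothetical.

```python
def sort_bins(counts):
    """Return (label, count) pairs ordered by each interval's numeric lower bound."""
    def lower_bound(label):
        # "(-41.95,-31.95]" -> "-41.95,-31.95" -> "-41.95" -> -41.95
        return float(label.strip("([])").split(",")[0])
    return sorted(counts.items(), key=lambda kv: lower_bound(kv[0]))
```

Sorting on the parsed number rather than the label string matters: as strings, "(-41.95" and "(8.048" would compare character by character, which does not match numeric order.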

Using Datamining/Statistics for Log Monitoring

I have a large set of log files that I want to characterize, possibly with a decision tree or some other kind of analytics, but I don't know exactly what. What kind of analysis have you done with log files (a lot of log files)? For example, so far I am collecting how many requests are made to a particular page for a given log fil...
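A minimal sketch of the per-page request count mentioned above, assuming the logs use something like the Apache/NCSA Common Log Format, where the request line appears in double quotes (e.g. `"GET /index.html HTTP/1.0"`). The regex and function name are hypothetical and would need adjusting to the real log format; query strings are stripped so `/page?x=1` and `/page` count together.

```python
import re
from collections import Counter

# Assumed format: the quoted request line "METHOD PATH PROTOCOL" of a Common Log Format entry.
REQUEST_RE = re.compile(r'"(?:GET|POST|PUT|DELETE|HEAD|OPTIONS|PATCH) (\S+) [^"]*"')

def requests_per_page(lines):
    """Count requests per path, ignoring lines that do not match the assumed format."""
    counts = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m:
            path = m.group(1).split("?", 1)[0]  # drop the query string
            counts[path] += 1
    return counts
```

Counts like these (per page, per hour, per status code) are a common first feature set; time series of them can then feed anomaly detection or a classifier.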

Learning Applied Statistics with a focus on R

I know MIT and Stanford have put many of their course videos online. Does anybody know of a course (with videos available online) on Applied Statistics? I've been playing with R, and the tool (from a technical side) is pretty straightforward. However, I'm quite clueless when it comes to the statistical side (regressions, recursive p...

Finding mean of array of ints

Say you have an array of ints (in any language with fixed-size ints). How would you calculate the int closest to their mean? Edit: to be clear, the result does not have to be present in the array. That is, for the input array [3, 6, 7] the expected result is 5. Also, I guess we need to specify a particular rounding direction, so say ro...
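One overflow-avoiding approach is to never form the full sum: keep a running quotient q and remainder r with the invariant sum = q·n + r and 0 ≤ r < n, adding x // n and x % n for each element and carrying when r reaches n. The mean is then q + r/n. The sketch below is written in Python (whose ints never overflow, so it only illustrates the arithmetic) and relies on Python's floor semantics for `//` and `%`; C or Java truncate toward zero and would need sign adjustments. The tie rule used here, rounding halves up toward +infinity, is a hypothetical choice because the question's rounding direction is cut off.

```python
def closest_int_to_mean(values):
    """Int nearest the mean, ties rounded up, without forming the full sum."""
    n = len(values)
    if n == 0:
        raise ValueError("mean of empty array")
    q, r = 0, 0  # invariant: sum of processed values == q * n + r, with 0 <= r < n
    for x in values:
        q += x // n   # floor division: x == (x // n) * n + x % n, 0 <= x % n < n
        r += x % n
        if r >= n:    # carry the excess remainder into the quotient
            q += 1
            r -= n
    # mean == q + r / n with 0 <= r / n < 1; round up when r / n >= 1/2
    return q + 1 if r >= n - r else q
```

For the example in the question, `closest_int_to_mean([3, 6, 7])` returns 5 (q = 5, r = 1, mean 5.33). Comparing `r >= n - r` instead of `2 * r >= n` avoids one more potential overflow in a fixed-width port.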

What are the best JavaScript/Flash frameworks to render graphs or charts from data?

Ideally I'd like to do as little data-preparation work on the server as possible. The less I have to do to prep the data from the database for a given chart, the happier I am and the more views I can build in the time available. Some of the things I'd like to chart are, for example: the distribution of a series of response times, the number of...

Good programming/math literature

I recently read The Numerati by Stephen Baker. It is an AMAZING book which really opens your eyes to all the possibilities of new emerging technology. I was wondering if anyone (preferably who has read The Numerati) could suggest a good read? I'm not looking for anything "code related". Thanks! ...

How to compute the p-value in hypothesis testing (linear regression)

Currently I'm working on an awk script to do some statistical analysis on measurement data. I'm using linear regression to get parameter estimates, standard errors, etc., and would also like to compute the p-value for a null-hypothesis test (t-test). This is my script so far; any idea how to compute the p-value? BEGIN { ybar = 0.0 ...
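For a regression coefficient, the test statistic is t = estimate / standard error, with n − 2 degrees of freedom for a simple (one-predictor, with intercept) linear regression. The two-sided p-value follows from the identity P(|T| ≥ |t|) = I_x(df/2, 1/2) with x = df / (df + t²), where I is the regularized incomplete beta function. The sketch below evaluates I with the standard continued-fraction method (modified Lentz), using only `math`, so it ports to awk function by function; the function names are hypothetical. With SciPy available, `scipy.stats.t.sf` gives the same result directly.

```python
import math

def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    # Use the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def t_test_p_value(t, df):
    """Two-sided p-value P(|T| >= |t|) for Student's t with df degrees of freedom."""
    return betainc_reg(df / 2.0, 0.5, df / (df + t * t))
```

Usage: `t_test_p_value(slope / se_slope, n - 2)`. The fixed iteration cap is adequate for moderate degrees of freedom; for very large df the normal approximation `math.erfc(abs(t) / math.sqrt(2))` is simpler.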

information criteria for confusion matrices

One can measure the goodness of fit of a statistical model using the Akaike Information Criterion (AIC), which accounts for both the goodness of fit and the number of parameters used to build the model. AIC involves calculating the maximized value of the likelihood function for that model (L). How can one compute L, given prediction results of ...

Are there any statistics on ORM usage / popularity?

I'm looking for some statistics on the usage/popularity/availability/etc. of object-relational mapping frameworks (ORMs). An example might be how the number of downloads of a specific ORM has changed over time. Does anyone know of any such statistics? Edit: A little clarification: The reason I want the statistics is to be able to back ...

How do applications collect statistics?

I need to collect statistics from my server application written in Python. I am looking for some general guidance on how to set up models and exactly how to store the statistics information. I was thinking of storing and organizing all this information in a database, but my implementation is turning out to be too specific. I need to co...
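One way to keep the storage model generic is to avoid a table or class per statistic and instead collect named counters and timings in memory, then periodically flush a snapshot as generic (timestamp, name, value) rows. A minimal thread-safe sketch of the in-process side, assuming this counters-plus-timers model fits the application; the class and method names are hypothetical, and the database flush is left out.

```python
import threading
import time
from collections import defaultdict
from contextlib import contextmanager

class Stats:
    """Hypothetical in-process metrics registry: named counters plus timing samples."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = defaultdict(int)
        self._timings = defaultdict(list)

    def incr(self, name, value=1):
        with self._lock:
            self._counters[name] += value

    def record_timing(self, name, seconds):
        with self._lock:
            self._timings[name].append(seconds)

    @contextmanager
    def timer(self, name):
        # Time the body of a `with` block and record it under `name`.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record_timing(name, time.perf_counter() - start)

    def snapshot_and_reset(self):
        """Return aggregated stats since the last call and clear them (e.g. to flush to a DB)."""
        with self._lock:
            counters = dict(self._counters)
            timings = {name: {"count": len(v), "mean": sum(v) / len(v), "max": max(v)}
                       for name, v in self._timings.items() if v}
            self._counters.clear()
            self._timings.clear()
        return {"counters": counters, "timings": timings}
```

Usage: call `stats.incr("requests")` or wrap work in `with stats.timer("db_query"):`, and have a background job call `snapshot_and_reset()` every minute and insert one row per name. New statistics then need no schema changes.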