statistics

How to calculate the number of contingency tables?

If I want to calculate the number of k-dimensional contingency tables, which formula should I use? For example, if I have 16 categorical variables in my dataset and want to calculate the number of 1-dimensional contingency tables, then it's clear, there is only 1 table. If I want to calculate the number of 2-dimensional contingency table...

Computing, storing, and retrieving values to and from an N-Dimensional matrix

This question is probably quite different from what you are used to reading here - I hope it can provide a fun challenge. Essentially I have an algorithm that uses 5 (or more) variables to compute a single value, called outcome. Now I have to implement this algorithm on an embedded device which has no memory limitations, but has very ha...

How do I determine a best-fit distribution in Java?

I have a bunch of sets of data (between 50 and 500 points, each of which can take a positive integral value) and need to determine which distribution best describes them. I have done this manually for several of them, but need to automate this going forward. Some of the sets are completely modal (every datum has the value of 15), some a...

Programmatically determine the relative "popularities" of a list of items (books, songs, movies, etc)

Given a list of (say) songs, what's the best way to determine their relative "popularity"? My first thought is to use Google Trends. This list of songs: "Subterranean Homesick Blues", "Empire State of Mind", "California Gurls" produces the following Google Trends report: (to find out what's popular now, I restricted the report to the last 3...

Excel 2007 MedianIfs()

I want to calculate some statistics. In order to calculate the average of certain values of a column, I use AverageIfs(). Now I want to calculate the median for the same values. But there is no MedianIfs() function. Is there a simple solution to calculate the median for values that hold certain conditions (2 conditions)? ...

How to develop a program to minimize errors in human transcription of handwritten surveys

I need to develop custom software to do surveys. Questions may be of multiple choice, or free text in a very few cases. I was asked to design a subsystem to check if there is any error in the manual data entry for the multiple choices part. We're trying to speed up the user data entry process and to minimize human input differences bet...

linking info of pairs of respondents (couples) in SPSS

I am preparing for analyses of the determinants of partner choice in SPSS, but basically I can't get off the ground because I don't know how to create new variables based on the information of each respondent's spouse (i.e. education, wages, social background, ethnicity etc.). Each respondent is currently identified by an ID#, and exis...

Ways to calculate similarity

Hi, I am building a community website that requires me to calculate the similarity between any two users. Each user is described with the following attributes: age, skin type (oily, dry), hair type (long, short, medium), lifestyle (active outdoor lover, TV junky) and others. Can anyone tell me how to go about this problem or point me to s...

What is the best way to store categorical references in SQL tables?

I want to store a wide array of categorical data in MySQL database tables. Let's say, for instance, that I want to store information on "widgets" and want to categorize attributes in certain ways, e.g. by shape category. For instance, the widgets could be classified as: round, square, triangular, spherical, etc. Should these categories be ...

Statistical analysis for performance measurement on usage of bigger datatype wherever they were not required at all

If I use a larger datatype where I know a smaller one would have been sufficient for the values I will insert into a table, will that affect performance in SQL Server, in terms of speed or in any other way? E.g. IsActive (0,1,2,3), never more than 3 in any case. I know I should use tinyint, but due to some reasons consider...

Load Tracking Script on Form Submit

I'm trying to load a StatCounter tracking script after the form has been successfully sent. Here's the code: $.ajax({ type: "POST", url: "css/sendmail.php", data: dataString, success: function(data) { $('#script').load('css/script.html', function() ({ $('#overlay').css('visibility','hidden'); clearForm(); $('#...

How to notice unusual news activity

Suppose you were able to keep track of the news mentions of different entities, like say "Steve Jobs" and "Steve Ballmer". What are ways you could tell whether the number of mentions per entity in a given time period was unusual relative to their normal frequency of appearance? I imagine that for a more popular person like...
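One common starting point for this kind of question, sketched here under assumptions that are not in the excerpt: compare the current period's mention count against that entity's own history using a z-score, so that a popular entity and an obscure one are each judged against their own baseline. The function name `mention_zscore`, the sample counts, and any alert threshold are hypothetical and only for illustration.

```python
import math
from statistics import mean, stdev

def mention_zscore(history, current):
    """Return how many standard deviations `current` lies from the
    mean of `history` (per-period mention counts for one entity)."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs >= 2 points
    if sigma == 0:
        # Flat history: any deviation at all is infinitely unusual.
        if current == mu:
            return 0.0
        return math.copysign(math.inf, current - mu)
    return (current - mu) / sigma

# Hypothetical daily mention counts for one entity, then a spike.
baseline = [10, 12, 9, 11, 10, 13, 11]
score = mention_zscore(baseline, 30)
```

A z-score assumes the counts are roughly symmetric around their mean; for rare entities with very low counts, a count-based model may be a better fit than this sketch.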

Statistics: combinations in Python

I need to compute combinatorials (nCr) in Python but cannot find the function to do that in the 'math', 'numpy' or 'stat' libraries. Something like a function of the type: comb = calculate_combinations(n, r) I need the number of possible combinations, not the actual combinations, so itertools.combinations does not interest me. Finally, I...
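A minimal sketch of the kind of function the excerpt asks for, assuming Python 3.8 or later, where the standard library provides `math.comb`. The wrapper name `calculate_combinations` is taken from the question; on older interpreters a different approach would be needed.

```python
import math

def calculate_combinations(n: int, r: int) -> int:
    """Number of ways to choose r items from n, ignoring order (nCr).

    Delegates to math.comb, which works on exact integers, so large
    results do not lose precision.
    """
    return math.comb(n, r)

# Choosing 2 of 16 items.
pairs = calculate_combinations(16, 2)
```

`math.comb` returns 0 when r is greater than n and raises `ValueError` for negative arguments.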

Statistics Question: Kernel Smoothing in R

I have data of this form: x y 1 0.19 2 0.26 3 0.40 4 0.58 5 0.59 6 1.24 7 0.68 8 0.60 9 1.12 10 0.80 11 1.20 12 1.17 13 0.39 I'm currently plotting a kernel-smoothed density estimate of the x versus y using this code: smoothed = ksmooth( d$resi, d$score, bandwidth = 6 ) plot(...

How do you find out release, mailing list statistics information on open source projects

We are interested in finding out some statistics on various frameworks: mailing list activity on, say, RichFaces, much like what is available on http://code.google.com (Low, Medium, High), plus the average number of emails per day or per month, and the number of releases made in a year, including patch, minor, and major releases. We did look at the mave...

Screening (multi)collinearity in a regression model

I hope that this one is not going to be "ask-and-answer" question... here goes: (multi)collinearity refers to extremely high correlations between predictors in the regression model. How to cure them... well, sometimes you don't need to "cure" collinearity, since it doesn't affect regression model itself, but interpretation of an effect o...

How much time do you spend on what task while working on a project?

How much time do you spend (in percent) in a project with writing actual code? writing unit tests? bug fixing? writing documentation? communicating with the customer? communicating with team members? setting up the project? integrating other parts? reviewing code from other developers? deployment? support? training? ... ...

Why use local pipes instead of sockets for communication between programs on one computer?

Why use local pipes instead of sockets for communication between programs on one computer? Does anyone have fresh stats on which is faster and by how much, and which is more or less secure? I have found a very old, strange, and scary one... http://home.iae.nl/users/mhx/pipes&socks.html There is a noticeable difference in performance between soc...

About the Select algorithm

Hi, I have read about the selection algorithm and I have a question that may look silly: why do we consider the array as groups of 5 elements? Can we use groups of 7 or 3 elements instead? Also, is there any link that would help me understand this better? This is my proof when we consider the array in groups of 3 elements, and it s...

What's the best way to unit test code that generates random output?

Specifically, I've got a method that picks n items from a list in such a way that a% of them meet one criterion, and b% meet a second, and so on. A simplified example would be to pick 5 items where 50% have a given property with the value 'true', and 50% 'false'; 50% of the time the method would return 2 true/3 false, and the other 50%, 3 tru...
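One possible approach to the simplified example, as a sketch rather than the asker's actual method: inject a seeded random number generator so runs are reproducible, assert the invariants that must hold on every call, and check the 50/50 split only statistically, with a generous tolerance. The function `pick_items`, the `flag` field, and the helper names are hypothetical stand-ins.

```python
import random

def pick_items(items, n, rng):
    """Hypothetical stand-in for the method under test: pick n items,
    half with flag True and half False; when n is odd, the extra item
    goes to either group with probability 0.5."""
    trues = [x for x in items if x["flag"]]
    falses = [x for x in items if not x["flag"]]
    extra = 1 if (n % 2 == 1 and rng.random() < 0.5) else 0
    n_true = n // 2 + extra
    return rng.sample(trues, n_true) + rng.sample(falses, n - n_true)

def check_invariants(result, n):
    """Properties that must hold on every single call."""
    n_true = sum(1 for x in result if x["flag"])
    return len(result) == n and n_true in (n // 2, n - n // 2)

def true_heavy_fraction(trials, seed):
    """Fraction of trials returning 3 true / 2 false for n = 5."""
    rng = random.Random(seed)  # seeded, so the test is repeatable
    items = [{"id": i, "flag": i % 2 == 0} for i in range(20)]
    heavy = 0
    for _ in range(trials):
        result = pick_items(items, 5, rng)
        assert check_invariants(result, 5)
        if sum(1 for x in result if x["flag"]) == 3:
            heavy += 1
    return heavy / trials
```

A test can then assert that `true_heavy_fraction(2000, seed=42)` falls in a wide band around 0.5, for example between 0.4 and 0.6, instead of expecting an exact value.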