statistics

What are efficient and accurate algorithms for excluding outliers from a data set?

I have a set of 200 data rows (i.e., a small data set). I want to carry out some statistical analysis, but before that I want to exclude outliers. What are the potential algorithms for this purpose? Accuracy is a concern. I am very new to statistics, so I need help with very basic algorithms. ...
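
A common, very basic starting point for small data sets is Tukey's fences: drop values that lie more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The sketch below uses Python's standard `statistics.quantiles` (Python 3.8+); the multiplier `k=1.5` is the conventional default rather than something the question specifies, and `tukey_filter` is a hypothetical helper name. Whether an excluded point is really an error still calls for domain judgment.

```python
import statistics

def tukey_filter(values, k=1.5):
    """Return the values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points [Q1, Q2, Q3]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]
    print(tukey_filter(data))  # 100 lies above the upper fence and is dropped
```

Because the quartiles ignore the extremes, a single wild value cannot stretch the fences the way it would stretch a mean-and-standard-deviation rule.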

What are the best cursor (mouse) tracking applications for web sites?

What are the best JavaScript cursor (mouse) tracking applications for websites? The data is to be stored in a database. ...

How do I use the Boost normal distribution classes?

Hi all, I'm trying to use boost::normal_distribution to generate a normal distribution with mean 0 and sigma 1. The following code doesn't work, as some values fall outside the range -1 to 1 (and they shouldn't). Could someone point out what I am doing wrong? #include <boost/random.hpp> #include <boost/random/normal_distribution.hpp...
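
The expectation in the question is worth checking numerically: a standard normal variable is not bounded by -1 and 1. Only about 68.3% of draws fall within one standard deviation of the mean, so roughly 31.7% land outside [-1, 1]. The sketch below uses Python's `random.gauss` as a stand-in for `boost::normal_distribution` (the Boost code above is truncated) and compares the simulated fraction with the exact value 1 - erf(1/sqrt(2)).

```python
import math
import random

def fraction_outside_one_sigma(n=100_000, seed=0):
    """Simulate n standard normal draws and return the share with |x| > 1."""
    rng = random.Random(seed)
    outside = sum(1 for _ in range(n) if abs(rng.gauss(0.0, 1.0)) > 1.0)
    return outside / n

# Exact probability P(|Z| > 1) for a standard normal Z
exact = 1.0 - math.erf(1.0 / math.sqrt(2.0))

if __name__ == "__main__":
    print(f"simulated: {fraction_outside_one_sigma():.4f}  exact: {exact:.4f}")
```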

Generating means from a bivariate Gaussian distribution

I am reading The Elements of Statistical Learning (ESLII), and in chapter 2 they use a Gaussian mixture data set to illustrate some learning algorithms. To generate this data set, they first generate 10 means from a bivariate Gaussian distribution N((1,0)', I). I am not sure what they mean. How can you generate 10 means from a bivariate dist...
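
N((1,0)', I) denotes a two-dimensional Gaussian with mean vector (1, 0)' and identity covariance matrix I. Each of the 10 "means" is simply one random point drawn from that distribution; because the covariance is the identity, the x and y coordinates are independent normals with means 1 and 0 and standard deviation 1. The sketch below draws them with Python's `random.gauss`. The second function shows how such means are typically used afterwards, assuming ESL's construction of picking one of the 10 means uniformly at random for each observation and adding N(0, I/5) noise; treat that step and both function names as illustrative.

```python
import math
import random

def draw_means(center, k=10, rng=random):
    """Draw k points from a bivariate normal N(center, I).

    With identity covariance the coordinates are independent,
    so each one is a univariate normal with standard deviation 1.
    """
    cx, cy = center
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(k)]

def draw_mixture(means, n=100, rng=random):
    """Assumed second stage: choose a mean uniformly at random, then add
    N(0, I/5) noise (variance 1/5, i.e. standard deviation sqrt(1/5))."""
    sd = math.sqrt(1.0 / 5.0)
    points = []
    for _ in range(n):
        mx, my = rng.choice(means)
        points.append((rng.gauss(mx, sd), rng.gauss(my, sd)))
    return points

if __name__ == "__main__":
    rng = random.Random(42)
    means = draw_means((1.0, 0.0), k=10, rng=rng)
    sample = draw_mixture(means, n=100, rng=rng)
    print(len(means), len(sample))
```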

How do I use chi square distribution with C++ Boost library?

I've checked the examples on the Boost website, but they are not what I'm looking for. To put it simply, I want to see whether any number on a die is favored, using 600 rolls, so the expected number of appearances of each number (1 through 6) is 100. And I want to use the chi-square distribution to check whether the die is fair. Help! How would I d...
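
Pearson's chi-square goodness-of-fit test fits this setup: with 600 rolls the expected count per face is 100, the statistic is the sum of (observed - expected)^2 / expected over the six faces, and it is compared with a chi-square distribution with 6 - 1 = 5 degrees of freedom. The sketch below is in Python for brevity and hard-codes the 5% critical value (about 11.07) from standard tables; in Boost.Math, `boost::math::chi_squared` together with `cdf` or `quantile` could supply that value instead. The roll counts in the usage example are made up.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Upper 5% critical value of the chi-square distribution with 5 degrees
# of freedom (6 faces - 1), taken from standard tables.
CRITICAL_5PCT_DF5 = 11.0705

def chi_square_die_test(counts, critical=CRITICAL_5PCT_DF5):
    """Return (statistic, reject) for counts of faces 1..6.

    reject is True when the statistic exceeds the critical value, i.e. the
    hypothesis of a fair die is rejected at the 5% level.
    """
    n = sum(counts)
    expected = [n / 6] * 6
    stat = chi_square_statistic(counts, expected)
    return stat, stat > critical

if __name__ == "__main__":
    # Hypothetical counts from 600 rolls, for illustration only
    print(chi_square_die_test([95, 102, 98, 105, 110, 90]))
```

Not rejecting does not prove the die is fair; it only means 600 rolls show no evidence against fairness at that level.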

Is it possible to access iPhone/iPod Touch music stats through the SDK?

I've seen that it's possible to access songs and playlists and even play them. But is it possible to access the statistics attached to each song, such as play count, star rating, and the dates and times it was listened to? ...

Using Excel to display the number of occurrences within a date range

I've got a list of transaction dates and the user ID of the person who made the transaction on each date (only one transaction per day is allowed). For example: I'd like to create a matrix which shows, as of each date, the number of users who have made 1 transaction, 2-10 transactions, 10-20 transactions, etc. For example (note, the below data doesn'...

How to create a query for monthly total in MySQL?

I have the following DB. CREATE TABLE IF NOT EXISTS `omc_order` ( `order_id` int(10) unsigned NOT NULL AUTO_INCREMENT, `customer_id` int(10) unsigned NOT NULL, `total` decimal(10,2) NOT NULL, `order_date` datetime NOT NULL, `delivery_date` datetime NOT NULL, `payment_date` datetime NOT NULL, PRIMARY KEY (`order_id`) ) ENGI...
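
The usual approach is to group rows by the year and month of `order_date` and `SUM(total)` within each group. In MySQL the month key would typically come from `DATE_FORMAT(order_date, '%Y-%m')` or from `YEAR(order_date)` and `MONTH(order_date)`. The runnable sketch below uses Python's built-in `sqlite3` module as a stand-in, so the month is extracted with SQLite's `strftime('%Y-%m', ...)` instead; only the `omc_order` columns the query needs are recreated, and the rows are made-up sample values.

```python
import sqlite3

def make_demo_db():
    """In-memory stand-in for omc_order with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    # Reduced schema: only the columns the monthly query needs
    conn.execute("CREATE TABLE omc_order (order_id INTEGER PRIMARY KEY, "
                 "total REAL NOT NULL, order_date TEXT NOT NULL)")
    rows = [(10.50, "2009-01-05 10:00:00"),
            (4.50, "2009-01-20 12:30:00"),
            (20.00, "2009-02-02 09:15:00")]
    conn.executemany("INSERT INTO omc_order (total, order_date) VALUES (?, ?)", rows)
    return conn

def monthly_totals(conn):
    """Sum order totals per calendar month (YYYY-MM), oldest month first."""
    sql = """
        SELECT strftime('%Y-%m', order_date) AS month,
               SUM(total)                    AS month_total
        FROM omc_order
        GROUP BY month
        ORDER BY month
    """
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    print(monthly_totals(make_demo_db()))  # one (month, total) row per month
```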

Evaluating the distribution of words in a grid

I'm creating a word search and am trying to calculate the quality of the generated puzzles by verifying that the word set is "distributed evenly" throughout the grid. For example, placing each word consecutively, filling them up row-wise, is not particularly interesting because there will be clusters and the user will quickly notice a pattern. ...

Has anyone implemented the chain ladder method in SQL?

Hello all, has anyone implemented the chain ladder method in SQL? It's a method used in actuarial science. ...
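
For reference, the basic (volume-weighted) chain ladder works on a cumulative claims triangle: each age-to-age development factor is the sum of the next development column divided by the sum of the current column, taken over the accident years that have both values, and incomplete rows are projected by multiplying through the remaining factors. The sketch below shows that arithmetic in Python rather than SQL; in SQL the same factors could be computed as ratios of SUMs grouped by development period. The triangle values and function names are hypothetical.

```python
def development_factors(triangle):
    """Volume-weighted age-to-age factors from a cumulative triangle.

    triangle[i] holds the cumulative amounts for accident year i;
    later accident years have fewer observed development periods.
    """
    n_periods = max(len(row) for row in triangle)
    factors = []
    for j in range(n_periods - 1):
        # Only rows observed at both period j and period j + 1 contribute
        rows = [row for row in triangle if len(row) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors

def project(triangle, factors):
    """Extend each row to the last period by applying the remaining factors."""
    completed = []
    for row in triangle:
        full = list(row)
        for j in range(len(row) - 1, len(factors)):
            full.append(full[-1] * factors[j])
        completed.append(full)
    return completed

if __name__ == "__main__":
    triangle = [[100, 150, 165],  # oldest accident year, fully developed
                [110, 165],
                [120]]            # most recent accident year
    f = development_factors(triangle)
    print(f)
    print(project(triangle, f))
```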

What is the equivalent of MATLAB's pcolor in R?

I have a 16x16 matrix of grayscale values representing handwritten digits. Is there a plot in R that I can use to visualize it? MATLAB has pcolor; I am looking for something along those lines. pcolor ...

C# SqlDataReader Execution Statistics and Information

Hi, I am creating an automated database query execution queue, which essentially means a queue of SQL queries that are executed one by one. Queries are executed using code similar to the following: using (SqlConnection cn = new SqlConnection(ConfigurationManager.ConnectionStrings["NorthwindConnectionString"].ConnectionString)...

How do I rank genes using information gain?

How is gene ranking done for microarray data using information gain and chi-square statistics? Please illustrate with a simple example. ...
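
Information gain scores a gene by how much knowing its (discretized) expression level reduces the entropy of the class label: IG = H(class) - sum over levels v of P(v) * H(class | level = v). Genes are then sorted by IG, highest first. The sketch below covers only information gain, not the chi-square variant, and assumes expression values have already been discretized (for example into "high" and "low"); that discretization step, the gene names, and the tiny data set are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """H(labels) minus the weighted entropy of labels within each feature value."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def rank_genes(expression, labels):
    """Sort gene names by information gain, highest first."""
    scores = {g: information_gain(vals, labels) for g, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    labels = ["tumor", "tumor", "normal", "normal"]
    expression = {  # hypothetical discretized expression per sample
        "geneA": ["high", "high", "low", "low"],  # separates the classes perfectly
        "geneB": ["high", "low", "high", "low"],  # carries no class information
    }
    print(rank_genes(expression, labels))
```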

What is the Entropy of Android's Dot Password System?

How many permutations of Android's dot login system are possible? I know for a fact that the solution lies in discrete math, specifically permutations without repetition. If your answer doesn't use permutations or combinations, you are incorrect. Passwords are between 4 and 9 dots long, and there are a total of 9 dots to p...
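
A plain permutation count gives an upper bound: an ordered selection of k distinct dots out of 9 can be made in P(9, k) = 9!/(9 - k)! ways, summed over k = 4..9. The sketch below also computes a count under one extra rule, which is an assumption about the lock screen's behavior: a stroke may not jump over a dot that has not been visited yet (for example, going from 1 to 3 passes through 2). Dots are numbered 1 to 9 row by row, and the `SKIP` table encodes that assumed rule. The entropy in bits is log2 of whichever count applies, assuming every pattern is equally likely.

```python
from math import log2, perm

def naive_count(min_len=4, max_len=9, dots=9):
    """Permutations without repetition: sum of P(dots, k) for k in [min_len, max_len]."""
    return sum(perm(dots, k) for k in range(min_len, max_len + 1))

# Pairs of dots whose straight line passes through a middle dot
# (grid numbered 1-2-3 / 4-5-6 / 7-8-9). Assumed rule: the middle
# dot must already be visited before such a jump is allowed.
SKIP = {}
for a, b, mid in [(1, 3, 2), (4, 6, 5), (7, 9, 8), (1, 7, 4),
                  (2, 8, 5), (3, 9, 6), (1, 9, 5), (3, 7, 5)]:
    SKIP[(a, b)] = SKIP[(b, a)] = mid

def constrained_count(min_len=4, max_len=9):
    """Count patterns of min_len..max_len dots that respect the skip rule."""
    def dfs(current, visited):
        total = 1 if len(visited) >= min_len else 0
        if len(visited) == max_len:
            return total
        for nxt in range(1, 10):
            mid = SKIP.get((current, nxt))
            if nxt in visited or (mid is not None and mid not in visited):
                continue
            visited.add(nxt)
            total += dfs(nxt, visited)
            visited.remove(nxt)
        return total
    return sum(dfs(start, {start}) for start in range(1, 10))

if __name__ == "__main__":
    for name, count in [("naive", naive_count()), ("constrained", constrained_count())]:
        print(name, count, f"{log2(count):.2f} bits")
```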

R - convert table into matrix by column names

I have a data frame that looks like the following:

      models cores     time
    1      4     1 0.000365
    2      4     2 0.000259
    3      4     3 0.000239
    4      4     4 0.000220
    5      8     1 0.000259
    6      8     2 0.000249
    7      8     3 0.000251
    8      8     4 0.000258
    ... etc.

I would like to convert it into a table/matrix wit...

How can I generate random samples from bivariate normal and Student's t distributions in C++?

Hi, what is the best approach to generating random samples from bivariate normal and Student's t distributions? In both cases sigma is 1 and the mean is 0, so the only parameter I am really interested in is the correlation (and the degrees of freedom for Student's t). I need the solution in C++, so unfortunately I can't use already implemented funct...
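
With zero means and unit variances, a correlated normal pair can be built from two independent standard normals using the 2x2 Cholesky factor: x = z1, y = rho * z1 + sqrt(1 - rho^2) * z2. A bivariate Student's t pair with nu degrees of freedom can then be obtained by dividing both components by the same sqrt(W / nu), where W is chi-square with nu degrees of freedom; rho is then the correlation parameter of the t distribution (the actual correlation equals rho when nu > 2, while each marginal variance becomes nu / (nu - 2)). The sketch is in Python for brevity and uses only `random.gauss` and `random.gammavariate` (chi-square with nu degrees of freedom is Gamma with shape nu/2 and scale 2); the same steps should port directly to C++ with `std::normal_distribution` and `std::chi_squared_distribution` from `<random>`.

```python
import math
import random

def bivariate_normal(rho, rng=random):
    """One draw from a standard bivariate normal with correlation rho."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2

def bivariate_t(rho, nu, rng=random):
    """One draw from a bivariate Student's t with unit scale,
    correlation parameter rho and nu degrees of freedom."""
    x, y = bivariate_normal(rho, rng)
    w = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    scale = math.sqrt(nu / w)            # the same factor for both components
    return x * scale, y * scale

def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

if __name__ == "__main__":
    rng = random.Random(7)
    normal = [bivariate_normal(0.6, rng) for _ in range(20_000)]
    student = [bivariate_t(0.6, 5, rng) for _ in range(20_000)]
    print(sample_correlation(normal), sample_correlation(student))
```

Sharing one W between both components is what makes the pair jointly t-distributed; drawing a separate W for each would give t marginals but a different joint distribution.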

Select random k elements from a list whose elements have weights

Selecting without any weights (equal probabilities) is beautifully described here. I was wondering if there is a way to convert this approach to a weighted one. I am also interested in other approaches as well. Update: Sampling without replacement ...
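
One published way to extend reservoir-style selection to weights is the Efraimidis-Spirakis method (often called A-Res): give each item the key u^(1/w), where u is uniform on [0, 1) and w is the item's positive weight, and keep the k items with the largest keys. This yields a weighted sample without replacement in a single pass. The sketch below assumes all weights are strictly positive; `weighted_sample` is a hypothetical helper name.

```python
import heapq
import random

def weighted_sample(items, weights, k, rng=random):
    """Pick k distinct items without replacement, favouring larger weights.

    Efraimidis-Spirakis keys: key = u ** (1 / w) with u ~ Uniform[0, 1).
    Only a k-sized heap is kept, so the input can also be a stream.
    """
    heap = []  # min-heap of (key, index, item); the smallest key is on top
    for idx, (item, w) in enumerate(zip(items, weights)):
        if w <= 0:
            raise ValueError("weights must be positive")
        key = rng.random() ** (1.0 / w)
        if len(heap) < k:
            heapq.heappush(heap, (key, idx, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, idx, item))
    # Return items ordered from largest key to smallest
    return [item for _, _, item in sorted(heap, reverse=True)]

if __name__ == "__main__":
    print(weighted_sample(["a", "b", "c", "d"], [1, 1, 1, 10], 2))
```

The index in each heap tuple breaks ties between equal keys so the items themselves never need to be comparable.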

What statistics concepts are useful for profiling?

I've been meaning to brush up on my knowledge of statistics. One area where statistics seems like it would be helpful is profiling code, because profiling almost always involves trying to pull some information out of a large amount of data. Are there any subjects in statistics that I ...
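
As one small example of statistics applied to profiling: repeated timings of the same code tend to be skewed by outliers (cache misses, scheduling, background activity), so summarizing many runs with robust statistics such as the median and interquartile range is often more informative than a single run or the mean alone. The sketch below is illustrative only; the function names and the 30-run default are arbitrary choices.

```python
import statistics
import time

def time_runs(func, repeats=30):
    """Wall-clock duration of `repeats` separate calls to func, in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples

def summarize(samples):
    """Robust summary of timing samples alongside the mean and minimum."""
    q1, median, q3 = statistics.quantiles(samples, n=4)
    return {"median": median, "iqr": q3 - q1,
            "mean": statistics.mean(samples), "min": min(samples)}

if __name__ == "__main__":
    stats = summarize(time_runs(lambda: sorted(range(100_000, 0, -1))))
    print({k: f"{v * 1e3:.3f} ms" for k, v in stats.items()})
```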

What does Simpson's paradox imply in A/B testing?

I am doing A/B testing and I am seeing Simpson's paradox in my results (day vs. month vs. the total duration of the test). Does this mean that my A/B test is not correct/representative? (Did some external factor impact the test?) If it is a sign of a problem, what directions should I follow? Thanks for your great help. Further reading: ...
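
Simpson's paradox does not by itself mean the test is broken; it typically appears when the share of traffic each variant receives differs between segments (here, days) whose baseline conversion rates also differ, so the pooled totals mix unlike groups. The counts below are invented purely to show the effect: variant B converts better on each day, yet A looks better in the pooled total because A received most of its traffic on the high-converting day.

```python
from fractions import Fraction

# Hypothetical (conversions, visitors) per day and variant, for illustration only
DATA = {
    "day1": {"A": (2, 100), "B": (30, 1000)},
    "day2": {"A": (200, 1000), "B": (21, 100)},
}

def per_segment_rates(data):
    """Exact conversion rate of each variant within each segment."""
    return {day: {v: Fraction(c, n) for v, (c, n) in variants.items()}
            for day, variants in data.items()}

def pooled_rates(data):
    """Conversion rate of each variant after summing over all segments."""
    totals = {}
    for variants in data.values():
        for v, (c, n) in variants.items():
            tc, tn = totals.get(v, (0, 0))
            totals[v] = (tc + c, tn + n)
    return {v: Fraction(c, n) for v, (c, n) in totals.items()}

if __name__ == "__main__":
    for day, rates in per_segment_rates(DATA).items():
        print(day, {v: f"{float(r):.1%}" for v, r in rates.items()})
    print("pooled", {v: f"{float(r):.1%}" for v, r in pooled_rates(DATA).items()})
```

A practical check is to compare the traffic split between variants in each segment; if it is not roughly constant, the per-segment comparison is usually the more trustworthy one.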

Determining if the difference between two error values is significant

I'm evaluating a number of different algorithms whose job is to predict the probability of an event occurring. I am testing the algorithms on large-ish datasets. I measure their effectiveness using root mean squared error, which is the square root of the mean of the squared errors. The error is the difference between the predicte...
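
The standard definition of RMSE is the square root of the mean of the squared errors, sqrt((1/n) * sum of e_i^2). Because every algorithm is scored on the same events, one way to judge whether the RMSE difference between two algorithms is significant is a paired bootstrap: resample the events with replacement, recompute both RMSEs on each resample, and look at the spread of the difference. The sketch below assumes per-event predictions from both algorithms are available; the 2,000 resamples, the 95% percentile interval, and the function names are illustrative choices.

```python
import math
import random

def rmse(predicted, actual):
    """Root mean squared error: sqrt(mean((p - a)^2))."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

def paired_bootstrap_diff(pred_a, pred_b, actual, resamples=2000, seed=0):
    """Approximate 95% percentile interval for RMSE(A) - RMSE(B).

    Events are resampled jointly, so both algorithms are always scored
    on the same resampled events. An interval that excludes 0 suggests
    the difference is not just resampling noise.
    """
    rng = random.Random(seed)
    n = len(actual)
    diffs = []
    for _ in range(resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        y = [actual[i] for i in idx]
        diffs.append(rmse(a, y) - rmse(b, y))
    diffs.sort()
    return diffs[int(0.025 * resamples)], diffs[int(0.975 * resamples) - 1]
```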