questions about statistics | ansaurus

statistics

How to Plot With Different Markers ('x' and 'o') Based on a Condition in R

I have data that looks like this: for_y_axis <- c(0.49534, 0.80796, 0.93970, 0.99998) for_x_axis <- c(1, 2, 3, 4) count <- c(0, 33, 0, 4) I want to plot for_y_axis against for_x_axis, marking a point with "o" if its count value is equal to zero and with "x" if its count value is greater than zero. Is th...

Binary Grouping in R

Suppose I have two vectors of the same length: x <- c(0.49534, 0.80796, 0.93970, 0.99998) count <- c(0, 33, 0, 4) How can I split the vector 'x' into two vectors: a vector grzero that contains the values in x whose count value is greater than 0, and a vector eqzero that contains the values in x whose count value is equal to zero? Yielding > print(grzero) > [1] ...
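The split described above can be sketched in a few lines. This is a language-neutral illustration in Python rather than R, and the helper name split_by_count is made up for the example. In R itself, logical indexing such as x[count > 0] and x[count == 0] expresses the same idea.

```python
# Values and their parallel counts, copied from the question.
x = [0.49534, 0.80796, 0.93970, 0.99998]
count = [0, 33, 0, 4]


def split_by_count(values, counts):
    """Split values into (count > 0, count == 0) groups.

    split_by_count is a hypothetical helper name, not part of any library.
    """
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")
    grzero = [v for v, c in zip(values, counts) if c > 0]
    eqzero = [v for v, c in zip(values, counts) if c == 0]
    return grzero, eqzero


grzero, eqzero = split_by_count(x, count)
print(grzero)
print(eqzero)
```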

How to determine the most frequently visited routes through a web site by visitors?

Hi, I want to be able to discover the 10 most popular routes through our web site that lead a visitor to register with us. I have already logged all of this info, but don't seem to be able to find the best solution to query it. The site is quite high traffic, > 3 million page views per month, so the solution needs to scale. What sugg...

usage-statistics

visitor-statistic

How to use R Random forests to reduce attributes having no discrete classes?

I want to use random forests for attribute reduction. One problem with my data is that I don't have a discrete class - only a continuous one, which indicates how much an example differs from 'normal'. This class attribute is a kind of distance, ranging from zero to infinity. Is there any way to use a random forest for such data? ...

machine-learning

feature-selection

ICCs when the number of judges is not constant

Hi, I have the following problem. I need to calculate the Shrout & Fleiss ICCs for the situation in which items are judged by a varying number of judges. For example, the competitive nature of an industry is judged for a set of industries, but with a different number of judges per industry. One industry is only judged by 2 judges, wher...

"Reverse" statistics: generating data based on mean and standard deviation

Having a dataset and calculating statistics from it is easy. How about the other way around? Let's say I know some variable has an average X, standard deviation Y and assume it has normal (Gaussian) distribution. What would be the best way to generate a "random" dataset (of arbitrary size) which will fit the distribution? EDIT: This ki...

language-agnostic
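One way to approach the "reverse" problem above is to draw from a normal distribution with the given mean and standard deviation. A finite sample will only match those values approximately, so the sketch below also shows an optional rescaling step that forces an exact match. The function names and the seed parameter are assumptions made for this illustration, and the rescaling uses the population standard deviation (pstdev).

```python
import random
import statistics


def generate_normal_sample(mean, std_dev, size, seed=None):
    """Draw `size` values from a normal distribution N(mean, std_dev)."""
    rng = random.Random(seed)  # seed is optional, for reproducibility
    return [rng.gauss(mean, std_dev) for _ in range(size)]


def rescale_to_exact(sample, mean, std_dev):
    """Shift and scale a sample so its mean and population standard
    deviation equal the requested values (up to floating-point error)."""
    if len(sample) < 2:
        raise ValueError("need at least two values to rescale")
    current_mean = statistics.mean(sample)
    current_std = statistics.pstdev(sample)
    if current_std == 0:
        raise ValueError("sample has zero spread and cannot be rescaled")
    return [mean + (v - current_mean) * std_dev / current_std for v in sample]


sample = generate_normal_sample(mean=10.0, std_dev=2.0, size=1000, seed=1)
exact = rescale_to_exact(sample, mean=10.0, std_dev=2.0)
print(statistics.mean(exact), statistics.pstdev(exact))
```

Whether the rescaling step is appropriate depends on the use case: it guarantees the summary statistics but the values are no longer independent draws.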

Algorithms and methods for attribute/feature selection?

I have data with a continuous class and I'm searching for good methods to reduce the number of attributes. Currently I'm using correlation-based filters, random forests and the Gram–Schmidt algorithm. What I want to find out is which attributes are more important/relevant to the class attribute than others. By using methods that I mentioned befor...

machine-learning

feature-selection

IPv6 usage statistics

Hi, does anybody know the current state of IPv6 penetration in the public Internet? I would like to know how IPv6 addresses are currently used, because I am developing a feature which relies on user host IP addresses. The question is whether it is worthwhile to consider IPv6 addresses as well. I've found some statistics from 2008, b...

choosing between algorithms

Hi All, I am sure there are a lot of software testing engineers and algorithm validation engineers on Stack Overflow. Could someone please tell me how one would proceed in the following scenario? Say we have a mammogram and 5 different algorithms which take this mammogram as input and identify whether the patient has cancer. If 3 out of 5 ...

interview-questions

R statistics: problem with simple column vector

Hello, I have a problem using data from a tab-delimited data file imported with read.delim. Most of the columns contain numerical data on which I need to run t.test. Unfortunately I always get this error: Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") : missi...

Calculating Sharpe ratio in Java

Hi forum, I am trying to calculate the Sharpe ratio in Java, but I am struggling to find a "correct" dataset and result to test against. Referring to http://www.hedgeco.net/blogs/2008/07/30/explaining-the-sharpe-ratio-again/ Investment Monthly Returns Jan Feb Mar Apr May June Jul Aug Sep Oct Nov Dec 1.64 5.85 9.22 3.51 -...
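For reference, a minimal sketch of the usual Sharpe ratio definition (mean excess return divided by the standard deviation of excess returns) is given below in Python rather than Java. The test returns are hypothetical values chosen so the result can be checked by hand; they are not the dataset from the linked article, which is truncated above. The periods_per_year scaling by a square root is a common convention and is an assumption here, as is the use of the sample standard deviation.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=1):
    """Mean excess return divided by the sample standard deviation of
    excess returns, optionally scaled by sqrt(periods_per_year).

    risk_free_rate must be expressed per period, in the same units as
    the returns.
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - risk_free_rate for r in returns]
    std_excess = statistics.stdev(excess)  # sample (n - 1) standard deviation
    if std_excess == 0:
        raise ValueError("returns have zero volatility")
    return statistics.mean(excess) / std_excess * math.sqrt(periods_per_year)


# Hypothetical returns: mean 2.0, sample standard deviation 1.0.
hypothetical_returns = [1.0, 2.0, 3.0]
print(sharpe_ratio(hypothetical_returns))
print(sharpe_ratio(hypothetical_returns, risk_free_rate=1.0))
```

A hand-checkable input like this one can serve as a first unit test before moving on to real return series.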

Heavy computations analysis/optimization

First of all, I don't have multiplication or division operations, so I could use shifting/adding, overflow-multiplication, precalculations, etc. I'm just comparing one n-bit binary number to another, but according to the algorithm the number of such operations seems to be huge. Here it is: There is given a sequence of 0's and 1's that is di...

PHP Estimation Function

I am trying to calculate value $x in a number series based on an array of numbers (as $numbers). Ex: $numbers = array(1=>1000,2=>600,3=>500,4=>450,5=>425,6=>405,7=>400,8=>396); function estimateNumber($x) { // function to estimate number $x in $numbers data set } What would be the most statistically accurate method? ...

standard-deviation

Is it about statistics?

Hi, I'm not sure this is the right place to ask. Anyway, I searched online and ended up confused by this one. Let's take one question as an example: The drying rate in an industrial process is dependent on many factors and varies according to the following distribution. Minutes Relative Frequency 3 0.22 4 0.36 5 ...

How to measure HTTP cache hit rates?

Is it possible to detect HTTP cache hits in order to calculate a cache hit rate? I'd like to add a snippet of code (JavaScript) to an HTML page that reports (via AJAX) whether a resource was available from a client's local cache or fetched from the server. I'd then compile some stats to give some insight into the effects of my cache tuning. I'm p...

How to calculate a multi-variable formula from a table of data

I have a table of several independent variables from which I need to derive a formula that generates the dependent variable. Through trial I have come up with a value for the dependent variable. For example, I have a table like this:
x1 | x2 | x3 || z (value found by experiment)
-------------------
1 | 2 | 3 || 10
3 | 4 | 5 || 14
2 ...

R language: how to split a data frame

I want to split a data frame into several smaller ones. This looks like a very trivial question, but I cannot find a solution through web search. Can anyone help? Also, do you have any recommendation for a simple experiment design or survey R package? Many thanks. Leo ...

How many random strings does this code generate?

I am considering this random string generator in Perl: sub generate_random_string { my $length = 12; my @chars = qw/2 3 4 5 6 7 8 9 A B C D E F G H J K M N P Q R S T U V W X Y Z/; my $str = ''; $str .= $chars[int rand @chars] for 1..$length; return $str; } How many unique strings will this generate? If I extend th...
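The counting argument behind the question above can be sketched as follows, in Python rather than Perl. With 31 characters in the alphabet and 12 positions, there are at most 31^12 distinct strings; how many the generator can actually reach also depends on the random number generator behind rand, which this sketch does not model. The collision estimate uses the standard birthday approximation, and the function names are made up for the example.

```python
import math

# Alphabet copied from the Perl snippet in the question.
CHARS = "2 3 4 5 6 7 8 9 A B C D E F G H J K M N P Q R S T U V W X Y Z".split()
LENGTH = 12


def count_possible_strings(alphabet_size, length):
    """Upper bound on distinct strings: one independent choice per position."""
    return alphabet_size ** length


def collision_probability(generated, space_size):
    """Birthday approximation for the chance that at least two of
    `generated` uniformly random strings are identical."""
    exponent = generated * (generated - 1) / (2 * space_size)
    return -math.expm1(-exponent)  # equals 1 - exp(-exponent)


total = count_possible_strings(len(CHARS), LENGTH)
print(len(CHARS), total)
print(collision_probability(1_000_000, total))
```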

Price Filter Grouping Algorithm

I am creating an e-commerce site, and I am having trouble developing a good algorithm to sort products that are pulled from the database into halfway appropriate groups. I have tried simply dividing the highest price by 4 and basing each group on that. I also tried standard deviations based around the mean. Both could result with pr...
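One alternative to the two approaches mentioned above is to cut the sorted prices into groups of roughly equal size (quantile-style buckets), so a single expensive outlier cannot leave most groups empty. This is a swapped-in technique offered as a sketch, not the method the question settled on. The function name and the sample prices are hypothetical.

```python
def quantile_price_groups(prices, groups=4):
    """Split prices into up to `groups` buckets of near-equal count.

    Returns a list of (lowest_price, highest_price, item_count) tuples.
    """
    if not prices:
        raise ValueError("prices must not be empty")
    if groups < 1:
        raise ValueError("groups must be at least 1")
    ordered = sorted(prices)
    n = len(ordered)
    buckets = []
    for g in range(groups):
        start = g * n // groups
        end = (g + 1) * n // groups
        if start < end:  # skip empty buckets when n < groups
            chunk = ordered[start:end]
            buckets.append((chunk[0], chunk[-1], len(chunk)))
    return buckets


# Hypothetical prices with one extreme outlier.
sample_prices = [5, 10, 12, 15, 20, 25, 40, 1000]
for low, high, size in quantile_price_groups(sample_prices):
    print(f"{low} - {high}: {size} products")
```

The bucket boundaries come straight from the data, so a production filter would probably round them to friendlier values before display.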

Covariance matrix computation

Input: random vector X = (xi), i = 1..n; vector of means for X = (meanxi), i = 1..n. Output: covariance matrix Sigma (n*n). Computation: 1) find all cov(xi, xj) = 1/n * (xi - meanxi) * (xj - meanxj), i, j = 1..n; 2) Sigma(i, j) = cov(xi, xj), a symmetric matrix. Is this algorithm correct, and is it free of side effects? ...
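For comparison, here is a minimal sketch of how a covariance matrix is usually estimated. It assumes that several observations of the n-dimensional vector are available and averages over those m observations, which differs from the formula in the question, where 1/n appears to refer to the vector dimension. The function name and the data are made up for illustration. Dividing by m gives the population form; dividing by m - 1 instead gives the unbiased sample estimate.

```python
def covariance_matrix(observations):
    """Population covariance matrix of m observations of an n-vector.

    `observations` is a list of m rows, each a list of n numbers.
    """
    m = len(observations)
    if m == 0:
        raise ValueError("need at least one observation")
    n = len(observations[0])
    if any(len(row) != n for row in observations):
        raise ValueError("all observations must have the same length")

    # Per-component means, estimated from the data.
    means = [sum(row[i] for row in observations) / m for i in range(n)]

    sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            cov = sum(
                (row[i] - means[i]) * (row[j] - means[j]) for row in observations
            ) / m
            sigma[i][j] = cov
            sigma[j][i] = cov  # the matrix is symmetric
    return sigma


# Two hypothetical observations of a 2-dimensional vector.
data = [[1.0, 2.0], [3.0, 6.0]]
print(covariance_matrix(data))
```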
