I have data that looks like this:
for_y_axis <-c(0.49534,0.80796,0.93970,0.99998)
for_x_axis <-c(1,2,3,4)
count <-c(0,33,0,4)
What I want to do is plot the graph using for_x_axis and for_y_axis,
but mark a point with "o" if its count value is equal to 0 (zero) and
with "x" if its count value is greater than zero.
Is th...
Suppose I have two vectors of the same length:
x <-c(0.49534,0.80796,0.93970,0.99998)
count <-c(0,33,0,4)
How can I split the vector 'x' into two vectors:
vector grzero, containing the values in x whose count value is greater than 0, and
vector eqzero, containing the values in x whose count value is equal to zero?
Yielding
> print(grzero)
> [1] ...
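A minimal sketch of the split described above, written in Python rather than R so that it can be run on its own. The helper name split_by_count is made up for illustration; the data are the two vectors from the question. In R itself the same idea is usually written with logical indexing, x[count > 0] and x[count == 0].

```python
# Values and counts copied from the question above.
x = [0.49534, 0.80796, 0.93970, 0.99998]
count = [0, 33, 0, 4]


def split_by_count(values, counts):
    """Return (grzero, eqzero): values whose count is > 0, and values whose count is 0.

    Hypothetical helper, not part of any library.
    """
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")
    grzero = [v for v, c in zip(values, counts) if c > 0]
    eqzero = [v for v, c in zip(values, counts) if c == 0]
    return grzero, eqzero


grzero, eqzero = split_by_count(x, count)
print(grzero)
print(eqzero)
```

The two comprehensions walk the vectors in parallel, so the original order of x is kept inside each group.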
Hi,
I want to be able to discover the 10 most popular routes through our web site that lead a visitor to register with us.
I have already logged all of this info, but don't seem to be able to find the best solution to query it.
The site is quite high traffic, > 3 million page views per month, so the solution needs to scale.
What sugg...
I want to use random forests for attribute reduction. One problem with my data is that I don't have a discrete class, only a continuous one, which indicates how much an example differs from 'normal'. This class attribute is a kind of distance, ranging from zero to infinity.
Is there any way to use a random forest for such data?
...
Hi, I have the following problem.
I need to calculate the Shrout & Fleiss ICC's for the situation in which items are judged by a varying number of judges. For example, the competitive nature of an industry is judged for a set of industries, but with a different number of judges per industry. One industry is only judged by 2 judges, wher...
Having a dataset and calculating statistics from it is easy. How about the other way around?
Let's say I know some variable has an average X and a standard deviation Y, and I assume it has a normal (Gaussian) distribution. What would be the best way to generate a "random" dataset (of arbitrary size) which will fit the distribution?
EDIT: This ki...
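One possible sketch of the idea, using only the Python standard library. The mean of 50, standard deviation of 10, sample size and seed are placeholder values chosen for illustration, and both function names are hypothetical. Draws from a normal distribution only match the requested mean and standard deviation approximately, so the second helper shows an optional rescaling step for the case where the sample statistics must match exactly.

```python
import random
import statistics


def normal_sample(mean, sd, size, seed=None):
    """Draw `size` values from a normal distribution with the given mean and sd."""
    rng = random.Random(seed)  # a seeded generator makes the run repeatable
    return [rng.gauss(mean, sd) for _ in range(size)]


def rescale_exact(data, mean, sd):
    """Shift and scale data so its sample mean and sample sd equal the targets."""
    m = statistics.fmean(data)
    s = statistics.stdev(data)
    return [mean + sd * (v - m) / s for v in data]


# Placeholder parameters, not taken from the question.
sample = normal_sample(50.0, 10.0, 1000, seed=42)
exact = rescale_exact(sample, 50.0, 10.0)

print(statistics.fmean(sample), statistics.stdev(sample))  # close to the targets
print(statistics.fmean(exact), statistics.stdev(exact))    # equal up to rounding
```

Whether the rescaled version is appropriate depends on the use case: it reproduces the summary statistics exactly, but the values are no longer independent draws.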
I have data with a continuous class and I'm searching for good methods to reduce the number of attributes. Right now I'm using correlation-based filters, random forests and the Gram–Schmidt algorithm.
What I want to find out is which attributes are more important/relevant to the class attribute than others.
By using methods that I mentioned befor...
Hi,
does anybody know the current state of IPv6 penetration in the public Internet?
I would like to know how widely IPv6 addresses are currently used, because I am developing a feature which relies on users' host IP addresses. The question is whether it is worthwhile to consider IPv6 addresses as well.
I've found some statistics from 2008, b...
Hi All,
I am sure there are a lot of Software Testing Engineers and Algorithm Validation Engineers on Stack Overflow. Could someone please tell me how one would proceed in the following scenario?
Say we have a Mammogram and 5 different algorithms which take this mammogram as input and identify if there is Cancer in the patient. If 3 out of 5 ...
Hello,
I have a problem using data from a tab-delimited data file imported with read.delim.
Most of the columns contain numerical data on which I need to run a t.test. Unfortunately I always get this error:
Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my)))
stop("data are essentially constant") :
missi...
Hi forum,
I am trying to calculate the Sharpe ratio in Java, but I am struggling to find a "correct" dataset and result to test against.
Referring to http://www.hedgeco.net/blogs/2008/07/30/explaining-the-sharpe-ratio-again/
Investment Monthly Returns
Jan Feb Mar Apr May June Jul Aug Sep Oct Nov Dec
1.64 5.85 9.22 3.51 -...
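For a test case that can be checked by hand, here is a rough sketch of one common form of the calculation: the mean of the excess returns divided by their sample standard deviation. It is in Python rather than Java, the function name sharpe_ratio is made up, and the three returns and the risk-free rate are hypothetical placeholders, not the figures from the linked article. Conventions differ (population versus sample standard deviation, annualising or not), so the expected value of any reference dataset depends on which one its author used.

```python
import statistics


def sharpe_ratio(returns, risk_free_rate):
    """Mean excess return divided by the sample standard deviation of excess returns.

    Assumes returns and risk_free_rate are per-period values in the same units.
    No annualisation is applied.
    """
    excess = [r - risk_free_rate for r in returns]
    return statistics.fmean(excess) / statistics.stdev(excess)


# Hypothetical monthly returns (percent) and risk-free rate (percent per month).
monthly_returns = [1.0, 2.0, 3.0]
risk_free = 0.5

ratio = sharpe_ratio(monthly_returns, risk_free)
print(ratio)
```

With these placeholder numbers the excess returns are 0.5, 1.5 and 2.5, whose mean is 1.5 and whose sample standard deviation is 1.0, so the result is easy to verify by hand.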
First of all, I don't have multiplication or division operations, so I could use shifting/adding, overflow-multiplication, precalculations etc. I'm just comparing one n-bit binary number to another, but according to the algorithm the quantity of such operations seems to be huge. Here it is:
There is given a sequence of 0's and 1's that is di...
I am trying to calculate value $x in a number series based on an array of numbers (as $numbers).
Ex:
$numbers = array(1=>1000,2=>600,3=>500,4=>450,5=>425,6=>405,7=>400,8=>396);
function estimateNumber($x) {
// function to estimate number $x in $numbers data set
}
What would be the most statistically accurate method?
...
Hi, I'm not sure this is the right place to ask. Anyway, I searched online and ended up confused by this one:
Let's take one question as an example:
The drying rate in an industrial
process is dependent on many factors
and varies according to the following
distribution.
Minutes Relative Frequency
3 0.22
4 0.36
5 ...
Is it possible to detect HTTP cache hits in order to calculate a cache hit rate?
I'd like to add a snippet of code (JavaScript) to an HTML page that reports (via AJAX) whether a resource was available from a client's local cache or fetched from the server. I'd then compile some stats to give some insight into the effects of my cache tuning. I'm p...
I have a table of several independent variables from which I need to work out a formula that generates the dependent variable. Through trial I have come up with a value for the dependent variable.
For example, I have a table like this:
x1 | x2 | x3 || z(value found by experiment)
-------------------
1 | 2 | 3 || 10
3 | 4 | 5 || 14
2 ...
I want to split a data frame into several smaller ones. This looks like a very trivial question; however, I cannot find a solution from a web search.
Can anyone help?
Also, do you have any recommendations for a simple experiment design or survey R package?
Many thanks.
Leo
...
I am considering this random string generator in Perl:
sub generate_random_string {
my $length = 12;
my @chars = qw/2 3 4 5 6 7 8 9 A B C D E F G H J K M N P Q R S T U V W X Y Z/;
my $str = '';
$str .= $chars[int rand @chars] for 1..$length;
return $str;
}
How many unique strings will this generate? If I extend th...
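A small sketch of the counting argument, in Python. The character list and the length of 12 are copied from the Perl snippet above; collision_probability is a hypothetical helper added for illustration, and the figure of one million generated strings is a placeholder. With an alphabet of k characters and strings of length n there are k ** n distinct possible strings; how many of them a given random number generator can actually reach is a separate matter not covered here.

```python
import math

# Alphabet and length copied from the Perl snippet above.
chars = "2 3 4 5 6 7 8 9 A B C D E F G H J K M N P Q R S T U V W X Y Z".split()
length = 12

alphabet_size = len(chars)
total = alphabet_size ** length  # number of distinct strings of this length


def collision_probability(draws, space):
    """Birthday-style approximation of the chance that at least two of
    `draws` uniformly random strings out of `space` possibilities coincide."""
    return -math.expm1(-draws * (draws - 1) / (2 * space))


print(alphabet_size, total)
print(collision_probability(10**6, total))  # placeholder: one million strings
```

The approximation assumes every string is equally likely and independent of the others, which is an idealisation of what rand provides.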
I am creating an ecommerce site, and I am having trouble developing a good algorithm to sort products that are pulled from the database into halfway appropriate groups. I have tried simply dividing the highest price into 4 and basing each group off that. I also tried standard deviations based around the mean. Both could result with pr...
Input : random vector X=xi, i=1..n.
vector of means for X=meanxi, i=1..n
Output : covariance matrix Sigma (n*n).
Computation : 1) find all cov(xi,xj)= 1/n * (xi-meanxi) * (xj-meanxj), i,j=1..n
2) Sigma(i,j)=cov(xi,xj), symmetric matrix.
Is this algorithm correct, and is it free of side effects?
...
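For comparison, a minimal sketch of the usual sample covariance matrix, in Python. It rests on an assumption that goes beyond the notation above: that each of the n variables has been observed m times, so the products of deviations are summed over the m observations and divided by m - 1 (divide by m instead for the population version). The function name and the small dataset are hypothetical.

```python
def covariance_matrix(observations):
    """Sample covariance matrix of a dataset.

    `observations` is a list of m rows; each row holds one observation
    of the n variables. Returns an n-by-n list of lists.
    """
    m = len(observations)
    if m < 2:
        raise ValueError("need at least two observations")
    n = len(observations[0])

    # Mean of each variable across all observations.
    means = [sum(row[i] for row in observations) / m for i in range(n)]

    sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = sum((row[i] - means[i]) * (row[j] - means[j])
                    for row in observations)
            # The matrix is symmetric, so fill both halves at once.
            sigma[i][j] = sigma[j][i] = s / (m - 1)
    return sigma


# Hypothetical data: three observations of two variables.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
sigma = covariance_matrix(data)
print(sigma)
```

The function builds a new matrix and does not modify its input, so it has no side effects in that sense.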