I tried to order a CSV file, but the rank() function acts weird on numbers written in E notation (e.g. 9.34E-07).

> comparison = read.csv("e:/thesis/comparison/output.csv", header=TRUE)  
> comparison$proxygeneld_full.txt[0:20]
[1] 9.34E-07    4.04E-06    4.16E-06    7.17E-06    2.08E-05    3.00E-05   
[7] 3.59E-05    4.16E-05    7.75E-05    9.50E-05    0.0001116   0.00012452 
[13] 0.00015494  0.00017892  0.00017892  0.00018345  0.0002232   0.000231775
[19] 0.00023241  0.0002666  
13329 Levels: 0.0001116 0.00012452 0.00015494 0.00017892 0.00018345 ... adjP
> rank(comparison$proxygeneld_full.txt[0:20])
[1] 19.0 14.0 16.0 17.0 11.0 12.0 13.0 15.0 18.0 20.0  1.0  2.0  3.0  4.5  4.5
[16]  6.0  7.0  8.0  9.0 10.0 
#It should be 1-20 in order ....

It seems to just ignore the E notation there. It turns out fine if I'm not using data from the file:

> rank(c(9.34E-07, 4.04E-06, 7.17E-06))
[1] 1 2 3

Am I missing something? Thanks.

+1  A: 

Yep - $proxygeneld_full.txt[0:20] isn't even numeric. It is a factor:

13329 Levels: 0.0001116 0.00012452 0.00015494 0.00017892 0.00018345 ... adjP

So rank() is ranking the numeric codes that lie behind the factor representation. The levels are sorted as text, not as numbers, so the E-0X "numbers" sort after the non-E numbers in the levels.

Look at str(comparison) and you'll see that proxygeneld_full.txt is a factor.

I'm struggling to replicate the behaviour you are seeing with E numbers in a csv file. R reads them properly as numeric. Check your CSV to make sure you don't have some non-numeric values in that column, or that the E numbers are not quoted.

Ahh! Looking again at the levels you quote: there is an adjP lurking at the end of the output you show. Check your data again, as this adjP is in there somewhere and is forcing R to code that variable as a factor, hence the behaviour you see with ranking as I described above.

Gavin Simpson
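
The explanation above can be reproduced with a small sketch. The data here is a made-up stand-in (the column name `p` and the four values are assumptions, since the real output.csv is not shown), and `stringsAsFactors = TRUE` is set explicitly to mimic the default of R versions before 4.0.0:

```r
# Hypothetical toy data: a numeric column with one stray text entry ("adjP"),
# standing in for the real output.csv, which is not shown in the question.
csv_text <- "p\n9.34E-07\n4.04E-06\n0.0001116\n0.00012452\nadjP"

# stringsAsFactors = TRUE mimics the read.csv default before R 4.0.0.
toy <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

str(toy)        # p is reported as a factor, not as numeric
levels(toy$p)   # the levels are sorted as strings, not as numbers

# rank() on a factor ranks the integer level codes, so the first four
# entries are ranked by the string order of their levels.
rank(toy$p[1:4])
```

With this toy input the E-notation entries should come out ranked after the plain decimals, which is the same pattern as in the question.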
+1  A: 

I guess you have some non-numeric data in your csv file. What happens if you run this?

as.numeric(comparison$proxygeneld_full.txt)

If this produces numbers different from the ones you expected, you certainly have some text in this column.

Henrik
That will give all numbers, but only because the variable in question is a factor, and `as.numeric()` on a factor returns the internal codes (1, 2, 3, ..., n), not the data coerced to numeric.
Gavin Simpson
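
A minimal sketch of the point made in the comment above, using a hypothetical factor `f` in place of the real column (its four values are assumptions for illustration). Going through `as.character()` first recovers the actual values, and whatever fails to parse shows which entries are text:

```r
# Hypothetical factor standing in for comparison$proxygeneld_full.txt
f <- factor(c("9.34E-07", "4.04E-06", "0.0001116", "adjP"))

as.numeric(f)   # internal level codes, not the values themselves

# Convert via character to get the actual values; text that cannot be
# parsed becomes NA (the coercion warning is suppressed here on purpose).
vals <- suppressWarnings(as.numeric(as.character(f)))

# Entries that failed to parse point at the stray text in the column.
bad <- as.character(f)[is.na(vals)]
bad

# Once the text is set aside, rank() works on real numbers.
rank(vals[!is.na(vals)])
```

The same check on the real column would list the offending entries, such as the adjP seen in the levels.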
I just found the problem myself. Just as you said, I have some text in the column. I'm starting to hate myself for overlooking this.
Tg