Hi, I have a row in a data frame in R made up of character strings of undetermined length, each containing only 0s, 1s and 2s: "01", "010", "201", "102", "00012", and so on.

I'd like to find a way to determine if the last character in the string is NUMERICALLY the largest. It's important that I keep the row in the data frame as characters for other purposes. So basically I want to take substr(x, nchar(x), nchar(x)) and determine if it, as a number, is the largest of the numbers in the character string.

I'm super lost as to how to do this, since I'm not all that familiar with regular expressions and I have to go back and forth between treating elements as characters and as numbers.

Thanks in advance.

~Maureen

A: 

The regex to get the last digit would be [0-9]$; the rest of the logic depends on the environment you're developing in.
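
In R, one way to apply that pattern might look like the sketch below. The vector x is a hypothetical sample built from the strings in the question; regexpr() and regmatches() are base R functions.

```r
# Hypothetical sample strings taken from the question
x <- c("01", "010", "201", "102", "00012")

# Locate the final digit of each string with [0-9]$ and pull it out
last_digit <- regmatches(x, regexpr("[0-9]$", x))
last_digit
```

This only extracts the last digit, still as a character; comparing it against the other digits is a separate step.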

Dan Heberden
A: 

One way would be

p <- as.numeric(strsplit("0120102","")[[1]])
if (max(p) == p[length(p)]) {
   print("yes")
}

Actually you can skip as.numeric(), since single digit characters compare the same way as the numbers do ("2" > "1" > "0"):

p <- strsplit("0120102", "")[[1]]

If you wanted to apply this to your data.frame A:

apply(A, c(1, 2), function(z) { p <- strsplit(z, "")[[1]]; max(p) == p[length(p)] })
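
The same check could also be wrapped in a small helper and run over a character vector. This is only a sketch: last_is_max is a hypothetical name, and it assumes every string is non-empty and contains only digits.

```r
# Hypothetical helper: TRUE when the final digit equals the largest digit
last_is_max <- function(s) {
  digits <- as.integer(strsplit(s, "")[[1]])  # split into single digits
  digits[length(digits)] == max(digits)
}

# Sample strings from the question; the vector itself stays character
x <- c("01", "010", "201", "102", "00012")
res <- vapply(x, last_is_max, logical(1), USE.NAMES = FALSE)
res
```

vapply() is used here instead of sapply() so the result is guaranteed to be a logical vector with one value per string.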
Apprentice Queue
+6  A: 

Let df be the name of the data frame, and say the row holding the string sequences "01", "010", "201", "102", "00012" is row 2. You can get a logical vector that answers whether the last character in each string is NUMERICALLY the largest with this:

sapply(strsplit(as.character(df[2,]),""),function(x) x[length(x)] >= max(x))
[1]  TRUE FALSE FALSE  TRUE  TRUE
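
To try this on a reproducible example, a data frame along these lines could be used. The column names s1 to s5 and the contents of row 1 are made up for illustration, and the sketch assumes the columns are character (not factor), hence stringsAsFactors = FALSE; unlist() stands in for as.character() to get a plain character vector from the row.

```r
# Hypothetical data frame; row 2 holds the sequences from the question
df <- data.frame(s1 = c("x", "01"), s2 = c("y", "010"), s3 = c("z", "201"),
                 s4 = c("w", "102"), s5 = c("v", "00012"),
                 stringsAsFactors = FALSE)

# Pull row 2 out as a plain character vector; df itself is left untouched
seqs <- unlist(df[2, ], use.names = FALSE)

# Same comparison as above: is the last character >= the largest character?
result <- sapply(strsplit(seqs, ""), function(x) x[length(x)] >= max(x))
result
```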
gd047
You're my hero. This works perfectly. Thanks! :)
Maureen
A: 

I think your best bet will be to look at how regex works in the R language:

http://www.regular-expressions.info/rlanguage.html

Like Dan Heberden said in the post above, you'll need to tokenize the string you gave as an example, and then grep( ...? ) the tokens for the regex "[0-9]$". By the way, with regex you can treat everything as characters, so you shouldn't have to shuttle back and forth between numeric and character mode, except when you take the result of the grep function and convert it to numeric form for your comparison.
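
As an illustration of staying entirely in character mode, here is a sketch that does the whole check with one pattern. It is a different technique from the tokenize-and-grep idea above, and it relies on the assumption from the question that the strings contain only 0, 1 and 2: the last digit is the largest exactly when the string is all 0s, or ends in 1 with only 0s and 1s before it, or ends in 2.

```r
# Sample strings from the question
x <- c("01", "010", "201", "102", "00012")

# One alternative per possible last digit:
#   0+        all zeros, so the final 0 is the maximum
#   [01]*1    ends in 1 and contains no 2
#   [0-2]*2   ends in 2, which is always the maximum
is_last_max <- grepl("^(0+|[01]*1|[0-2]*2)$", x)
is_last_max
```

No conversion to numeric is needed here, but the pattern would have to be extended by hand if other digits could appear.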

warriorpostman