This is a follow-up question. As hadley pointed out, unless I fix the problem with the time stamps, the graphs I produce will be incorrect, so I am working on fixing that in my code. Following the answers to my earlier questions, I have stopped using the attach() function in favour of dataSet.df$variableName. I am now having problems drawing the graph from the strptime time stamps. Below are the XML file from which the data set is parsed (the parsing was also covered in an earlier question) and all the code I am using.

<?xml version = "1.0"?>
    <Company >
 <shareprice>
     <timeStamp> 12:00:00.01</timeStamp>
     <Price>  25.02</Price>
 </shareprice>
 <shareprice>
     <timeStamp> 12:00:00.02</timeStamp>
     <Price>  15</Price>
 </shareprice>
 <shareprice>
      <timeStamp> 12:00:00.025</timeStamp>
      <Price>  15.02</Price>
 </shareprice>
 <shareprice>
      <timeStamp> 12:00:00.031</timeStamp>
      <Price>  18.25</Price>
 </shareprice>
 <shareprice>
      <timeStamp> 12:00:00.039</timeStamp>
      <Price>  18.54</Price>
 </shareprice>
 <shareprice>
       <timeStamp> 12:00:00.050</timeStamp>
       <Price> 16.52</Price>
 </shareprice>
    <shareprice>
      <timeStamp> 12:00:01.01</timeStamp>
      <Price>  17.50</Price>
    </shareprice>
  </Company>

The R code I have currently is as follows:

library(ggplot2)
library (XML)
test.df <- xmlToDataFrame("c:/Users/user/Desktop/shares.xml")
test.df 
timeStampParsed <- strptime(as.character(test.df$timeStamp), "%H:%M:%OS")
test.df$Price <- as.numeric(as.character(test.df$Price))
summary (test.df)
mean(test.df$Price)
sd (test.df$Price)
mean(timeStampParsed)
par(mfrow=c(1,2))
plot(timeStampParsed, test.df$Price)
qplot(timeStampParsed,Price,data=test.df,geom=c("point","line"), 
      scale_y_continuous(limits = c(10,26)))

The plot command produces a graph, but it is not very pleasant looking. The qplot command returns the following error message:

Error in sprintf(gettext(fmt, domain = domain), ...) : 
invalid type of argument[1]: 'symbol'

In the interest of getting this right (and cutting down on the number of questions I ask), is there a tutorial or website that I can use? Once again, thanks very much for your help.

+2  A: 

You still make some of the mistakes in the code I corrected in my two previous answers to you. So let's try this again, more explicitly:

library(ggplot2)
library (XML)
df <- xmlToDataFrame("/tmp/anthony.xml")   # assign to df, shorter to type
df
sapply(df, class)          # shows everything is a factor
summary(df)                # summary for factor: counts !
df$timeStamp <- strptime(as.character(df$timeStamp), "%H:%M:%OS")   # df, not test.df
df$Price <- as.numeric(as.character(df$Price))
sapply(df, class)          # shows both columns converted
options("digits.secs"=3)   # make sure we show sub-seconds
summary (df)               # real summary
with(df, plot(timeStamp, Price))    # with is an elegant alternative to attach()
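The two conversions above can be tried without the XML file. The following is a minimal sketch, assuming a hand-built data frame as a stand-in for the parsed file (the name `demo` and its three rows are made up for illustration, with values copied from the question). It shows why the detour through `as.character()` matters for factors, and that `%OS` reads fractional seconds. Wrapping the result in `as.POSIXct()` and stripping the leading spaces with `trimws()` are choices of this sketch, not part of the code above.

```r
# Hypothetical stand-in for the parsed XML: every column arrives as a factor.
# Leading spaces are kept on purpose, as in the XML file.
demo <- data.frame(
  timeStamp = factor(c(" 12:00:00.01", " 12:00:00.02", " 12:00:01.01")),
  Price     = factor(c("  25.02", "  15", "  17.50"))
)

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(demo$Price)

# going through character first recovers the actual numbers
prices <- as.numeric(as.character(demo$Price))

# %OS reads fractional seconds on input; as.POSIXct() gives an ordinary
# vector-like date-time that is convenient as a data frame column
stamps <- as.POSIXct(
  strptime(trimws(as.character(demo$timeStamp)), "%H:%M:%OS")
)

options(digits.secs = 3)   # print sub-seconds
print(codes)
print(prices)
print(stamps)
```

Because no date is given in the format, the parsed times carry the current date.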

I also get an error with qplot() but you may simply have too little of a range in your data. So let's try this:

R> set.seed(42)               # fix random number generator
R> df$timeStamp <- df[1,"timeStamp"] + cumsum(runif(7)*60)
R> summary(df)                # new timestamps spanning larger range
   timeStamp                          Price     
 Min.   :2010-07-14 12:00:54.90   Min.   :15.0  
 1st Qu.:2010-07-14 12:01:59.71   1st Qu.:15.8  
 Median :2010-07-14 12:02:58.12   Median :17.5  
 Mean   :2010-07-14 12:02:55.54   Mean   :18.0  
 3rd Qu.:2010-07-14 12:03:52.20   3rd Qu.:18.4  
 Max.   :2010-07-14 12:04:51.96   Max.   :25.0  
R> qplot(timeStamp,Price, data=df, geom=c("point","line"), 
+  scale_y_continuous(limits = c(10,26)))
R> 

Now qplot() works.

So in sum, the data you were using did not fulfill some minimum requirement of the qplot function you were using: having a time axis spanning more than a second, say.
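For comparison, here is a sketch of the same plot written with `ggplot()`, where the y scale is added with `+` rather than passed as an argument. This is not the call used above; it assumes a reasonably recent ggplot2 and a made-up data frame `demo` whose timestamps are spread over a few minutes in the same way as in the session above (the start time is taken from the summary output, the time zone is an assumption).

```r
library(ggplot2)

set.seed(42)   # fix random number generator
start <- as.POSIXct("2010-07-14 12:00:00", tz = "UTC")   # assumed time zone

# Hypothetical data: prices from the question, timestamps spread out
demo <- data.frame(
  timeStamp = start + cumsum(runif(7) * 60),
  Price     = c(25.02, 15, 15.02, 18.25, 18.54, 16.52, 17.50)
)

# Points joined by a line, with the y axis limited to 10..26
p <- ggplot(demo, aes(x = timeStamp, y = Price)) +
  geom_point() +
  geom_line() +
  scale_y_continuous(limits = c(10, 26))
```

Calling `print(p)` draws the plot; at the interactive prompt, typing `p` does the same.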

In general, you may want to start with An Introduction to R (came with the program) or another intro text. You jumped head-first to advanced material (datetime data types, reading from XML, factors, ...) and got burned. First steps first.

Dirk Eddelbuettel
Dirk, thanks for the help. I changed the XML file to span more than one second, which seems to meet the minimum requirements of the qplot function.
Anthony Keane
I also see the advantage of correcting the time stamps using strptime(): hadley was right that plotting without fixing the time stamps creates wrong graphs, as the x axis is not scaled to represent the timeline. @Dirk I will look at An Introduction to R that came with the program; it's just that I had no option but to start where I did. You also answered one of my next questions a while back, on automating R scripts. I used the BATCH method, as I did not find the script method in that answer. Is there a way of controlling the output file type (.pdf file of the graph created)?
Anthony Keane
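On the last point, one common approach is to open a graphics device explicitly in the script instead of relying on the default (in a non-interactive run, plots otherwise end up in a file called Rplots.pdf). The following is a minimal sketch using base R only; the file name `shares.pdf`, the temporary directory and the sample values are assumptions for illustration.

```r
# Hypothetical output location; replace with the path you want
out_file <- file.path(tempdir(), "shares.pdf")

# Made-up sample data standing in for the parsed share prices
times  <- as.POSIXct("2010-07-14 12:00:00", tz = "UTC") + c(0, 30, 60, 90)
prices <- c(25.02, 15, 18.25, 17.50)

pdf(out_file, width = 7, height = 5)   # open a PDF device (size in inches)
plot(times, prices, type = "b", xlab = "Time", ylab = "Price")
dev.off()                              # close the device so the file is written
```

Swapping `pdf()` for `png()` (with sizes in pixels by default) produces a bitmap instead. Inside a script, a ggplot2 plot has to be wrapped in `print()` to be drawn on the open device.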