I got help parsing the following XML file on this site:

<?xml version = "1.0"?>
<Company>
  <shareprice>
    <timeStamp> 12:00:00.01</timeStamp>
    <Price>  25.02</Price>
  </shareprice>

  <shareprice>
    <timeStamp> 12:00:00.02</timeStamp>
    <Price>  15</Price>
  </shareprice>

  <shareprice>
    <timeStamp> 12:00:00.025</timeStamp>
    <Price>  15.02</Price>
  </shareprice>

  <shareprice>
    <timeStamp> 12:00:00.031</timeStamp>
    <Price>  18.25</Price>
  </shareprice>

  <shareprice>
    <timeStamp> 12:00:00.039</timeStamp>
    <Price>  18.54</Price>
  </shareprice>

  <shareprice>
    <timeStamp> 12:00:00.050</timeStamp>
    <Price> 16.52</Price>
  </shareprice>

  <shareprice>
    <timeStamp> 12:00:01.01</timeStamp>
    <Price>  17.50</Price>
  </shareprice>
</Company>

I am using the following code in R to plot the share price on the y axis against the timestamp on the x axis:

library (XML)
test.df <- xmlToDataFrame("c:/Users/user/Desktop/shares.xml")
test.df
attach(test.df)
mean(as.numeric(Price))
sd (as.numeric(Price)) 
plot(timeStamp,as.numeric(Price))

However, the resulting plot is not what I expect. It shows the timestamps on the x axis, but the y axis is numbered from 1 to 7. Is there something I should change in the data set, either in R or in the XML file itself?

A: 

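Try the ggplot2 package from Hadley; it's hilarious.

Melt your data with the timestamp as id and then plot with qplot:

library(reshape2)   # provides melt()
library(ggplot2)    # provides qplot()
test <- melt(test.df, id = "timeStamp")
qplot(timeStamp, value, data = test, geom = "line", group = 1)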

etc.

HTH

EDIT:

To influence the scale use:

qplot(...) +  scale_y_continuous(limits = c(-0.25,0.25))

Also make sure to check the ggplot2 documentation (you'll find it in the link above); it simply outclasses every other R plotting documentation out there.

ran2
This suggestion produced a nicer graph than my original code did. However, the prices on the y axis are still not right. The price points go from 15 at the bottom to 16.52 at the top, with 25.02, 18.54 etc. in between. This produces a nonsensical (but, I must admit, visually pleasing) graph.
Anthony Keane
That's an easy one :) Check the edit to my answer.
ran2
Made the change but the graph is still the same. Here is the line of code I have: qplot(timeStamp,Price,data=test.df,geom="point",scale_y_continuous(limits = c(-0.25,0.25)))
Anthony Keane
Dirk is right, you need to format your timestamp. Try as.Date() and its documentation. I am not sure how to handle this particular timestamp, because the frequency appears to be somewhat higher than what I am used to. To suggest a function, it would help to know what the scale actually is. My limits for the y scale were of course just an example.
ran2
A: 

You need to actually turn the x-axis data into time objects. Combine your

library (XML)
test.df <- xmlToDataFrame("c:/Users/user/Desktop/shares.xml")
test.df
attach(test.df)
mean(as.numeric(Price))
sd (as.numeric(Price)) 

with what I showed you last week in this SO question (and you need as.character() as your data probably came in as factors)

timeStampParsed <- strptime(as.character(timeStamp), "%H:%M:%OS")

before you can plot via

plot(timeStampParsed, as.numeric(Price))

Likewise for ggplot2: You first need to get your data into a date type.

Lastly, if you want an actual day in there that is different from the imputed default of today, you need to prepend it to the timeStamp text as for example in

timeStampParsed <- strptime(paste("2010-07-01", as.character(timeStamp)), 
                            "%Y-%m-%d %H:%M:%OS")
Dirk Eddelbuettel
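As a minimal sketch of the parsing step described in the answer above, the following uses a few of the timestamps from the question as an inline factor instead of reading the XML file. The names stamps, clean, parsed and parsed_dated, the trimws() call and the tz = "UTC" argument are assumptions added for the illustration and are not part of the original answer.

```r
# Hypothetical inline data: a few timestamps from the question, stored as a
# factor with the leading space they carry in the XML file.
stamps <- factor(c(" 12:00:00.01", " 12:00:00.02", " 12:00:00.025", " 12:00:01.01"))

# Convert factor -> character first; trimws() drops the leading space
# (a precaution added here, not part of the original answer).
clean <- trimws(as.character(stamps))

# %OS reads seconds including the fractional part.
parsed <- strptime(clean, "%H:%M:%OS", tz = "UTC")

# Prepending a fixed date: paste() builds strings like "2010-07-01 12:00:00.01".
parsed_dated <- strptime(paste("2010-07-01", clean),
                         "%Y-%m-%d %H:%M:%OS", tz = "UTC")

print(parsed$sec)
print(format(parsed_dated, "%Y-%m-%d"))
```

Either parsed vector can then be passed to plot() as the x argument, as in the answer.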
I am not interested in adding the actual day to the graph. The x axis is coming out correctly with the timestamps, but I am having trouble with the y axis. The y axis range should go from 0 to 26 (this should cover all the points I have), but it starts at 15 at the bottom and ends at 16.52 at the top, with 25.02, 18.54 etc. in between. Since 25.02 is greater than 16.52, I am stuck as to why 25.02 sits lower on the y axis than 16.52.
Anthony Keane
First things first: make sure `summary(test.df)` shows the right data. Also, `attach` is no longer recommended. Do the data transformation directly in the `data.frame`, e.g. `test.df$Price <- as.numeric(as.character(test.df$Price))`. It may be as simple as the common 'strings as factors' problem, which you can circumvent with a global option as well as local ones (though I am unsure about the details with the `XML` package, which I rarely use myself).
Dirk Eddelbuettel
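The 'strings as factors' problem mentioned in the comment above can be reproduced without the XML file. This is a sketch under the assumption that the Price column arrived as a factor. The vector price_factor is a hypothetical inline copy of the values from the question, and the names codes and prices are made up for the illustration.

```r
# Hypothetical inline copy of the Price column from the question, stored as a
# factor (strings with the leading spaces they carry in the XML file).
price_factor <- factor(c("  25.02", "  15", "  15.02", "  18.25",
                         "  18.54", " 16.52", "  17.50"))

# as.numeric() on a factor returns the internal level codes, not the values.
codes <- as.numeric(price_factor)

# Going through as.character() first recovers the actual numbers.
prices <- as.numeric(as.character(price_factor))

print(range(codes))
print(prices)
print(mean(prices))
```

With seven distinct prices the codes run from 1 to 7, which matches the y axis numbering described in the question.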
Okay, that has sorted the issue. I have always used attach() for accessing variables as instructed; is there a reason it is no longer recommended? Thanks for the help to date. The code I have now is:

library(ggplot2)
library(XML)
test.df <- xmlToDataFrame("c:/Users/user/Desktop/shares.xml")
test.df
test.df$Price <- as.numeric(as.character(test.df$Price))
test.df$timeStamp <- as.character(as.character(test.df$timeStamp), "%H:%M:%OS")
summary(test.df)
mean(test.df$Price)
sd(test.df$Price)
qplot(timeStamp, Price, data = test.df, geom = "point", scale_y_continuous(limits = c(10, 26)))
Anthony Keane
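One commonly cited reason for avoiding attach() is that it copies the columns onto the search path, so later changes to the data frame are not seen through the attached names. The sketch below illustrates this with a hypothetical two-row data frame called shares. All names in it are invented for the example, and it is not a complete account of the drawbacks.

```r
# Hypothetical two-row data frame with Price stored as a factor.
shares <- data.frame(Price = factor(c("25.02", "15")))

attach(shares)   # copies the columns onto the search path

# Fix the column in the data frame itself.
shares$Price <- as.numeric(as.character(shares$Price))

stale_is_factor  <- is.factor(Price)          # attached copy is unchanged
fixed_is_numeric <- is.numeric(shares$Price)  # data frame column is fixed
detach(shares)

# with() evaluates against the current data frame, so nothing goes stale.
avg <- with(shares, mean(Price))

print(c(stale_is_factor, fixed_is_numeric))
print(avg)
```

Referring to columns as shares$Price or through with() keeps every computation tied to the current state of the data frame.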