I have the following XML File:

<Company >
    <shareprice>
        <timeStamp> 12:00:00.01</timeStamp>
        <Price>  25.02</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:00.02</timeStamp>
        <Price>  15</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:01.025</timeStamp>
        <Price>  15.02</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:01.031</timeStamp>
        <Price>  18.25</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:01.039</timeStamp>
        <Price>  18.54</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:01.050</timeStamp>
        <Price> 16.52</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:02.01</timeStamp>
        <Price>  17.50</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:03.01</timeStamp>
        <Price>  25.02</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:05.02</timeStamp>
        <Price>  30</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:11.025</timeStamp>
        <Price>  32.25</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:12.031</timeStamp>
        <Price>  26.05</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:15.039</timeStamp>
        <Price>  18.54</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:19.050</timeStamp>
        <Price> 16.52</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:01:02.01</timeStamp>
        <Price>  17.50</Price>
    </shareprice>
</Company>

And I also have the following R Code:

library (ggplot2)
library (XML)
df <- xmlToDataFrame(file.choose()) 
df$timeStamp <- strptime(as.character(df$timeStamp), "%H:%M:%OS")
df$Price <- as.numeric(as.character(df$Price))
sapply(df, class)          
options("digits.secs"=3)   
summary (df)              
df$timeStamp <- df[1,"timeStamp"] + cumsum(runif(1:length(df$timeStamp))*60)
summary(df)
diff1 = 0
diff <- append(diff1,diff(df$Price))
summary (df$Price)
Ymin <- min(df$Price)
Ymax <- max(df$Price)
Ymedian <- median (df$Price)
Ymean <- mean(df$Price)
Ysd <- sd (df$Price)
sink (file="c:/xampp/htdocs/Sharedata.xml", type="output",split=FALSE)
cat("<graph caption=\"Share Data Wave\" subcaption=\"For Person's Name\"   xAxisName=\"Time\" yAxisMinValue=\"-0.025\" yAxisName=\"Voltage\" decimalPrecision=\"5\"  formatNumberScale=\"0\" numberPrefix=\"\" showNames=\"1\" showValues=\"0\" showAlternateHGridColor=\"1\" AlternateHGridColor=\"ff5904\" divLineColor=\"ff5904\" divLineAlpha=\"20\" alternateHGridAlpha=\"5\">\n")
cat(sprintf("    <set name=\"%s\" value=\"%f\" hoverText = \"The difference from last value: %s\" ></set>\n", df$timeStamp, df$Price, diff))
cat ("</graph>\n")
unlink("data.xml")
sink (file="c:/xampp/htdocs/Sharesstatistics.xml", type="output",split=FALSE)
cat ("  <statistics>\n")
cat (sprintf("    <mean>%s</mean>\n", Ymean))
cat (sprintf("    <sd>%s</sd>\n",Ysd))
cat (sprintf("    <min>%s</min>\n", Ymin))
cat (sprintf("    <median>%s</median>\n",Ymedian))
cat (sprintf("    <max>%s</max>\n", Ymax))
cat ("  </statistics>\n")
unlink("statistics.xml")
quit()

The R code does everything I need on the full file. My question is how to let the user select a range of the input file to analyse instead of the full file. For example, the user might want only the 2nd to 5th entries of the input XML file, while keeping the same output as defined by the cat statements.

    <shareprice>
        <timeStamp> 12:00:00.02</timeStamp>
        <Price>  15</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:01.025</timeStamp>
        <Price>  15.02</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:01.031</timeStamp>
        <Price>  18.25</Price>
    </shareprice>

All help greatly appreciated.

Regards,

Anthony.

A: 

This can be solved by reading the full data frame and then asking the user for the lower and upper limits of the records to use, e.g. with scan(n=2) (see ?scan). scan reads the input interactively, so the user can choose the range at run time.

x <- scan(n=2)
id <- min(x):max(x)

df2 <- df[id,]
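
The subsetting step above could be wrapped in a small helper that also guards against limits outside the data. This is only a sketch: select_range() and the toy prices data frame are hypothetical names introduced for illustration, and the differences are recomputed on the subset on the assumption that the first selected row should report 0, as in the question's code.

```r
# Hypothetical helper, not part of the original answer: clamp the requested
# limits to the available rows and return that slice of the data frame.
select_range <- function(df, lower, upper) {
  stopifnot(is.numeric(lower), is.numeric(upper))
  lo <- max(1, min(lower, upper))          # accept the limits in either order
  hi <- min(nrow(df), max(lower, upper))   # never run past the last row
  if (lo > hi) stop("requested range lies outside the data")
  df[lo:hi, , drop = FALSE]
}

# Toy data standing in for the parsed XML (values taken from the question)
prices <- data.frame(
  timeStamp = c("12:00:00.01", "12:00:00.02", "12:00:01.025",
                "12:00:01.031", "12:00:01.039"),
  Price = c(25.02, 15, 15.02, 18.25, 18.54)
)

df2 <- select_range(prices, 4, 2)

# Recompute the differences on the subset only
price_diff <- c(0, diff(df2$Price))
```

In an interactive session the two limits could come from x <- scan(n=2) and be passed on as select_range(df, x[1], x[2]); the statistics and cat statements would then use df2 instead of df.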

If you want to read only the required records from a very big XML file, that's another story. I couldn't think of a built-in function to do that, so you would have to do something along the lines of:

# Reads a subset of an XML file, assuming a blank line
# separates the individual records.
# x is the file name; n is a vector containing the record numbers wanted.
# Note: the first record also picks up the root opening tag (<Company >) and
# the last record the closing tag, so as written this suits interior records.

subset.xml <- function(x,n,...){
    # set a range if n is just a number
    if (length(n)==1) n <- 1:n

    # initialise variables
    skp <- 0 # the number of lines to skip by scan
    count <- 1 # index of the record currently being read
    out <- character(0) # lines of the wanted records

    repeat{
        tmp <- scan(x,what=character(0),n=1,skip=skp,blank.lines.skip=FALSE,sep="\n",quiet=TRUE)
        skp <- skp+1
        if(length(tmp)==0) {break} # no more input

        if(tmp=="") {
            count <- count+1 # blank line separates records
        } else if(count %in% n) {
            out <- c(out,tmp)
        }
    }
    # wrap the kept lines in a root element so they parse as one document
    out <- paste(c("<Data>",out,"</Data>"),collapse="\n")
    return(xmlToDataFrame(xmlParse(out,asText=TRUE)))
}

df <- subset.xml("test.xml",2:4)
> df
      timeStamp   Price
1   12:00:00.02      15
2  12:00:01.025   15.02
3  12:00:01.031   18.25
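
The function above calls scan with a growing skip on every iteration, so the file is read from the top each time. A single-pass variant might look like the sketch below. read_records() and demo_file are hypothetical names, the code uses base R only, and it assumes exactly one blank line between records. It returns the raw lines and, for the demo, pulls the prices out by stripping the tags rather than parsing the XML.

```r
# Hypothetical helper (not from the answer above): collect the lines of the
# wanted records in one pass over an open connection.
# Assumption: exactly one blank line separates consecutive records.
read_records <- function(path, wanted) {
  con <- file(path, open = "r")
  on.exit(close(con))
  count <- 1L              # index of the record currently being read
  kept <- character(0)     # lines belonging to the wanted records
  repeat {
    line <- readLines(con, n = 1L, warn = FALSE)
    if (length(line) == 0L) break            # end of file
    if (!nzchar(trimws(line))) {             # blank line closes a record
      count <- count + 1L
      if (count > max(wanted)) break         # nothing further is needed
    } else if (count %in% wanted) {
      kept <- c(kept, line)
    }
  }
  kept
}

# Small stand-in file with four records (values taken from the question)
demo_file <- tempfile(fileext = ".xml")
writeLines(c(
  "<Company >",
  "    <shareprice>",
  "        <timeStamp> 12:00:00.01</timeStamp>",
  "        <Price>  25.02</Price>",
  "    </shareprice>",
  "",
  "    <shareprice>",
  "        <timeStamp> 12:00:00.02</timeStamp>",
  "        <Price>  15</Price>",
  "    </shareprice>",
  "",
  "    <shareprice>",
  "        <timeStamp> 12:00:01.025</timeStamp>",
  "        <Price>  15.02</Price>",
  "    </shareprice>",
  "",
  "    <shareprice>",
  "        <timeStamp> 12:00:01.031</timeStamp>",
  "        <Price>  18.25</Price>",
  "    </shareprice>",
  "</Company>"
), demo_file)

record_lines <- read_records(demo_file, 2:3)

# Pull the numeric prices out of the kept lines by stripping the tags
price_lines <- grep("<Price>", record_lines, value = TRUE, fixed = TRUE)
selected_prices <- as.numeric(trimws(gsub("<[^>]+>", "", price_lines)))
```

The kept lines could equally be wrapped in a root element and handed to xmlToDataFrame(xmlParse(..., asText=TRUE)), as the function above does. The same caveat applies: the first and last records share their block with the root tags.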
Joris Meys