imageData-package {imageData}R Documentation

Aids in Processing and Plotting Data from a Lemna-Tec Scananalyzer

Description

Note that 'imageData' has been superseded by 'growthPheno'. The package 'growthPheno' incorporates all the functionality of 'imageData' and has functionality not available in 'imageData', but some 'imageData' functions have been renamed. The 'imageData' package is no longer maintained, but is retained for legacy purposes.

Version: 0.1-62

Date: 2023-08-21

Index

For an overview of the use of these functions and an example see below.

(i) Data
RiceRaw.dat Data for an experiment to investigate a rice
germplasm panel.
(ii) Data frame manipulation
designFactors Adds the factors and covariates for a blocked,
split-plot design.
getDates Forms a subset of 'responses' in 'data' that
contains their values for the nominated times.
importExcel Imports an Excel imaging file and allows some
renaming of variables.
longitudinalPrime Selects a set variables to be retained in a
data frame of longitudinal data.
twoLevelOpcreate Creates a data.frame formed by applying, for
each response, abinary operation to the values of
two different treatments.
(iii) Plots
anomPlot Identifies anomalous individuals and produces
longitudinal plots without them and with just them.
corrPlot Calculates and plots correlation matrices for a
set of responses.
imagetimesPlot Plots the time within an interval versus the interval.
For example, the hour of the day carts are imaged
against the days after planting (or some other
number of days after an event).
longiPlot Plots longitudinal data from a Lemna Tec
Scananalyzer.
probeDF Compares, for a set of specified values of df,
a response and the smooths of it, possibly along
with growth rates calculated from the smooths.
(iv) Calculations value-by-value
GrowthRates Calculates growth rates (AGR, PGR, RGRdiff)
between pairs of values in a vector.
WUI Calculates the Water Use Index (WUI).
anom Tests if any values in a vector are anomalous
in being outside specified limits.
calcTimes Calculates for a set of times, the time intervals
after an origin time and the position of each with
in that time.
calcLagged Replaces the values in a vector with the result
of applying an operation to it and a lagged value.
cumulate Calculates the cumulative sum, ignoring the
first element if exclude.1st is TRUE.
(v) Calculations over multiple values
fitSpline Produce the fits from a natural cubic smoothing
spline applied to a response in a 'data.frame'.
intervalGRaverage Calculates the growth rates for a specified
time interval by taking weighted averages of
growth rates for times within the interval.
intervalGRdiff Calculates the growth rates for a specified
time interval.
intervalValueCalculate Calculates a single value that is a function of
an individual's values for a response over a
specified time interval.
intervalWUI Calculates water use indices (WUI) over a
specified time interval to a data.frame.
(vi) Caclulations in each split of a 'data.frame'
splitContGRdiff Adds the growth rates calculated continuously
over time for subsets of a response to a
'data.frame'.
splitSplines Adds the fits after fitting a natural cubic
smoothing spline to subsets of a response to a
'data.frame'.
splitValueCalculate Calculates a single value that is a function of
an individual's values for a response.
(vii) Principal variates analysis (PV A)
intervalPVA Selects a subset of variables observed within a
specified time interval using PVA.
PVA Selects a subset of variables using PVA.
rcontrib Computes a measure of how correlated each
variable in a set is with the other variable,
conditional on a nominated subset of them.

Overview

This package can be used to carry out a full seven-step process to produce phenotypic traits from measurements made in a high-throughput phenotyping facility, such as one based on a Lemna-Tec Scananalyzer 3D system and described by Al-Tamimi et al. (2016). Otherwise, individual functions can be used to carry out parts of the process.

The basic data consists of imaging data obtained from a set of pots or carts over time. The carts are arranged in a grid of Lanes \times Positions. There should be a unique identifier for each cart, which by default is Snapshot.ID.Tag, and variable giving the Days after Planting for each measurement, by default Time.after.Planting..d.. In some cases, it is expected that there will be a column labelled Snapshot.Time.Stamp, which reflects the time of the imaging from which a particular data value was obtained.

The full seven-step process is as follows:

  1. Use importExcel to import the raw data from the Excel file. This step should also involve any editing of the data needed to take account of mishaps during the data collection and the need to remove faulty data (produces raw.dat). Generally, data can be removed by replacing only values for responses with missing values (NA) for carts whose data is to be removed, leaving the identifying information intact.

  2. Use longitudinalPrime to select a subset of the imaging variables produced by the Lemna Tec Scanalyzer and, if the design is a blocked, split-plot design, use designFactors to add covariates and factors that might be used in the analysis (produces the data frame longi.prime.dat).

  3. Add derived traits that result in a value for each observation: use splitContGRdiff to obtain continuous growth rates i.e. a growth rate for each time of observation, except the first; WUI to produce continuous Water Use Efficiency Indices (WUE) and cumulate to produce cumulative responses. (Produces the data frame longi.dat.)

  4. Use splitSplines to fit splines to smooth the longitudinal trends in the primary traits and calculate continuous growth rates from the smoothed response (added to the data frame longi.dat). There are two options for calculating continuous smoothed growth rates: (i) by differencing — use splitContGRdiff; (ii) from the first derivatives of the splines — in splitSplines include 1 in the deriv argument, include "AGR" in suffices.deriv and set the RGR to say "RGR". Optionally, use probeDF to compare the smooths for a number of values of df and, if necessary, re-run splitSplines with a revised value of df.

  5. Perform an exploratory examination of the unsmoothed data by using longiPlot to produce longitudinal plots of unsmoothed imaging traits and continuous growth rates. Also, use longiPlot to plot the smoothed imaging traits and continuous growth rates and anomPlot to check for anomalies in the data.

  6. Produce cart data: traits for which there is a single value for each Snapshot.ID.Tag or cart. (produces the data frame cart.dat)

    1. Set up a cart data.frame with the factors and covariates for a single observation from all carts. This can be done by subsetting longi.dat so that there is one entry for each cart.

    2. Use getDates to add traits at specific times to the cart data.frame, often the first and last day of imaging for each Snapshot.ID.Tag. The times need to be selected so that there is one and only one observation for each cart. Also form traits, such as growth rates over the whole imaging period, based on these values

    3. Based on the longitudinal plots, decide on the intervals for which growth rates and WUEs are to be calculated. The growth rates for intervals are calculated from the continuous growth rates, using intervalGRdiff, if the continuous growth rates were calculated by differencing, or intervalGRaverage, if they were calculated from first derivatives. To calculate WUEs for intervals, use intervalWUI, The interval growth rates and WUEs are added to the cart data.frame.

  7. (Optional) There is also the possibility that, for experiments investigating salinity, the Shoot Ion Independent Tolerance (SIIT) index can be calculated using twoLevelOpcreate.

Author(s)

NA

Maintainer: NA

References

Al-Tamimi, N, Brien, C.J., Oakey, H., Berger, B., Saade, S., Ho, Y. S., Schmockel, S. M., Tester, M. and Negrao, S. (2016) New salinity tolerance loci revealed in rice using high-throughput non-invasive phenotyping. Nature Communications, 7, 13342.

See Also

dae

Examples

## Not run: 
### This example can be run because the data.frame RiceRaw.dat is available with the package
#'# Step 1: Import the raw data
data(RiceRaw.dat)

#'# Step 2: Select imaging variables and add covariates and factors (produces longi.dat)
longi.dat <- longitudinalPrime(data=RiceRaw.dat, smarthouse.lev=c("NE","NW"))

longi.dat <- designFactors(longi.dat, insertName = "xDays",
                           designfactorMethod="StandardOrder")

#'## Particular edits to longi.dat
longi.dat <- within(longi.dat, 
                    { 
                      Days.after.Salting <- as.numfac(Days) - 29
                    })
longi.dat <- with(longi.dat, longi.dat[order(Snapshot.ID.Tag,Days), ])

#'# Step 3: Form derived traits that result in a value for each observation
#'### Set responses
responses.image <- c("Area")
responses.smooth <- paste(responses.image, "smooth", sep=".")

#'## Form growth rates for each observation of a subset of responses by differencing
longi.dat <- splitContGRdiff(longi.dat, responses.image, 
                             INDICES="Snapshot.ID.Tag",
                             which.rates = c("AGR","RGR"))

#'## Form Area.WUE 
longi.dat <- within(longi.dat, 
                    { 
                      Area.WUE <- WUI(Area.AGR*Days.diffs, Water.Loss)
                    })

#'## Add cumulative responses 
longi.dat <- within(longi.dat, 
                    { 
                      Water.Loss.Cum <- unlist(by(Water.Loss, Snapshot.ID.Tag, 
                                                  cumulate, exclude.1st=TRUE))
                      WUE.cum <- Area / Water.Loss.Cum 
                    })

#'# Step 4: Fit splines to smooth the longitudinal trends in the primary traits and
#'# calculate their growth rates
#'
#'## Smooth responses
#+
for (response in c(responses.image, "Water.Loss"))
  longi.dat <- splitSplines(longi.dat, response, x="xDays", INDICES = "Snapshot.ID.Tag", 
                            df = 4, na.rm=TRUE)
longi.dat <- with(longi.dat, longi.dat[order(Snapshot.ID.Tag, xDays), ])

#'## Loop over smoothed responses, forming growth rates by differences
#+
responses.GR <- paste(responses.smooth, "AGR", sep=".")
longi.dat <- splitContGRdiff(longi.dat, responses.smooth, 
                             INDICES="Snapshot.ID.Tag",
                             which.rates = c("AGR","RGR"))

#'## Finalize longi.dat
longi.dat <- with(longi.dat, longi.dat[order(Snapshot.ID.Tag, xDays), ])

#'# Step 5: Do exploratory plots on unsmoothed and smoothed longitudinal data
responses.longi <- c("Area","Area.AGR","Area.RGR", "Area.WUE")
responses.smooth.plot <- c("Area.smooth","Area.smooth.AGR","Area.smooth.RGR")
titles <- c("Total area (1000 pixels)", 
            "Total area AGR (1000 pixels per day)", "Total area RGR (per day)",
            "Total area WUE (1000 pixels per mL)")
titles.smooth<-titles
nresp <- length(responses.longi)
limits <- list(c(0,1000), c(-50,125), c(-0.05,0.40), c(0,30))

#' ### Plot unsmoothed profiles for all longitudinal  responses 
#+ "01-ProfilesAll"
klimit <- 0
for (k in 1:nresp)
{ 
  klimit <- klimit + 1
  longiPlot(data = longi.dat, response = responses.longi[k], 
            y.title = titles[k], x="xDays+35.42857143", 
            ggplotFuncs = list(geom_vline(xintercept=29, linetype="longdash", size=1), 
                               scale_x_continuous(breaks=seq(28, 42, by=2)),
                               scale_y_continuous(limits=limits[[klimit]])))
}


#' ### Plot smoothed profiles for all longitudinal responses - GRs by difference
#+ "01-SmoothedProfilesAll"
nresp.smooth <- length(responses.smooth.plot)
limits <- list(c(0,1000), c(0,100), c(0.0,0.40))
for (k in 1:nresp.smooth)
{ 
  longiPlot(data = longi.dat, response = responses.smooth.plot[k], 
            y.title = titles.smooth[k], x="xDays+35.42857143", 
            ggplotFuncs = list(geom_vline(xintercept=29, linetype="longdash", size=1), 
                               scale_x_continuous(breaks=seq(28, 42, by=2)),
                               scale_y_continuous(limits=limits[[klimit]])))
  print(plt)
}


#'### AGR anomalies - plot without anomalous plants followed by plot of anomalous plants
#+ "01-0254-AGRanomalies"
anom.ID <- vector(mode = "character", length = 0L)
response <- "Area.smooth.AGR"
cols.output <- c("Snapshot.ID.Tag", "Smarthouse", "Lane", "Position", 
                 "Treatment.1", "Genotype.ID", "Days")
anomalous <- anomPlot(longi.dat, response=response, lower=2.5, start.time=40, 
                      x = "xDays+35.42857143", vertical.line=29, breaks=seq(28, 42, by=2), 
                      whichPrint=c("innerPlot"), y.title=response)
subs <- subset(anomalous$data, Area.smooth.AGR.anom & Days==42)
if (nrow(subs) == 0)
{ cat("\n#### No anomalous data here\n\n")
} else
{ 
  subs <- subs[order(subs["Smarthouse"],subs["Treatment.1"], subs[response]),]
  print(subs[c(cols.output, response)])
  anom.ID <- unique(c(anom.ID, subs$Snapshot.ID.Tag))
  outerPlot <- anomalous$outerPlot  + geom_text(data=subs,
                                                aes_string(x = "xDays+35.42857143", 
                                                           y = response, 
                                                           label="Snapshot.ID.Tag"), 
                                                size=3, hjust=0.7, vjust=0.5)
  print(outerPlot)
}


#'# Step 6: Form single-value plant responses in Snapshot.ID.Tag order.
#'
#'## 6a) Set up a data frame with factors only
#+
cart.dat <- longi.dat[longi.dat$Days == 31, 
                      c("Smarthouse","Lane","Position","Snapshot.ID.Tag",
                        "xPosn","xMainPosn",
                        "Zones","xZones","SHZones","ZLane","ZMainplots", "Subplots",
                        "Genotype.ID","Treatment.1")]
cart.dat <- cart.dat[do.call(order, cart.dat), ]

#'## 6b) Get responses based on first and last date.
#'
#'### Observation for first and last date
cart.dat <- cbind(cart.dat, getDates(responses.image, data = longi.dat, 
                                     which.times = c(31), suffix = "first"))
cart.dat <- cbind(cart.dat, getDates(responses.image, data = longi.dat, 
                                     which.times = c(42), suffix = "last"))
cart.dat <- cbind(cart.dat, getDates(c("WUE.cum"), 
                                     data = longi.dat, 
                                     which.times = c(42), suffix = "last"))
responses.smooth <- paste(responses.image, "smooth", sep=".")
cart.dat <- cbind(cart.dat, getDates(responses.smooth, data = longi.dat, 
                                     which.times = c(31), suffix = "first"))
cart.dat <- cbind(cart.dat, getDates(responses.smooth, data = longi.dat, 
                                     which.times = c(42), suffix = "last"))

#'### Growth rates over whole period.
#+
tottime <- 42 - 31
cart.dat <- within(cart.dat, 
                   { 
                     Area.AGR <- (Area.last - Area.first)/tottime
                     Area.RGR <- log(Area.last / Area.first)/tottime
                   })

#'### Calculate water index over whole period
cart.dat <- merge(cart.dat, 
                  intervalWUI("Area", water.use = "Water.Loss", 
                              start.times = c(31), 
                              end.times = c(42), 
                              suffix = NULL, 
                              data = longi.dat, include.total.water = TRUE),
                  by = c("Snapshot.ID.Tag"))
names(cart.dat)[match(c("Area.WUI","Water.Loss.Total"),names(cart.dat))] <- 
        c("Area.Overall.WUE", "Water.Loss.Overall")
cart.dat$Water.Loss.rate.Overall <- cart.dat$Water.Loss.Overall / (42 - 31)

#'## 6c) Add growth rates and water indices for intervals
#'### Set up intervals
#+
start.days <- list(31,35,31,38)
end.days <- list(35,38,38,42)
suffices <- list("31to35","35to38","31to38","38to42")

#'### Rates for specific intervals from the smoothed data by differencing
#+
for (r in responses.smooth)
{ for (k in 1:length(suffices))
  { 
    cart.dat <- merge(cart.dat, 
                      intervalGRdiff(r, 
                                     which.rates = c("AGR","RGR"), 
                                     start.times = start.days[k][[1]], 
                                     end.times = end.days[k][[1]], 
                                     suffix.interval = suffices[k][[1]], 
                                     data = longi.dat),
                      by = "Snapshot.ID.Tag")
  }
}

#'### Water indices for specific intervals from the unsmoothed and smoothed data
#+
for (k in 1:length(suffices))
{ 
  cart.dat <- merge(cart.dat, 
                    intervalWUI("Area", water.use = "Water.Loss", 
                                start.times = start.days[k][[1]], 
                                end.times = end.days[k][[1]], 
                                suffix = suffices[k][[1]], 
                                data = longi.dat, include.total.water = TRUE),
                    by = "Snapshot.ID.Tag")
  names(cart.dat)[match(paste("Area.WUI", suffices[k][[1]], sep="."), 
                        names(cart.dat))] <- paste("Area.WUE", suffices[k][[1]], sep=".")
  cart.dat[paste("Water.Loss.rate", suffices[k][[1]], sep=".")] <- 
           cart.dat[[paste("Water.Loss.Total", suffices[k][[1]], sep=".")]] / 
                                           ( end.days[k][[1]] - start.days[k][[1]])
}

cart.dat <- with(cart.dat, cart.dat[order(Snapshot.ID.Tag), ])

#'# Step 7: Form continuous and interval SIITs
#'
#'## 7a) Calculate continuous
#+
cols.retained <-  c("Snapshot.ID.Tag","Smarthouse","Lane","Position",
                    "Days","Snapshot.Time.Stamp", "Hour", "xDays",
                    "Zones","xZones","SHZones","ZLane","ZMainplots",
                    "xMainPosn", "Genotype.ID")
responses.GR <- c("Area.smooth.AGR","Area.smooth.AGR","Area.smooth.RGR")
suffices.results <- c("diff", "SIIT", "SIIT")
responses.SIIT <- unlist(Map(paste, responses.GR, suffices.results,sep="."))

longi.SIIT.dat <- 
  twoLevelOpcreate(responses.GR, longi.dat, suffices.treatment=c("C","S"),
                   operations = c("-", "/", "/"), suffices.results = suffices.results, 
                   columns.retained = cols.retained, 
                   by = c("Smarthouse","Zones","ZMainplots","Days"))
longi.SIIT.dat <- with(longi.SIIT.dat, 
                            longi.SIIT.dat[order(Smarthouse,Zones,ZMainplots,Days),])

#' ### Plot SIIT profiles 
#' 
#+ "03-SIITProfiles"
k <- 2
nresp <- length(responses.SIIT)
limits <- with(longi.SIIT.dat, list(c(min(Area.smooth.AGR.diff, na.rm=TRUE),
                                      max(Area.smooth.AGR.diff, na.rm=TRUE)),
                                    c(0,3),
                                    c(0,1.5)))
#Plots
for (k in 1:nresp)
{ 
  longiPlot(data = longi.SIIT.dat, x="xDays+35.42857143", 
            response = responses.SIIT[k], 
            y.title=responses.SIIT[k], 
            facet.x="Smarthouse", facet.y=".", 
            ggplotFuncs = list(geom_vline(xintercept=29, linetype="longdash", size=1), 
                               scale_x_continuous(breaks=seq(28, 42, by=2)),
                               scale_y_continuous(limits=limits[[klimit]])))
}

#'## 7b) Calculate interval SIITs and check for large values for SIIT for Days 31to35
#+ "01-SIITIntClean"
suffices <- list("31to35","35to38","31to38","38to42")
response <- "Area.smooth.RGR.31to35"
SIIT <- paste(response, "SIIT", sep=".")
responses.SIITinterval <- as.vector(outer("Area.smooth.RGR", suffices, paste, sep="."))

cart.SIIT.dat <- twoLevelOpcreate(responses.SIITinterval, cart.dat, 
                                  suffices.treatment=c("C","S"), 
                                  suffices.results="SIIT", 
                                  columns.suffixed="Snapshot.ID.Tag")
tmp<-na.omit(cart.SIIT.dat)
print(summary(tmp[SIIT]))
big.SIIT <- with(tmp, tmp[tmp[SIIT] > 1.15, c("Snapshot.ID.Tag.C","Genotype.ID",
                                              paste(response,"C",sep="."), 
                                              paste(response,"S",sep="."), SIIT)])
big.SIIT <- big.SIIT[order(big.SIIT[SIIT]),]
print(big.SIIT)
plt <- ggplot(tmp, aes_string(SIIT)) +
           geom_histogram(aes(y = ..density..), binwidth=0.05) +
           geom_vline(xintercept=1.15, linetype="longdash", size=1) +
           theme_bw() + facet_grid(Smarthouse ~.)
print(plt)
plt <- ggplot(tmp, aes_string(x="Smarthouse", y=SIIT)) +
           geom_boxplot() + theme_bw()
print(plt)
remove(tmp)

## End(Not run)

[Package imageData version 0.1-62 Index]