R: Aids in Processing and Plotting Data from a Lemna-Tec...

imageData-package {imageData}

R Documentation

Aids in Processing and Plotting Data from a Lemna-Tec Scananalyzer

Description

Note that 'imageData' has been superseded by 'growthPheno'. The package 'growthPheno' incorporates all the functionality of 'imageData' and has functionality not available in 'imageData', but some 'imageData' functions have been renamed. The 'imageData' package is no longer maintained, but is retained for legacy purposes.

Version: 0.1-62

Date: 2023-08-21

Index

For an overview of the use of these functions and an example see below.

(i) Data

`RiceRaw.dat`	Data for an experiment to investigate a rice
germplasm panel.

(ii) Data frame manipulation

`designFactors`	Adds the factors and covariates for a blocked,
	split-plot design.
`getDates`	Forms a subset of 'responses' in 'data' that
	contains their values for the nominated times.
`importExcel`	Imports an Excel imaging file and allows some
	renaming of variables.
`longitudinalPrime`	Selects a set variables to be retained in a
	data frame of longitudinal data.
`twoLevelOpcreate`	Creates a data.frame formed by applying, for
	each response, abinary operation to the values of
	two different treatments.

(iii) Plots

`anomPlot`	Identifies anomalous individuals and produces
	longitudinal plots without them and with just them.
`corrPlot`	Calculates and plots correlation matrices for a
	set of responses.
`imagetimesPlot`	Plots the time within an interval versus the interval.
	For example, the hour of the day carts are imaged
	against the days after planting (or some other
	number of days after an event).
`longiPlot`	Plots longitudinal data from a Lemna Tec
	Scananalyzer.
`probeDF`	Compares, for a set of specified values of df,
	a response and the smooths of it, possibly along
	with growth rates calculated from the smooths.

(iv) Calculations value-by-value

`GrowthRates`	Calculates growth rates (AGR, PGR, RGRdiff)
	between pairs of values in a vector.
`WUI`	Calculates the Water Use Index (WUI).
`anom`	Tests if any values in a vector are anomalous
	in being outside specified limits.
`calcTimes`	Calculates for a set of times, the time intervals
	after an origin time and the position of each with
	in that time.
`calcLagged`	Replaces the values in a vector with the result
	of applying an operation to it and a lagged value.
`cumulate`	Calculates the cumulative sum, ignoring the
	first element if exclude.1st is TRUE.

(v) Calculations over multiple values

`fitSpline`	Produce the fits from a natural cubic smoothing
	spline applied to a response in a 'data.frame'.
`intervalGRaverage`	Calculates the growth rates for a specified
	time interval by taking weighted averages of
	growth rates for times within the interval.
`intervalGRdiff`	Calculates the growth rates for a specified
	time interval.
`intervalValueCalculate`	Calculates a single value that is a function of
	an individual's values for a response over a
	specified time interval.
`intervalWUI`	Calculates water use indices (WUI) over a
	specified time interval to a data.frame.

(vi) Caclulations in each split of a 'data.frame'

`splitContGRdiff`	Adds the growth rates calculated continuously
	over time for subsets of a response to a
	'data.frame'.
`splitSplines`	Adds the fits after fitting a natural cubic
	smoothing spline to subsets of a response to a
	'data.frame'.
`splitValueCalculate`	Calculates a single value that is a function of
	an individual's values for a response.

(vii) Principal variates analysis (PV A)

`intervalPVA`	Selects a subset of variables observed within a
	specified time interval using PVA.
`PVA`	Selects a subset of variables using PVA.
`rcontrib`	Computes a measure of how correlated each
	variable in a set is with the other variable,
	conditional on a nominated subset of them.

Overview

This package can be used to carry out a full seven-step process to produce phenotypic traits from measurements made in a high-throughput phenotyping facility, such as one based on a Lemna-Tec Scananalyzer 3D system and described by Al-Tamimi et al. (2016). Otherwise, individual functions can be used to carry out parts of the process.

The basic data consists of imaging data obtained from a set of pots or carts over time. The carts are arranged in a grid of Lanes \times Positions. There should be a unique identifier for each cart, which by default is Snapshot.ID.Tag, and variable giving the Days after Planting for each measurement, by default Time.after.Planting..d.. In some cases, it is expected that there will be a column labelled Snapshot.Time.Stamp, which reflects the time of the imaging from which a particular data value was obtained.

The full seven-step process is as follows:

Use importExcel to import the raw data from the Excel file. This step should also involve any editing of the data needed to take account of mishaps during the data collection and the need to remove faulty data (produces raw.dat). Generally, data can be removed by replacing only values for responses with missing values (NA) for carts whose data is to be removed, leaving the identifying information intact.
Use longitudinalPrime to select a subset of the imaging variables produced by the Lemna Tec Scanalyzer and, if the design is a blocked, split-plot design, use designFactors to add covariates and factors that might be used in the analysis (produces the data frame longi.prime.dat).
Add derived traits that result in a value for each observation: use splitContGRdiff to obtain continuous growth rates i.e. a growth rate for each time of observation, except the first; WUI to produce continuous Water Use Efficiency Indices (WUE) and cumulate to produce cumulative responses. (Produces the data frame longi.dat.)
Use splitSplines to fit splines to smooth the longitudinal trends in the primary traits and calculate continuous growth rates from the smoothed response (added to the data frame longi.dat). There are two options for calculating continuous smoothed growth rates: (i) by differencing — use splitContGRdiff; (ii) from the first derivatives of the splines — in splitSplines include 1 in the deriv argument, include "AGR" in suffices.deriv and set the RGR to say "RGR". Optionally, use probeDF to compare the smooths for a number of values of df and, if necessary, re-run splitSplines with a revised value of df.
Perform an exploratory examination of the unsmoothed data by using longiPlot to produce longitudinal plots of unsmoothed imaging traits and continuous growth rates. Also, use longiPlot to plot the smoothed imaging traits and continuous growth rates and anomPlot to check for anomalies in the data.
Produce cart data: traits for which there is a single value for each Snapshot.ID.Tag or cart. (produces the data frame cart.dat)
1. Set up a cart data.frame with the factors and covariates for a single observation from all carts. This can be done by subsetting longi.dat so that there is one entry for each cart.
2. Use getDates to add traits at specific times to the cart data.frame, often the first and last day of imaging for each Snapshot.ID.Tag. The times need to be selected so that there is one and only one observation for each cart. Also form traits, such as growth rates over the whole imaging period, based on these values
3. Based on the longitudinal plots, decide on the intervals for which growth rates and WUEs are to be calculated. The growth rates for intervals are calculated from the continuous growth rates, using intervalGRdiff, if the continuous growth rates were calculated by differencing, or intervalGRaverage, if they were calculated from first derivatives. To calculate WUEs for intervals, use intervalWUI, The interval growth rates and WUEs are added to the cart data.frame.
(Optional) There is also the possibility that, for experiments investigating salinity, the Shoot Ion Independent Tolerance (SIIT) index can be calculated using twoLevelOpcreate.

Author(s)

Maintainer: NA

References

Al-Tamimi, N, Brien, C.J., Oakey, H., Berger, B., Saade, S., Ho, Y. S., Schmockel, S. M., Tester, M. and Negrao, S. (2016) New salinity tolerance loci revealed in rice using high-throughput non-invasive phenotyping. Nature Communications, 7, 13342.

Examples

## Not run: 
### This example can be run because the data.frame RiceRaw.dat is available with the package
#'# Step 1: Import the raw data
data(RiceRaw.dat)

#'# Step 2: Select imaging variables and add covariates and factors (produces longi.dat)
longi.dat <- longitudinalPrime(data=RiceRaw.dat, smarthouse.lev=c("NE","NW"))

longi.dat <- designFactors(longi.dat, insertName = "xDays",
                           designfactorMethod="StandardOrder")

#'## Particular edits to longi.dat
longi.dat <- within(longi.dat, 
                    { 
                      Days.after.Salting <- as.numfac(Days) - 29
                    })
longi.dat <- with(longi.dat, longi.dat[order(Snapshot.ID.Tag,Days), ])

#'# Step 3: Form derived traits that result in a value for each observation
#'### Set responses
responses.image <- c("Area")
responses.smooth <- paste(responses.image, "smooth", sep=".")

#'## Form growth rates for each observation of a subset of responses by differencing
longi.dat <- splitContGRdiff(longi.dat, responses.image, 
                             INDICES="Snapshot.ID.Tag",
                             which.rates = c("AGR","RGR"))

#'## Form Area.WUE 
longi.dat <- within(longi.dat, 
                    { 
                      Area.WUE <- WUI(Area.AGR*Days.diffs, Water.Loss)
                    })

#'## Add cumulative responses 
longi.dat <- within(longi.dat, 
                    { 
                      Water.Loss.Cum <- unlist(by(Water.Loss, Snapshot.ID.Tag, 
                                                  cumulate, exclude.1st=TRUE))
                      WUE.cum <- Area / Water.Loss.Cum 
                    })

#'# Step 4: Fit splines to smooth the longitudinal trends in the primary traits and
#'# calculate their growth rates
#'
#'## Smooth responses
#+
for (response in c(responses.image, "Water.Loss"))
  longi.dat <- splitSplines(longi.dat, response, x="xDays", INDICES = "Snapshot.ID.Tag", 
                            df = 4, na.rm=TRUE)
longi.dat <- with(longi.dat, longi.dat[order(Snapshot.ID.Tag, xDays), ])

#'## Loop over smoothed responses, forming growth rates by differences
#+
responses.GR <- paste(responses.smooth, "AGR", sep=".")
longi.dat <- splitContGRdiff(longi.dat, responses.smooth, 
                             INDICES="Snapshot.ID.Tag",
                             which.rates = c("AGR","RGR"))

#'## Finalize longi.dat
longi.dat <- with(longi.dat, longi.dat[order(Snapshot.ID.Tag, xDays), ])

#'# Step 5: Do exploratory plots on unsmoothed and smoothed longitudinal data
responses.longi <- c("Area","Area.AGR","Area.RGR", "Area.WUE")
responses.smooth.plot <- c("Area.smooth","Area.smooth.AGR","Area.smooth.RGR")
titles <- c("Total area (1000 pixels)", 
            "Total area AGR (1000 pixels per day)", "Total area RGR (per day)",
            "Total area WUE (1000 pixels per mL)")
titles.smooth<-titles
nresp <- length(responses.longi)
limits <- list(c(0,1000), c(-50,125), c(-0.05,0.40), c(0,30))

#' ### Plot unsmoothed profiles for all longitudinal  responses 
#+ "01-ProfilesAll"
klimit <- 0
for (k in 1:nresp)
{ 
  klimit <- klimit + 1
  longiPlot(data = longi.dat, response = responses.longi[k], 
            y.title = titles[k], x="xDays+35.42857143", 
            ggplotFuncs = list(geom_vline(xintercept=29, linetype="longdash", size=1), 
                               scale_x_continuous(breaks=seq(28, 42, by=2)),
                               scale_y_continuous(limits=limits[[klimit]])))
}


#' ### Plot smoothed profiles for all longitudinal responses - GRs by difference
#+ "01-SmoothedProfilesAll"
nresp.smooth <- length(responses.smooth.plot)
limits <- list(c(0,1000), c(0,100), c(0.0,0.40))
for (k in 1:nresp.smooth)
{ 
  longiPlot(data = longi.dat, response = responses.smooth.plot[k], 
            y.title = titles.smooth[k], x="xDays+35.42857143", 
            ggplotFuncs = list(geom_vline(xintercept=29, linetype="longdash", size=1), 
                               scale_x_continuous(breaks=seq(28, 42, by=2)),
                               scale_y_continuous(limits=limits[[klimit]])))
  print(plt)
}


#'### AGR anomalies - plot without anomalous plants followed by plot of anomalous plants
#+ "01-0254-AGRanomalies"
anom.ID <- vector(mode = "character", length = 0L)
response <- "Area.smooth.AGR"
cols.output <- c("Snapshot.ID.Tag", "Smarthouse", "Lane", "Position", 
                 "Treatment.1", "Genotype.ID", "Days")
anomalous <- anomPlot(longi.dat, response=response, lower=2.5, start.time=40, 
                      x = "xDays+35.42857143", vertical.line=29, breaks=seq(28, 42, by=2), 
                      whichPrint=c("innerPlot"), y.title=response)
subs <- subset(anomalous$data, Area.smooth.AGR.anom & Days==42)
if (nrow(subs) == 0)
{ cat("\n#### No anomalous data here\n\n")
} else
{ 
  subs <- subs[order(subs["Smarthouse"],subs["Treatment.1"], subs[response]),]
  print(subs[c(cols.output, response)])
  anom.ID <- unique(c(anom.ID, subs$Snapshot.ID.Tag))
  outerPlot <- anomalous$outerPlot  + geom_text(data=subs,
                                                aes_string(x = "xDays+35.42857143", 
                                                           y = response, 
                                                           label="Snapshot.ID.Tag"), 
                                                size=3, hjust=0.7, vjust=0.5)
  print(outerPlot)
}


#'# Step 6: Form single-value plant responses in Snapshot.ID.Tag order.
#'
#'## 6a) Set up a data frame with factors only
#+
cart.dat <- longi.dat[longi.dat$Days == 31, 
                      c("Smarthouse","Lane","Position","Snapshot.ID.Tag",
                        "xPosn","xMainPosn",
                        "Zones","xZones","SHZones","ZLane","ZMainplots", "Subplots",
                        "Genotype.ID","Treatment.1")]
cart.dat <- cart.dat[do.call(order, cart.dat), ]

#'## 6b) Get responses based on first and last date.
#'
#'### Observation for first and last date
cart.dat <- cbind(cart.dat, getDates(responses.image, data = longi.dat, 
                                     which.times = c(31), suffix = "first"))
cart.dat <- cbind(cart.dat, getDates(responses.image, data = longi.dat, 
                                     which.times = c(42), suffix = "last"))
cart.dat <- cbind(cart.dat, getDates(c("WUE.cum"), 
                                     data = longi.dat, 
                                     which.times = c(42), suffix = "last"))
responses.smooth <- paste(responses.image, "smooth", sep=".")
cart.dat <- cbind(cart.dat, getDates(responses.smooth, data = longi.dat, 
                                     which.times = c(31), suffix = "first"))
cart.dat <- cbind(cart.dat, getDates(responses.smooth, data = longi.dat, 
                                     which.times = c(42), suffix = "last"))

#'### Growth rates over whole period.
#+
tottime <- 42 - 31
cart.dat <- within(cart.dat, 
                   { 
                     Area.AGR <- (Area.last - Area.first)/tottime
                     Area.RGR <- log(Area.last / Area.first)/tottime
                   })

#'### Calculate water index over whole period
cart.dat <- merge(cart.dat, 
                  intervalWUI("Area", water.use = "Water.Loss", 
                              start.times = c(31), 
                              end.times = c(42), 
                              suffix = NULL, 
                              data = longi.dat, include.total.water = TRUE),
                  by = c("Snapshot.ID.Tag"))
names(cart.dat)[match(c("Area.WUI","Water.Loss.Total"),names(cart.dat))] <- 
        c("Area.Overall.WUE", "Water.Loss.Overall")
cart.dat$Water.Loss.rate.Overall <- cart.dat$Water.Loss.Overall / (42 - 31)

#'## 6c) Add growth rates and water indices for intervals
#'### Set up intervals
#+
start.days <- list(31,35,31,38)
end.days <- list(35,38,38,42)
suffices <- list("31to35","35to38","31to38","38to42")

#'### Rates for specific intervals from the smoothed data by differencing
#+
for (r in responses.smooth)
{ for (k in 1:length(suffices))
  { 
    cart.dat <- merge(cart.dat, 
                      intervalGRdiff(r, 
                                     which.rates = c("AGR","RGR"), 
                                     start.times = start.days[k][[1]], 
                                     end.times = end.days[k][[1]], 
                                     suffix.interval = suffices[k][[1]], 
                                     data = longi.dat),
                      by = "Snapshot.ID.Tag")
  }
}

#'### Water indices for specific intervals from the unsmoothed and smoothed data
#+
for (k in 1:length(suffices))
{ 
  cart.dat <- merge(cart.dat, 
                    intervalWUI("Area", water.use = "Water.Loss", 
                                start.times = start.days[k][[1]], 
                                end.times = end.days[k][[1]], 
                                suffix = suffices[k][[1]], 
                                data = longi.dat, include.total.water = TRUE),
                    by = "Snapshot.ID.Tag")
  names(cart.dat)[match(paste("Area.WUI", suffices[k][[1]], sep="."), 
                        names(cart.dat))] <- paste("Area.WUE", suffices[k][[1]], sep=".")
  cart.dat[paste("Water.Loss.rate", suffices[k][[1]], sep=".")] <- 
           cart.dat[[paste("Water.Loss.Total", suffices[k][[1]], sep=".")]] / 
                                           ( end.days[k][[1]] - start.days[k][[1]])
}

cart.dat <- with(cart.dat, cart.dat[order(Snapshot.ID.Tag), ])

#'# Step 7: Form continuous and interval SIITs
#'
#'## 7a) Calculate continuous
#+
cols.retained <-  c("Snapshot.ID.Tag","Smarthouse","Lane","Position",
                    "Days","Snapshot.Time.Stamp", "Hour", "xDays",
                    "Zones","xZones","SHZones","ZLane","ZMainplots",
                    "xMainPosn", "Genotype.ID")
responses.GR <- c("Area.smooth.AGR","Area.smooth.AGR","Area.smooth.RGR")
suffices.results <- c("diff", "SIIT", "SIIT")
responses.SIIT <- unlist(Map(paste, responses.GR, suffices.results,sep="."))

longi.SIIT.dat <- 
  twoLevelOpcreate(responses.GR, longi.dat, suffices.treatment=c("C","S"),
                   operations = c("-", "/", "/"), suffices.results = suffices.results, 
                   columns.retained = cols.retained, 
                   by = c("Smarthouse","Zones","ZMainplots","Days"))
longi.SIIT.dat <- with(longi.SIIT.dat, 
                            longi.SIIT.dat[order(Smarthouse,Zones,ZMainplots,Days),])

#' ### Plot SIIT profiles 
#' 
#+ "03-SIITProfiles"
k <- 2
nresp <- length(responses.SIIT)
limits <- with(longi.SIIT.dat, list(c(min(Area.smooth.AGR.diff, na.rm=TRUE),
                                      max(Area.smooth.AGR.diff, na.rm=TRUE)),
                                    c(0,3),
                                    c(0,1.5)))
#Plots
for (k in 1:nresp)
{ 
  longiPlot(data = longi.SIIT.dat, x="xDays+35.42857143", 
            response = responses.SIIT[k], 
            y.title=responses.SIIT[k], 
            facet.x="Smarthouse", facet.y=".", 
            ggplotFuncs = list(geom_vline(xintercept=29, linetype="longdash", size=1), 
                               scale_x_continuous(breaks=seq(28, 42, by=2)),
                               scale_y_continuous(limits=limits[[klimit]])))
}

#'## 7b) Calculate interval SIITs and check for large values for SIIT for Days 31to35
#+ "01-SIITIntClean"
suffices <- list("31to35","35to38","31to38","38to42")
response <- "Area.smooth.RGR.31to35"
SIIT <- paste(response, "SIIT", sep=".")
responses.SIITinterval <- as.vector(outer("Area.smooth.RGR", suffices, paste, sep="."))

cart.SIIT.dat <- twoLevelOpcreate(responses.SIITinterval, cart.dat, 
                                  suffices.treatment=c("C","S"), 
                                  suffices.results="SIIT", 
                                  columns.suffixed="Snapshot.ID.Tag")
tmp<-na.omit(cart.SIIT.dat)
print(summary(tmp[SIIT]))
big.SIIT <- with(tmp, tmp[tmp[SIIT] > 1.15, c("Snapshot.ID.Tag.C","Genotype.ID",
                                              paste(response,"C",sep="."), 
                                              paste(response,"S",sep="."), SIIT)])
big.SIIT <- big.SIIT[order(big.SIIT[SIIT]),]
print(big.SIIT)
plt <- ggplot(tmp, aes_string(SIIT)) +
           geom_histogram(aes(y = ..density..), binwidth=0.05) +
           geom_vline(xintercept=1.15, linetype="longdash", size=1) +
           theme_bw() + facet_grid(Smarthouse ~.)
print(plt)
plt <- ggplot(tmp, aes_string(x="Smarthouse", y=SIIT)) +
           geom_boxplot() + theme_bw()
print(plt)
remove(tmp)

## End(Not run)

[Package imageData version 0.1-62 Index]