SlicedData-class {MatrixEQTL} | R Documentation
Class SlicedData for storing large matrices
Description
This class is designed for fast and memory-efficient manipulation of large datasets presented in matrix form. It is used to load, store, and manipulate large datasets, e.g. genotype and gene expression matrices. When a dataset is loaded, it is sliced into blocks of 1,000 rows (default size). This allows imputing, standardizing, and performing other operations on the data one block at a time, with minimal memory overhead.
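The row-block idea can be illustrated in base R. The sketch below is not the package's implementation: `slice_rows` and `impute_row_means` are hypothetical helper names, and the slices are kept in a plain list rather than in an environment. It only shows how an operation such as row-mean imputation can be applied to one block at a time.

```r
# Hypothetical helper (not part of MatrixEQTL): split a matrix into
# row blocks of at most slice_size rows, returned as a plain list.
slice_rows = function(mat, slice_size = 1000L) {
  stopifnot(is.matrix(mat), slice_size >= 1L)
  if (nrow(mat) == 0L) return(list())
  # Block id of each row: 1, 1, ..., 2, 2, ...
  block = ceiling(seq_len(nrow(mat)) / slice_size)
  unname(lapply(split(seq_len(nrow(mat)), block),
                function(idx) mat[idx, , drop = FALSE]))
}

# Hypothetical helper: replace missing values in each row of one block
# with that row's mean; rows with no observed values get zeros.
impute_row_means = function(block) {
  means = rowMeans(block, na.rm = TRUE)
  means[is.nan(means)] = 0
  na = is.na(block)
  block[na] = means[row(block)[na]]
  block
}

mat = matrix(c(1, NA, 3, 4, 5, 6, 7, NA, 9, 10), nrow = 5, ncol = 2)
slices = slice_rows(mat, slice_size = 2L)

# Process one block at a time; reassemble here only to inspect the result.
imputed = do.call(rbind, lapply(slices, impute_row_means))
print(imputed)
```

Only one block needs to be held as a working copy at any time, which is the source of the low memory overhead described above.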
Usage
# x[[i]] indexing allows easy access to individual slices.
# It is equivalent to x$getSlice(i) and x$setSlice(i, value)
## S4 method for signature 'SlicedData'
x[[i]]
## S4 replacement method for signature 'SlicedData'
x[[i]] <- value
# The following commands work as if x was a simple matrix object
## S4 method for signature 'SlicedData'
nrow(x)
## S4 method for signature 'SlicedData'
ncol(x)
## S4 method for signature 'SlicedData'
dim(x)
## S4 method for signature 'SlicedData'
rownames(x)
## S4 method for signature 'SlicedData'
colnames(x)
## S4 replacement method for signature 'SlicedData'
rownames(x) <- value
## S4 replacement method for signature 'SlicedData'
colnames(x) <- value
# A SlicedData object can be easily transformed into a matrix
# preserving row and column names
## S4 method for signature 'SlicedData'
as.matrix(x)
# length(x) can be used in place of x$nSlices()
# to get the number of slices in the object
## S4 method for signature 'SlicedData'
length(x)
Arguments
x | A SlicedData object.
i | Number of a slice.
value | New content for the slice / new row or column names.
Extends
SlicedData is a reference class (envRefClass). Its methods can change the values of the fields of the class.
Fields
dataEnv
:-
environment. Stores the slices of the data matrix. The slices should be accessed via the getSlice() and setSlice() methods.
nSlices1
:-
numeric. Number of slices. For internal use. The value should be accessed via the nSlices() method.
rowNameSlices
:-
list. Slices of row names.
columnNames
:-
character. Column names.
fileDelimiter
:-
character. Delimiter separating values in the input file.
fileSkipColumns
:-
numeric. Number of columns with row labels in the input file.
fileSkipRows
:-
numeric. Number of rows with column labels in the input file.
fileSliceSize
:-
numeric. Maximum number of rows in a slice.
fileOmitCharacters
:-
character. Missing value (NaN) representation in the input file.
Methods
initialize(mat)
:-
Create the object from a matrix.
nSlices()
:-
Returns the number of slices.
nCols()
:-
Returns the number of columns in the matrix.
nRows()
:-
Returns the number of rows in the matrix.
Clear()
:-
Clears the object. Removes the data slices and row and column names.
Clone()
:-
Makes a copy of the object. Changes to the copy do not affect the source object.
CreateFromMatrix(mat)
:-
Creates a SlicedData object from a matrix.
LoadFile(filename, skipRows = NULL, skipColumns = NULL, sliceSize = NULL, omitCharacters = NULL, delimiter = NULL, rowNamesColumn = 1)
:-
Loads a data matrix from a file. filename should be a character string. The remaining parameters specify the file format and have the same meaning as the file* fields. The additional rowNamesColumn parameter specifies which of the columns of row labels to use as row names.
SaveFile(filename)
:-
Saves the data to a file. filename should be a character string.
getSlice(sl)
:-
Retrieves the sl-th slice of the matrix.
setSlice(sl, value)
:-
Sets the sl-th slice of the matrix.
ColumnSubsample(subset)
:-
Reorders/subsets the columns according to subset. Acts as M = M[ ,subset] for a matrix M.
RowReorder(ordr)
:-
Reorders rows according to ordr. Acts as M = M[ordr, ] for a matrix M.
RowMatrixMultiply(multiplier)
:-
Multiplies each row by the multiplier. Acts as M = M %*% multiplier for a matrix M.
CombineInOneSlice()
:-
Combines all slices into one. The whole matrix can then be obtained via $getSlice(1).
IsCombined()
:-
Returns TRUE if the number of slices is 1 or 0.
ResliceCombined(sliceSize = -1)
:-
Cuts the data into slices of sliceSize rows. If sliceSize is not defined, the value of the fileSliceSize field is used.
GetAllRowNames()
:-
Returns all row names in one vector.
RowStandardizeCentered()
:-
Set the mean of each row to zero and the sum of squares to one.
SetNanRowMean()
:-
Imputes missing values (NaN) in each row with the row mean. Rows consisting entirely of NaN values are imputed with zeros.
RowRemoveZeroEps()
:-
Removes rows of zeros and those that are nearly zero.
FindRow(rowname)
:-
Finds a row by name. Returns a pair: the slice number and the row number within the slice. If no row is found, the function returns NULL.
rowMeans(x, na.rm = FALSE, dims = 1L)
:-
Returns a vector of row means. Works as rowMeans but requires dims to be equal to 1L.
rowSums(x, na.rm = FALSE, dims = 1L)
:-
Returns a vector of row sums. Works as rowSums but requires dims to be equal to 1L.
colMeans(x, na.rm = FALSE, dims = 1L)
:-
Returns a vector of column means. Works as colMeans but requires dims to be equal to 1L.
colSums(x, na.rm = FALSE, dims = 1L)
:-
Returns a vector of column sums. Works as colSums but requires dims to be equal to 1L.
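Several methods above are described by what they do to an ordinary matrix M. The following base R sketch spells out those plain-matrix analogues. It does not call the package and is not its implementation; the multiplier is an arbitrary choice for illustration, and the standardization step simply follows the wording of the RowStandardizeCentered() description (row mean zero, row sum of squares one).

```r
M = matrix(1:12, nrow = 3, ncol = 4,
           dimnames = list(c("row1", "row2", "row3"),
                           c("col1", "col2", "col3", "col4")))

# ColumnSubsample(subset): keep columns 1, 3 and 4, in that order.
M_cols = M[, c(1, 3, 4)]

# RowReorder(ordr): keep rows 3 and 1, in that order.
M_rows = M[c(3, 1), ]

# RowMatrixMultiply(multiplier): right-multiply by a matrix with ncol(M) rows.
# This illustrative diagonal multiplier doubles every entry.
multiplier = diag(2, ncol(M))
M_mult = M %*% multiplier

# Row standardization as worded in the method description:
# subtract each row mean, then scale each row to unit sum of squares.
# Assumes no row is constant (a constant row would divide by zero).
centered = M - rowMeans(M)
standardized = centered / sqrt(rowSums(centered^2))

print(rowMeans(standardized))
print(rowSums(standardized^2))
```

On a SlicedData object the same operations are applied slice by slice, so the full matrix never has to be assembled.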
Author(s)
Andrey A Shabalin andrey.shabalin@gmail.com
References
The package website: http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL/
See Also
This class is used to load data for eQTL analysis by Matrix_eQTL_engine.
Examples
# Create a SlicedData variable
sd = SlicedData$new()
# Show the details of the empty object
show(sd)
# Create a matrix of values and assign to sd
mat = matrix(1:12, 3, 4)
rownames(mat) = c("row1", "row2", "row3")
colnames(mat) = c("col1", "col2", "col3", "col4")
sd$CreateFromMatrix( mat )
# Show the detail of the object (one slice)
show(sd)
# Slice it in pieces of 2 rows
sd$ResliceCombined(sliceSize = 2L)
# Show the number of slices (equivalent function calls)
sd$nSlices()
length(sd)
# Is it all in one slice? (No)
sd$IsCombined()
# Show the column names (equivalent function calls)
sd$columnNames
colnames(sd)
# Show row name slices
sd$rowNameSlices
# Show all row names (equivalent function calls)
sd$GetAllRowNames()
rownames(sd)
# Print the second slice
print(sd[[2]])
# Reorder and subset columns
sd$ColumnSubsample( c(1,3,4) )
# Reorder and subset rows
sd$RowReorder( c(3,1) )
# Show the detail of the object (one slice again)
show(sd)
# Is it all in one slice? (Yes)
sd$IsCombined()
# Find the row with name "row1" (it is second in the first slice)
sd$FindRow("row1")
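The examples above build the object from an in-memory matrix. Loading from a file can be sketched as follows, using the LoadFile() and SaveFile() signatures documented in the Methods section. The file contents, the variable name `gene`, and the temporary paths are made up for illustration, and the exact layout of the saved file is not shown because it is not specified here.

```r
library(MatrixEQTL)

# Write a small tab-delimited file: one header row, one column of row labels.
path = tempfile(fileext = ".txt")
writeLines(c(
  "id\ts1\ts2\ts3\ts4",
  "g1\t1\t2\t3\t4",
  "g2\t5\tNA\t7\t8",
  "g3\t9\t10\t11\t12"
), path)

# Load it in slices of 2 rows; "NA" marks missing values in this file.
gene = SlicedData$new()
gene$LoadFile(path, skipRows = 1, skipColumns = 1, sliceSize = 2,
              omitCharacters = "NA", delimiter = "\t")

# Inspect the result through the matrix-like interface
dim(gene)
length(gene)
rownames(gene)

# Save the data to another file, then remove the temporary files
out = tempfile(fileext = ".txt")
gene$SaveFile(out)
unlink(c(path, out))
```

Passing the format parameters to LoadFile() should be equivalent to setting the corresponding file* fields beforehand, since the method description states they have the same meaning.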