ghap.compress {GHap} R Documentation

Compress phased genotype data

Description

This function takes phased genotype data and converts them into a compressed binary format.

Usage

  ghap.compress(input.file = NULL, out.file,
                samples.file = NULL, markers.file = NULL,
                phase.file = NULL, batchsize = NULL,
                ncores = 1, verbose = TRUE)

Arguments

If all input files share the same prefix, the user can use the following shortcut options:

input.file

Prefix for input files.

out.file

Output file name.
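The Examples below suggest that the input.file prefix stands for the three files passed individually through samples.file, markers.file and phase.file. The following is a minimal sketch of that expansion, assuming the suffixes .samples, .markers and .phase. The helper phase_input_files is hypothetical and not part of GHap.

```r
# Hypothetical helper (not part of GHap): shows which file names the
# input.file prefix is assumed to stand for, following the Examples.
phase_input_files <- function(prefix) {
  c(samples.file = paste0(prefix, ".samples"),
    markers.file = paste0(prefix, ".markers"),
    phase.file   = paste0(prefix, ".phase"))
}

phase_input_files("example")
```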

For backward compatibility, the user can still point to input files separately:

samples.file

Individual information.

markers.file

Variant map information.

phase.file

Phased genotype matrix.

To control batching and parallelization of the task, or to turn log messages on or off, please use:

batchsize

A numeric value controlling the number of markers to be compressed and written to output at a time (default = nmarkers/10).
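To illustrate what batchsize controls, here is a sketch of splitting marker indices into consecutive batches that are processed and written one at a time. It assumes the default nmarkers/10 is rounded up to a whole number of markers; the exact rounding GHap applies is not stated on this page, and make_batches is a hypothetical helper, not a GHap function.

```r
# Hypothetical helper (not part of GHap): split marker indices into batches.
# Assumes the default batch size is nmarkers/10 rounded up to a whole number;
# GHap's exact rounding rule is not documented here.
make_batches <- function(nmarkers, batchsize = NULL) {
  if (is.null(batchsize)) batchsize <- ceiling(nmarkers / 10)
  idx <- seq_len(nmarkers)
  # Consecutive markers share a batch number: 1..batchsize -> 1, and so on
  split(idx, ceiling(idx / batchsize))
}

b <- make_batches(25)
length(b)        # 9 batches: 8 of 3 markers and 1 of 1
```

Larger batches mean fewer write operations at the cost of holding more markers in memory at once.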

ncores

A numeric value specifying the number of cores to be used in parallel computing (default = 1).

verbose

A logical value specifying whether log messages should be printed (default = TRUE).

Details

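The supported input format is composed of three files with the following suffixes:

.samples

Individual information.

.markers

Variant map information.

.phase

Phased genotype matrix.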

The function outputs a binary file with suffix .phaseb. Each allele is stored as a single bit in that file, and the bits for any given marker are arranged in a sequence of bytes. Since each marker requires storage of 2*nsamples bits, the number of bytes consumed by a single marker in the output file is ceiling(2*nsamples/8). If the number of alleles is not a multiple of 8, the remaining bits of the last byte are filled with 0. All functions in GHap decode the bytes of a marker in such a way that these trailing bits are ignored if present.
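The byte layout above can be sketched in base R: the 2*nsamples allele bits of one marker are padded with zeros up to a multiple of 8 and packed into ceiling(2*nsamples/8) bytes, and decoding keeps only the first 2*nsamples bits. This is an illustration, not GHap's implementation. It assumes the bit order of base R's packBits(), which places the first allele in the least significant bit of each byte; the order GHap actually uses is not stated on this page. encode_marker and decode_marker are hypothetical helpers.

```r
# Sketch of the .phaseb byte layout for one marker (not GHap's actual code).
# Assumption: bits are packed with packBits(), i.e. the first allele goes
# into the least significant bit of each byte.
encode_marker <- function(alleles) {
  nbits  <- length(alleles)                 # 2 * nsamples
  nbytes <- ceiling(nbits / 8)
  # Pad with trailing 0 bits so the length is a multiple of 8
  padded <- c(as.integer(alleles), integer(nbytes * 8 - nbits))
  packBits(padded, type = "raw")
}

decode_marker <- function(bytes, nsamples) {
  bits <- as.integer(rawToBits(bytes))
  bits[seq_len(2 * nsamples)]               # ignore the padding bits
}

alleles <- c(0, 1, 1, 0, 1, 1)              # 3 samples x 2 haplotypes
raw_marker <- encode_marker(alleles)
length(raw_marker)                          # 1 byte, since ceiling(6 / 8) = 1
identical(decode_marker(raw_marker, nsamples = 3), as.integer(alleles))  # TRUE
```

With 5 samples (10 alleles) the same marker would occupy ceiling(10/8) = 2 bytes, the last 6 bits of the second byte being padding.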

Author(s)

Yuri Tani Utsunomiya <ytutsunomiya@gmail.com>

Examples

 
# #### DO NOT RUN IF NOT NECESSARY ###
# 
# # Copy the example data in the current working directory
# exfiles <- ghap.makefile(dataset = "example",
#                          format = "raw",
#                          verbose = TRUE)
# file.copy(from = exfiles, to = "./")
# 
# ### RUN ###
# 
# # Compress phase data using prefix
# ghap.compress(input.file = "example",
#               out.file = "example")
# 
# # Compress phase data using file names
# ghap.compress(samples.file = "example.samples",
#               markers.file = "example.markers",
#               phase.file = "example.phase",
#               out.file = "example")


[Package GHap version 3.0.0 Index]