buildPhylipLineage {alakazam} | R Documentation |
Infer an Ig lineage using PHYLIP
Description
buildPhylipLineage reconstructs an Ig lineage via maximum parsimony using the
dnapars application, or maximum likelihood using the dnaml application, of the PHYLIP package.
Usage
buildPhylipLineage(
  clone,
  phylip_exec,
  dist_mat = getDNAMatrix(gap = 0),
  rm_temp = FALSE,
  verbose = FALSE,
  temp_path = NULL,
  onetree = FALSE,
  branch_length = c("mutations", "distance")
)
Arguments
clone |
ChangeoClone object containing clone data. |
phylip_exec |
absolute path to the PHYLIP dnapars executable. |
dist_mat |
character distance matrix to use for reassigning edge weights.
Defaults to a Hamming distance matrix returned by getDNAMatrix
with gap = 0. |
rm_temp |
if TRUE, delete the temporary directory after running PHYLIP; if FALSE, keep it. |
verbose |
if TRUE, pass the output of the PHYLIP application to the console; if FALSE, suppress it. |
temp_path |
specific path to temp directory if desired. |
onetree |
if TRUE, save only one tree. |
branch_length |
specifies how to define branch lengths; one of "mutations" or "distance". |
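Because phylip_exec must be an absolute path to an existing executable, it can help to validate the path before starting a long run. The following is a minimal sketch in base R, not part of alakazam: the helper name check_phylip_exec is hypothetical, and the execute-permission test via file.access may be unreliable on Windows.

```r
# Hypothetical helper (not part of alakazam): resolve and validate a path
# to a PHYLIP executable before passing it to buildPhylipLineage().
check_phylip_exec <- function(path) {
    # Expand "~" and relative components into an absolute path
    full_path <- normalizePath(path, mustWork = FALSE)
    # The path must point at an existing file, not a directory
    if (!file.exists(full_path) || dir.exists(full_path)) {
        stop("PHYLIP executable not found: ", full_path)
    }
    # file.access() returns 0 when the permission is granted (mode 1 = execute)
    if (file.access(full_path, mode = 1) != 0) {
        stop("File is not executable: ", full_path)
    }
    full_path
}

# Example use (the path is whatever applies on your system):
# phylip_exec <- check_phylip_exec("~/apps/phylip-3.695/bin/dnapars")
```

The returned absolute path can then be supplied as the phylip_exec argument.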
Details
buildPhylipLineage builds the lineage tree of a set of unique Ig sequences via
maximum parsimony through an external call to the dnapars application of the PHYLIP
package. dnapars is called with default algorithm options, except for the search option,
which is set to "Rearrange on one best tree". The germline sequence of the clone is used
for the outgroup.
Following tree construction using dnapars, the dnapars output is modified to allow
input sequences to appear as internal nodes of the tree. Intermediate sequences
inferred by dnapars are replaced by children within the tree having a Hamming distance
of zero from their parent node. With the default dist_mat, the distance calculation
allows IUPAC ambiguous character matches, where an ambiguous character has distance zero
to any character in the set of characters it represents. Distance calculation and movement of
child nodes up the tree is repeated until all parent-child pairs have a distance greater than zero
between them. The germline sequence (outgroup) is moved to the root of the tree and
excluded from the node replacement processes, which permits the trunk of the tree to be
the only edge with a distance of zero. Edge weights of the resultant tree are assigned
as the distance between the sequences of the two vertices that each edge connects.
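The ambiguity-aware distance described above can be illustrated with a small base R sketch. This is not alakazam's implementation (the package uses seqDist with a matrix from getDNAMatrix); the helper ambiguous_hamming and the iupac lookup table are hypothetical names introduced here, and the sketch assumes that two characters match whenever the sets of bases they represent overlap. Gap characters are not handled.

```r
# IUPAC nucleotide codes mapped to the bases they represent
iupac <- list(
    A = "A", C = "C", G = "G", T = "T",
    R = c("A", "G"), Y = c("C", "T"), S = c("C", "G"), W = c("A", "T"),
    K = c("G", "T"), M = c("A", "C"),
    B = c("C", "G", "T"), D = c("A", "G", "T"),
    H = c("A", "C", "T"), V = c("A", "C", "G"),
    N = c("A", "C", "G", "T")
)

# Hypothetical helper: Hamming distance in which a position counts as a
# mismatch only when the two characters share no possible base.
ambiguous_hamming <- function(seq1, seq2) {
    chars1 <- strsplit(toupper(seq1), "")[[1]]
    chars2 <- strsplit(toupper(seq2), "")[[1]]
    stopifnot(length(chars1) == length(chars2))
    stopifnot(all(c(chars1, chars2) %in% names(iupac)))
    mismatch <- mapply(function(a, b) {
        length(intersect(iupac[[a]], iupac[[b]])) == 0
    }, chars1, chars2)
    sum(mismatch)
}

ambiguous_hamming("ACGT", "ACGN")  # 0: N is compatible with T
ambiguous_hamming("ARGT", "AGGT")  # 0: R represents A or G
ambiguous_hamming("ARGT", "ACGT")  # 1: R cannot be C
```

Under this rule, a child whose distance to its parent is zero is a candidate for replacing that parent, which is the condition the node replacement process repeats until no such pair remains.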
Value
An igraph graph object defining the Ig lineage tree. Each unique input
sequence in clone is a vertex of the tree, with additional vertices being
either the germline (root) sequences or inferred intermediates. The graph
object has the following attributes.
Vertex attributes:
- name: value in the sequence_id column of the data slot of the input clone for observed sequences. The germline (root) vertex is assigned the name "Germline" and inferred intermediates are assigned names with the format "Inferred1", "Inferred2", ....
- sequence: value in the sequence column of the data slot of the input clone for observed sequences. The germline (root) vertex is assigned the sequence in the germline slot of the input clone. The sequences of inferred intermediates are extracted from the dnapars output.
- label: same as the name attribute.
Additionally, each other column in the data slot of the input clone
is added as a vertex attribute with the attribute name set to
the source column name. For the germline and inferred intermediate vertices,
these additional vertex attributes are all assigned a value of NA.
Edge attributes:
- weight: Hamming distance between the sequence attributes of the two vertices.
- label: same as the weight attribute.
Graph attributes:
- clone: clone identifier from the clone slot of the input ChangeoClone.
- v_gene: V-segment gene call from the v_gene slot of the input ChangeoClone.
- j_gene: J-segment gene call from the j_gene slot of the input ChangeoClone.
- junc_len: junction length (nucleotide count) from the junc_len slot of the input ChangeoClone.
Alternatively, this function will return a phylo object, which is compatible with the ape package. This object will contain reconstructed ancestral sequences in the nodes attribute.
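The attributes listed above are read with the usual igraph accessors. The sketch below builds a toy graph by hand so that it runs without PHYLIP; the vertex names, sequences, and the clone identifier "toy_clone" are made up for illustration and only mimic the layout of the returned object.

```r
library(igraph)

# Toy stand-in for a returned lineage graph (all values are invented)
edges <- data.frame(
    from = c("Germline", "Inferred1", "Inferred1"),
    to = c("Inferred1", "SEQ1", "SEQ2"),
    weight = c(2, 1, 3),
    stringsAsFactors = FALSE
)
vertices <- data.frame(
    name = c("Germline", "Inferred1", "SEQ1", "SEQ2"),
    sequence = c("AAAA", "AACC", "AACG", "TTGC"),
    stringsAsFactors = FALSE
)
toy <- graph_from_data_frame(edges, directed = TRUE, vertices = vertices)
V(toy)$label <- V(toy)$name
E(toy)$label <- E(toy)$weight
toy$clone <- "toy_clone"

# Vertex attributes via V(), edge attributes via E(), graph attributes via `$`
V(toy)$sequence
E(toy)$weight
toy$clone

# Keep only observed sequences by dropping the germline and inferred nodes
is_observed <- !grepl("^(Germline|Inferred[0-9]+)$", V(toy)$name)
observed <- V(toy)$name[is_observed]
```

The same accessors apply to a graph returned by buildPhylipLineage, where the additional columns of the data slot appear as further vertex attributes.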
References
Felsenstein J. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics. 1989 5:164-166.
Stern JNH, Yaari G, Vander Heiden JA, et al. B cells populating the multiple sclerosis brain mature in the draining cervical lymph nodes. Sci Transl Med. 2014 6(248):248ra107.
See Also
Takes as input a ChangeoClone.
Temporary directories are created with makeTempDir.
Distance is calculated using seqDist.
See [igraph](http://www.rdocumentation.org/packages/igraph/topics/aaa-igraph-package)
and [igraph.plotting](http://www.rdocumentation.org/packages/igraph/topics/plot.common)
for working with igraph graph objects.
Examples
## Not run:
# Preprocess clone
db <- subset(ExampleDb, clone_id == 3138)
clone <- makeChangeoClone(db, text_fields=c("sample_id", "c_call"),
                          num_fields="duplicate_count")

# Run PHYLIP and process output
phylip_exec <- "~/apps/phylip-3.695/bin/dnapars"
graph <- buildPhylipLineage(clone, phylip_exec, rm_temp=TRUE)

# Plot graph with a tree layout
library(igraph)
plot(graph, layout=layout_as_tree, vertex.label=V(graph)$c_call,
     vertex.size=50, edge.arrow.mode=0, vertex.color="grey80")

# To consider each indel event as a mutation, change the masking character
# and distance matrix
clone <- makeChangeoClone(db, text_fields=c("sample_id", "c_call"),
                          num_fields="duplicate_count", mask_char="-")
graph <- buildPhylipLineage(clone, phylip_exec, dist_mat=getDNAMatrix(gap=-1),
                            rm_temp=TRUE)
## End(Not run)