align {edlibR} | R Documentation
Align query with target using edit distance
Description
Align a query string with a target string and compute the edit (Levenshtein) distance between them.
Usage
align(
query,
target,
mode = "NW",
task = "distance",
k = -1,
cigarFormat = "extended",
additionalEqualities = NULL
)
Arguments
query |
character string. Together with target, it must contain no more than 256 unique characters. |
target |
character string. Together with query, it must contain no more than 256 unique characters. |
mode |
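character string (default="NW") Alignment method to use. Possible values are:
- 'NW' for global alignment (default): the whole query is aligned to the whole target. Note that 'NW' stands for 'Needleman-Wunsch'.
- 'HW' for infix alignment: the query is aligned to a substring of the target, so gaps before and after the query are not penalized. Note that 'HW' stands for 'Hybrid Wunsch'.
- 'SHW' for prefix alignment: the query is aligned to a prefix of the target, so a gap after the query is not penalized. Note that 'SHW' stands for 'Semi-Hybrid Wunsch'. |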
task |
character string (default="distance") Specifies what to calculate. The less that has to be calculated, the faster the call. Possible values, ranked from fastest to slowest:
- 'distance': Find the edit distance and the end locations in the target (default).
- 'locations': Find the edit distance, the end locations, and the start locations.
- 'path': Find the edit distance, the start and end locations, and the alignment path. |
k |
integer (default=-1) Maximum edit distance to search for. The lower this value, the faster the calculation. Set to -1 (default) to place no limit on the edit distance. |
cigarFormat |
character string (default="extended") Specifies which format to use for writing out the CIGAR string. The two possible values are 'standard' and 'extended'. Note that the function getNiceAlignment() only accepts 'cigarFormat="extended"'.
- 'standard': Uses the following symbols to generate a CIGAR string: Match: 'M', Insertion: 'I', Deletion: 'D', Mismatch: 'M'. Note that 'M' in this setting can denote either a sequence match or a mismatch.
- 'extended': Uses the following symbols to generate a CIGAR string: Match: '=', Insertion to target: 'I', Deletion from target: 'D', Mismatch: 'X'. For example, a CIGAR of "5=1X1=1I" means "5 matches, 1 mismatch, 1 match, 1 insertion (to target)".
For more details on the CIGAR format, see <http://samtools.github.io/hts-specs/SAMv1.pdf> and <http://drive5.com/usearch/manual/cigar.html>. |
additionalEqualities |
list of character vectors, each containing a pair of characters (default=NULL) Allows users to extend the definition of equality used in the alignment. The input 'additionalEqualities' must be a list of character vectors in which each vector contains exactly two strings (a pair). Each pair defines two values as equal. This is useful, for example, to make edlib case insensitive or to have certain characters act as wildcards. If NULL, edlib's default equality definition is used without any extensions. |
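As a minimal sketch of the pair format described for additionalEqualities, the following base R code builds two such lists. The names `case_pairs` and `wildcard_pairs` are illustrative and not part of edlibR, and treating "N" as a nucleotide wildcard is an assumption made for the example. The call to align() runs only if edlibR is installed.

```r
# One pair per letter: c("A", "a"), c("B", "b"), ...
# This makes upper- and lower-case letters compare as equal.
case_pairs <- lapply(LETTERS, function(ch) c(ch, tolower(ch)))

# Assumed wildcard for illustration: let "N" equal each of the four nucleotides.
wildcard_pairs <- lapply(c("A", "C", "G", "T"), function(base) c("N", base))

# Every element must be a character vector of exactly two strings.
stopifnot(all(lengths(case_pairs) == 2), all(lengths(wildcard_pairs) == 2))

# Pass one of the lists to align() when edlibR is available.
if (requireNamespace("edlibR", quietly = TRUE)) {
  result <- edlibR::align("actg", "CACTGT", mode = "HW",
                          additionalEqualities = case_pairs)
  print(result$editDistance)
}
```

Building the list with lapply() keeps each pair as its own length-two vector, which is the shape the argument description requires.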
Value
List with the following fields:
- editDistance: (integer) The edit distance. This is set to -1 if it is larger than k.
- alphabetLength: (integer) Number of unique characters in 'query' and 'target' combined.
- locations: (list of vectors) List of R vectors of locations, in the format list(c(start, end)). Note: if the start or end position is NULL, it is encoded as 'NA' to work correctly with R vectors.
- cigar: (character string) The alignment path in CIGAR format, a standard format for alignment paths.
- cigarFormat: (character string) The format given by the parameter 'cigarFormat' of align(), returned here for use by getNiceAlignment(). (Note: the function getNiceAlignment() only accepts 'extended'.)
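The two CIGAR formats differ only in how matches and mismatches are written, so an extended string such as the one in the 'cigar' field can be collapsed into a standard one. The helper below, `extended_to_standard`, is a hypothetical name and not an edlibR function. It is a base R sketch of that conversion under the symbol definitions given for cigarFormat.

```r
# Convert an extended CIGAR string ('=', 'X', 'I', 'D') to the standard
# form, in which matches and mismatches are both written as 'M'.
extended_to_standard <- function(cigar) {
  # Split into <count><op> tokens, e.g. "5=", "1X", "1=", "1I".
  tokens <- regmatches(cigar, gregexpr("[0-9]+[=XIDM]", cigar))[[1]]
  counts <- as.integer(sub("[=XIDM]$", "", tokens))
  ops <- sub("^[0-9]+", "", tokens)

  # '=' and 'X' both become 'M'.
  ops[ops %in% c("=", "X")] <- "M"

  # Merge neighbouring tokens that now share the same operation.
  runs <- rle(ops)
  group <- rep(seq_along(runs$lengths), runs$lengths)
  totals <- as.vector(tapply(counts, group, sum))

  paste0(totals, runs$values, collapse = "")
}

extended_to_standard("5=1X1=1I")  # "7M1I"
```

The conversion is one-way: once '=' and 'X' are merged into 'M', the standard string no longer says which positions matched.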
Examples
align("ACTG", "CACTRT", mode="HW", task="path")
align("elephant", "telephone")
align("ACTG", "CACTRT", mode="HW", task="path", additionalEqualities=list(c("R", "A"), c("R", "G")))
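To make the three modes and the role of k concrete, the sketch below computes the same kind of distance with a plain dynamic-programming table in base R. It is a naive reference under stated assumptions, not the algorithm edlib uses internally, and `naive_edit_distance` is a hypothetical name. It returns only the distance, with -1 when the distance exceeds a non-negative k, mirroring the documented behaviour of editDistance.

```r
# Naive O(length(query) * length(target)) edit distance for the three modes.
naive_edit_distance <- function(query, target, mode = "NW", k = -1) {
  mode <- match.arg(mode, c("NW", "HW", "SHW"))
  q <- strsplit(query, "")[[1]]
  t <- strsplit(target, "")[[1]]
  m <- length(t)

  # Row 0: cost of aligning an empty query ending at each target position.
  # In HW the alignment may start anywhere in the target, so the cost is 0.
  prev <- if (mode == "HW") rep(0, m + 1) else 0:m

  for (i in seq_along(q)) {
    cur <- numeric(m + 1)
    cur[1] <- i
    for (j in seq_len(m)) {
      substitution <- prev[j] + (q[i] != t[j])
      cur[j + 1] <- min(substitution, prev[j + 1] + 1, cur[j] + 1)
    }
    prev <- cur
  }

  # NW must consume the whole target; HW and SHW may stop at any position.
  d <- if (mode == "NW") prev[m + 1] else min(prev)
  if (k >= 0 && d > k) -1 else d
}

naive_edit_distance("ACTG", "CACTRT", mode = "HW")   # 1
naive_edit_distance("elephant", "telephone")         # 3
naive_edit_distance("elephant", "telephone", k = 2)  # -1
```

The modes differ only in the first row of the table and in which cell of the last row is read, which is why infix and prefix searches can report a smaller distance than global alignment for the same pair of strings.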