read_biotyper_report {maldipickr} | R Documentation |
Importing Bruker MALDI Biotyper CSV report
Description
The header-less table exported by the Compass software in the Bruker MALDI Biotyper device is separated by semi-colons and has empty columns which prevent an easy import in R. This function reads the report correctly as a tibble.
Usage
read_biotyper_report(path, best_hits = TRUE, long_format = TRUE)
Arguments
path |
Path to the semi-colon separated table |
best_hits |
A logical indicating whether to return only the best hits for each target analyzed |
long_format |
A logical indicating whether the table is in the long format (many rows) or wide format (many columns) when showing all the hits. This option has no effect when best_hits = TRUE. |
Details
The header-less table contains identification information for each target processed by the Biotyper device. Once it has been processed by read_biotyper_report, the following eight columns are available in the tibble when using the best_hits = TRUE option:
- name: a character indicating the name of the spot on the MALDI target (i.e., plate)
- sample_name: the character string provided during the preparation of the MALDI target (i.e., plate)
- hit_rank: an integer indicating the rank of the hit for the corresponding target and identification
- bruker_quality: a character encoding the quality of the identification, with one or more "+" symbols or a single "-"
- bruker_species: the species name associated with the MALDI spectrum analyzed
- bruker_taxid: the NCBI Taxonomy Identifier of the species name in the column bruker_species
- bruker_hash: a hash from an undocumented checksum function, probably encoding the database entry
- bruker_log: the log-score of the identification
When all hits are returned (with best_hits = FALSE), the default output format is the long format (long_format = TRUE), meaning that the previous columns remain unchanged, but all hits are now returned, thus increasing the number of rows.
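As a minimal sketch of the long format, the call below reuses the example report bundled with the package (the same file as in the Examples section) and compares the two settings of best_hits. The object names best and all_hits are illustrative only, and the snippet assumes that maldipickr is installed.

```r
library(maldipickr)

# Example report shipped with the package (see Examples)
biotyper <- system.file("biotyper.csv", package = "maldipickr")

# Best hit only (default)
best <- read_biotyper_report(biotyper)

# All hits in the long format: same columns, more rows
all_hits <- read_biotyper_report(biotyper, best_hits = FALSE, long_format = TRUE)

# Compare the column names and the number of rows of the two tibbles
identical(names(best), names(all_hits))
nrow(best)
nrow(all_hits)
```

The hit_rank column is what distinguishes the rows of a given target in the long format.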
When all hits are returned (with best_hits = FALSE) using the wide format (long_format = FALSE), the two columns name and sample_name remain unchanged, but the five columns prefixed by bruker_ now include the hit rank in their names, creating a tibble of 52 columns:
- bruker_01_quality
- bruker_01_species
- bruker_01_taxid
- bruker_01_hash
- bruker_01_log
- bruker_02_quality
- ...
- bruker_10_species
- bruker_10_taxid
- bruker_10_hash
- bruker_10_log
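The count of 52 columns follows from 2 identifier columns plus 5 fields for each of 10 hit ranks. The base R sketch below only rebuilds the expected column names to make that arithmetic explicit. It does not call the package, and the variable names and the column order are assumptions taken from the list above.

```r
# Five per-hit fields, as listed in Details
fields <- c("quality", "species", "taxid", "hash", "log")

# Zero-padded hit ranks 01 to 10
ranks <- sprintf("%02d", 1:10)

# The rank varies slowest, so the five fields of one hit stay together
hit_columns <- paste0(
  "bruker_",
  rep(ranks, each = length(fields)),
  "_",
  rep(fields, times = length(ranks))
)

# Prepend the two columns that do not depend on the hit rank
wide_columns <- c("name", "sample_name", hit_columns)

length(wide_columns) # 52
head(wide_columns, 4)
```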
Value
A tibble of 8 columns (best_hits = TRUE, or best_hits = FALSE in the long format) or 52 columns (best_hits = FALSE with long_format = FALSE). See Details for the description of the columns.
Note
A report that contains only spectra with no peaks found will return a tibble of 0 rows and a warning message.
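One way to handle this case, sketched under the assumption that an empty report should simply be skipped, is to check the number of rows before any downstream step. The bundled example report is used so that the snippet runs as-is with maldipickr installed. The message text and the tabulation step are illustrative only.

```r
library(maldipickr)

biotyper <- system.file("biotyper.csv", package = "maldipickr")
report <- read_biotyper_report(biotyper)

# A report with no peaks found in any spectrum yields a 0-row tibble
if (nrow(report) == 0) {
  message("No identification in this report, skipping it.")
} else {
  # Otherwise continue, for example by tabulating the species found
  table(report$bruker_species)
}
```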
Examples
# Get an example Bruker report
biotyper <- system.file("biotyper.csv", package = "maldipickr")
# Import the report as a tibble
report_tibble <- read_biotyper_report(biotyper)
# Display the tibble
report_tibble