fixCSV {PolyPatEx}	R Documentation

Tidy a comma separated value (CSV) file

Description

Tidies up a Comma Separated Value (CSV) file, ensuring that each row of the table in the file contains the same number of commas and that no empty rows are left below the table.

Usage

fixCSV(file, skip = 0, overwrite = FALSE)

Arguments

file

character: the name of the CSV file to be ‘fixed’.

skip

integer: the number of lines in the CSV file to skip before the header row of the table. The skipped lines are copied directly to the output file unchanged. The default is skip=0, implying that the header row is the first row of the CSV file.

overwrite

logical: Write output to a separate, ‘FIXED’ file (overwrite=FALSE, the default), or overwrite the original file (overwrite=TRUE)? If overwrite=TRUE, the original file is copied to a .BAK file before being overwritten.

Details

fixCSV tidies up a Comma Separated Value (CSV) file to ensure that the CSV file contains a strictly rectangular block of data for input into R (ignoring any preliminary comment rows via the skip= argument).

CSV is a plain text file format for tabular data, in which cell entries in the same row of a table are separated by commas. When such files are exported from other applications such as spreadsheet software, the software has to decide whether any empty cells to the right-hand side of, or below, the table or spreadsheet should be represented by trailing commas in the CSV file. Such decisions can result in a ‘ragged’ table in the CSV file, in which some rows contain fewer commas (‘short rows’) or more commas (‘long rows’) than others, or where empty rows below the table are included as comma-only rows in the CSV file.
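As a rough illustration of what ‘ragged’ means here, base R's count.fields can report the number of comma-separated fields on each line of a file. The file contents below are invented for the example and the check is not part of PolyPatEx.

```r
# Hypothetical ragged CSV: a short row, a long row and a comma-only row.
csvLines <- c("id,locus1,locus2",
              "A,1,2",
              "B,3",        # short row: one comma missing
              "C,4,5,,",    # long row: two trailing commas
              ",,")         # empty row exported below the table
tmp <- tempfile(fileext = ".csv")
writeLines(csvLines, tmp)

# Number of comma-separated fields on each line. A strictly rectangular
# table would give the same count on every line.
nFields <- count.fields(tmp, sep = ",")
isRagged <- length(unique(nFields)) > 1
unlink(tmp)
```

Here nFields differs between lines, so isRagged is TRUE.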

While R's read.table and related functions can sensibly extend short rows as needed, ragged tables in a CSV file can still result in errors, unwanted empty rows (below the table) or unwanted columns (to the right of the table) when the data is loaded into R.
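The unwanted rows and columns can be seen in a small sketch using read.csv on invented data, in which every row carries a trailing comma and a comma-only row follows the table. The data and object names are assumptions made for illustration only.

```r
# Hypothetical CSV text: two real columns, a trailing comma on every
# row, and one comma-only row below the table.
csvText <- "id,locus1,
A,1,
B,2,
,,
"
dat <- read.csv(text = csvText)

# The trailing commas produce a third, entirely empty column, and the
# comma-only row is read as an extra data row.
dim(dat)
```

In this sketch the table holds two records in two columns, yet dat has three rows and three columns.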

fixCSV reads in a specified CSV file and removes or adds commas to rows, to ensure that each row in the body of the table contains the same number of cells as the header row of the table. Any empty rows below the table are also removed. The resulting table is then written to file, either to a new file with ‘FIXED’ added to the filename (argument overwrite=FALSE, the default) or overwriting the original file (overwrite=TRUE; the original file is copied to a .BAK file before being overwritten).
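The general idea can be sketched in base R as follows. This is not the PolyPatEx implementation: sketchFixLines is a hypothetical helper that works on a character vector of lines rather than on a file, assumes that no cell contains a quoted comma, and simply discards any cells beyond the header's column count.

```r
# Hypothetical helper, not the package's code. Pads or trims each row to
# the header's column count and drops comma-only rows below the table.
sketchFixLines <- function(lines, skip = 0) {
  preamble <- head(lines, skip)                  # skipped lines kept unchanged
  tab <- tail(lines, length(lines) - skip)       # header row plus table body
  # Column count taken from the header row: number of commas plus one.
  nCols <- nchar(gsub("[^,]", "", tab[1])) + 1
  fixRow <- function(row) {
    cells <- strsplit(row, ",", fixed = TRUE)[[1]]
    length(cells) <- nCols                       # trims long rows, pads short rows with NA
    cells[is.na(cells)] <- ""
    paste(cells, collapse = ",")
  }
  fixed <- vapply(tab, fixRow, character(1), USE.NAMES = FALSE)
  # Keep everything up to the last row that is not comma-only.
  isEmpty <- grepl("^,*$", fixed)
  keep <- rev(cumsum(rev(!isEmpty)) > 0)
  c(preamble, fixed[keep])
}

# Invented input: one comment line, then a ragged table.
raggedLines <- c("# comment", "id,locus1,locus2", "A,1,2", "B,3",
                 "C,4,5,,", ",,", ",,,,")
fixedLines <- sketchFixLines(raggedLines, skip = 1)
```

In this example the short row "B,3" becomes "B,3,", the long row "C,4,5,," becomes "C,4,5", and the two comma-only rows at the bottom are dropped, while the comment line is passed through unchanged.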

Note that:

Author(s)

Alexander Zwart (alec.zwart at csiro.au)

Examples

## Not run: 

## Assuming CSV file 'alleleDataFile.csv' exists in the current
## directory.  The following overwrites the CSV file - make sure
## you have a backup!

fixCSV("alleleDataFile.csv", overwrite = TRUE)


## End(Not run)

[Package PolyPatEx version 0.9.2 Index]