fixCSV {PolyPatEx} | R Documentation |
Tidy a comma separated value (CSV) file
Description
Tidies up a Comma Separated Value (CSV) file, ensuring that each row of the table in the file contains the same number of commas, and no empty rows are left below the table.
Usage
fixCSV(file, skip = 0, overwrite = FALSE)
Arguments
file |
character: the name of the CSV file to be ‘fixed’. |
skip |
integer: the number of lines in the CSV file to skip
before the header row of the table. The skipped lines are copied
directly to the output file unchanged. The default is
skip=0. |
overwrite |
logical: write output to a separate
‘FIXED’ file (overwrite=FALSE, the default), or overwrite the original file (overwrite=TRUE). |
Details
fixCSV
tidies up a Comma Separated Value (CSV) file
to ensure that the CSV file contains a strictly rectangular block
of data for input into R (ignoring any preliminary comment rows
via the skip=
argument).
CSV formatted files are a plain text file format for tabular data, in which cell entries in the same row of a table are separated by commas. When such files are exported from other applications such as spreadsheet software, the software has to decide whether any empty cells to the right-hand side of, or below, the table or spreadsheet should be represented by trailing commas in the CSV file. Such decisions can result in a ‘ragged’ table in the CSV file, in which some rows contain fewer commas (‘short rows’) or more commas (‘long rows’) than others, or where empty rows below the table are included as comma-only rows in the CSV file.
While R's read.table
and related functions can
sensibly extend short rows as needed, ragged tables in a CSV file
can still result in errors, unwanted empty rows (below the table)
or unwanted columns (to the right of the table) when the data is
loaded into R.
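As a rough illustration of what a ‘ragged’ table looks like to R, the sketch below writes a small made-up CSV file to a temporary location and counts the comma-separated fields on each line with count.fields from the utils package. The file contents and variable names are invented for this example and are not part of PolyPatEx.

```r
# Made-up example data: a 3-column header, one long row, one short row,
# and a comma-only row below the table.
csv_lines <- c("id,locus1,locus2",
               "a,1,2,,",
               "b,3",
               ",,")
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# Number of comma-separated fields found on each line of the file.
n_fields <- count.fields(tmp, sep = ",")

# The table is ragged if any line's field count differs from the header's.
is_ragged <- any(n_fields != n_fields[1])
print(n_fields)
print(is_ragged)
```

A check of this kind can help decide whether a file needs tidying before it is read into R.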
fixCSV
reads in a specified CSV file and removes or adds
commas to rows, to ensure that each row in the body of the table
contains the same number of cells as the header row of the table.
Any empty rows below the table are also removed. The resulting
table is then written back to file, either to a new file with
‘FIXED’ added to the filename (argument
overwrite=FALSE
, the default) or overwriting the original
file (overwrite=TRUE
; the original file is copied to a
.BAK
file before being overwritten).
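The trimming and padding described above can be sketched in a few lines of base R. This is only an illustration of the idea, not the code used by fixCSV: the helper names fix_row and fix_rows_sketch are hypothetical, the sketch works on a character vector of lines rather than on a file, and it assumes no quoted cells containing commas.

```r
# Hypothetical helper: force one CSV row to exactly n_cols cells.
fix_row <- function(row, n_cols) {
  # Appending a comma stops strsplit() from dropping trailing empty cells.
  cells <- strsplit(paste0(row, ","), ",", fixed = TRUE)[[1]]
  length(cells) <- n_cols      # truncates long rows, pads short rows with NA
  cells[is.na(cells)] <- ""    # padded cells become empty cells
  paste(cells, collapse = ",")
}

# Hypothetical helper: tidy a vector of CSV lines (a sketch, not fixCSV itself).
fix_rows_sketch <- function(lines, skip = 0) {
  comments <- lines[seq_len(skip)]            # copied through unchanged
  body <- lines[seq_along(lines) > skip]
  # Drop comma-only or blank rows below the last non-empty row.
  is_empty <- grepl("^[,[:space:]]*$", body)
  body <- body[seq_len(max(which(!is_empty)))]
  # Table width is set by the right-most non-empty header cell.
  header <- sub(",+$", "", body[1])
  n_cols <- length(strsplit(paste0(header, ","), ",", fixed = TRUE)[[1]])
  fixed <- vapply(body, fix_row, character(1), n_cols = n_cols,
                  USE.NAMES = FALSE)
  c(comments, fixed)
}

ragged <- c("# a comment row",
            "id,locus1,locus2,,",
            "a,1,2,,",
            "b,3",
            "c,,5,6",
            ",,,,",
            "")
tidy <- fix_rows_sketch(ragged, skip = 1)
print(tidy)
```

Note that the long row "c,,5,6" loses its fourth cell in this sketch, which mirrors the truncation behaviour that a too-short header row can cause.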
Note that:
- The table of data in the CSV file must contain a header row of the correct length, since this row is used to determine the correct number of columns for the table. If the header row is too short, subsequent rows will be truncated to match its length, so beware. Misspecification of the skip= argument (see below) can similarly corrupt the ‘fixed’ file.
- In the header row, any trailing commas representing empty cells to the right of the (non-empty) header entries are removed before the number of columns is determined. Thus the length of the header row (and hence the assumed width of the entire table) is determined by the right-most non-empty cell in the header row.
- fixCSV does not remove empty cells, rows or columns within the interior (or on the left side) of the table; it is concerned only with the right and bottom boundaries of the table.
- A skip= argument is included to tell fixCSV to ignore the specified number of comment rows preceding the header row. Such rows are simply copied over into the output file unchanged. The default for this parameter is skip=0, so that the first row in the data file is assumed to be the header row. As noted above, misspecification of this argument can seriously corrupt the output.
- fixCSV can overwrite your data file(s) (via overwrite=TRUE), and although it makes a backup of your original file, you should still make sure that you have a separate backup of your data file in a safe place before using this function! The author of this code takes no responsibility for any data loss or corruption as a result of the use of this routine.
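One possible way to keep a separate backup before overwriting is sketched below using base R's file.copy. The helper make_backup and its naming scheme for the backup file are assumptions made for this illustration; they are not part of PolyPatEx and are unrelated to the .BAK file that fixCSV writes.

```r
# Hypothetical helper: copy a file to a timestamped backup and return the
# path of the backup. Stops with an error if the copy fails.
make_backup <- function(path, backup_dir = dirname(path)) {
  stopifnot(file.exists(path))
  stamp <- format(Sys.time(), "%Y%m%d-%H%M%S")
  backup_path <- file.path(backup_dir,
                           paste0(basename(path), ".", stamp, ".backup"))
  if (!file.copy(path, backup_path, overwrite = FALSE)) {
    stop("Backup failed; not safe to overwrite ", path)
  }
  backup_path
}

data_file <- "alleleDataFile.csv"  # file name used in the Examples section
if (file.exists(data_file)) {
  backup <- make_backup(data_file)
  # With the backup in place, the original could then be overwritten:
  # PolyPatEx::fixCSV(data_file, overwrite = TRUE)
}
```

Passing a different backup_dir places the copy outside the working directory, which may be preferable when the data directory itself is at risk.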
Author(s)
Alexander Zwart (alec.zwart at csiro.au)
Examples
## Not run:
## Assuming CSV file 'alleleDataFile.csv' exists in the current
## directory. The following overwrites the CSV file - make sure
## you have a backup!
fixCSV("alleleDataFile.csv", overwrite = TRUE)
## End(Not run)