newCorrespondenceTable {correspondenceTables} | R Documentation |
Ex novo creation of candidate correspondence tables between two classifications via pivot tables
Description
Creation of a candidate correspondence table between two classifications, A and B, when there are correspondence tables leading from the first classification to the second one via k intermediate pivot classifications C_1, \ldots, C_k. The correspondence tables leading from A to B are A:C_1, {C_i:C_{i+1} : 1 \le i \le k-1}, B:C_k.
Usage
newCorrespondenceTable(
Tables,
CSVout = NULL,
Reference = "none",
MismatchTolerance = 0.2
)
Arguments
Tables |
A string of type character containing the name of a csv file which contains the names of the files that contain the classifications and the intermediate correspondence tables (see "Details" below). |
CSVout |
The preferred name for the output csv files that will contain the candidate correspondence table and information about the classifications involved. The valid values are NULL (the default shown under "Usage") or a string of type character (see "Value"). |
Reference |
The reference classification among A and B. If a classification is the reference to the other, and hence hierarchically superior to it, each code of the other classification is expected to be mapped to at most one code of the reference classification. The valid values are "none" (the default shown under "Usage"), "A" and "B". |
MismatchTolerance |
The maximum acceptable proportion of rows in the candidate correspondence table which contain no code for classification A or no code for classification B. The default value is 0.2. |
Details
File and file name requirements:
The file that corresponds to argument Tables and the files to which the contents of Tables lead must be in csv format with comma as delimiter. If full paths are not provided, then these files must be available in the working directory. No two of the filenames provided may be identical.
The file that corresponds to argument Tables must contain filenames, and nothing else, in a (k+2) × (k+2) table, where k, a positive integer, is the number of "pivot" classifications. The cells in the main diagonal of the table provide the filenames of the files which contain, in this order, the classifications A, C_1, \ldots, C_k and B. The off-diagonal directly above the main diagonal contains the filenames of the files that contain, in this order, the correspondence tables A:C_1, {C_i:C_{i+1}, 1 \le i \le k-1} and B:C_k. All other cells of the table must be empty.
If either of the two files where the output will be stored is read protected (for instance because it is open elsewhere), an error message will be reported and execution will be halted.
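As an illustration of the layout described above, the following minimal sketch builds such a file for a single pivot classification (k = 1). All filenames are hypothetical placeholders, not files shipped with the package, and the file is written to a temporary directory.

```r
# Sketch: layout of the 'Tables' file for k = 1 pivot classification.
# All filenames below are hypothetical placeholders.
k <- 1
layout <- matrix("", nrow = k + 2, ncol = k + 2)

# Main diagonal: classification files in the order A, C_1, B.
diag(layout) <- c("A.csv", "C1.csv", "B.csv")

# Off-diagonal directly above the main diagonal: correspondence tables
# A:C_1 and B:C_k (for k = 1 the latter is B:C_1).
layout[cbind(1:(k + 1), 2:(k + 2))] <- c("A_C1.csv", "B_C1.csv")

# Write without headers or row names, using comma as delimiter.
tables_file <- file.path(tempdir(), "tables_sketch.csv")
write.table(layout, file = tables_file, sep = ",",
            row.names = FALSE, col.names = FALSE)
```

The resulting path could then be passed as the Tables argument, provided the named files exist and satisfy the requirements listed in this section.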
Classification table requirements:
Each of the files that contain classifications must contain at least one column and at least two rows. The first column contains the codes of the respective classification. The first row contains column headers. The header of the first column is the name of the respective classification (e.g., "CN 2021").
The classification codes contained in a classification file (expected in its first column as mentioned above) must be unique. No two identical codes are allowed in the column.
If any of the files that contain classifications has additional columns, the first of them is assumed to contain the labels of the respective classification codes.
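The requirements above can be checked before calling the function. The following is a minimal sketch, assuming the classification has already been read into a data frame; check_classification is a hypothetical helper and not part of the package, and the codes are made up.

```r
# Sketch: check a classification table held in a data frame.
# 'check_classification' is a hypothetical helper, not a package function.
check_classification <- function(df) {
  # The header row counts as one of the "at least two rows" of the file,
  # so the data frame needs at least one data row.
  if (ncol(df) < 1 || nrow(df) < 1) {
    stop("a classification needs at least one column and one data row")
  }
  # The codes are expected in the first column and must be unique.
  if (anyDuplicated(df[[1]]) > 0) {
    stop("duplicated classification codes found")
  }
  invisible(TRUE)
}

# Made-up codes; the column header plays the role of the classification name.
ok  <- data.frame("CN 2021" = c("0101", "0102"), check.names = FALSE)
bad <- data.frame("CN 2021" = c("0101", "0101"), check.names = FALSE)

check_classification(ok)
bad_result <- tryCatch(check_classification(bad), error = function(e) "error")
```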
Correspondence table requirements:
The files that contain correspondence tables must contain at least two columns and at least two rows. The first column of the file that contains A:C_1 contains the codes of classification A. The second column contains the codes of classification C_1. Similar requirements apply to the files that contain C_i:C_{i+1}, 1 \le i \le k-1, and B:C_k. The first row of each of the files that contain correspondence tables contains column headers. The names of the first two columns are the names of the respective classifications.
The pairs of classification codes contained in a correspondence table file (expected in its first two columns as mentioned above) must be unique. No two identical pairs of codes are allowed in the first two columns.
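Repeated pairs can be located with base R before running the function. The sketch below uses made-up codes and column names; it only illustrates the uniqueness requirement and is not how the package itself performs the check.

```r
# Sketch: find repeated code pairs in the first two columns of a
# correspondence table. Codes and column names are made-up values.
corr <- data.frame(A  = c("a1", "a1", "a2", "a1"),
                   C1 = c("c1", "c2", "c1", "c1"),
                   stringsAsFactors = FALSE)

# duplicated() on a data frame compares whole rows, so it flags a pair
# that repeats an earlier pair in the first two columns.
dup_rows <- which(duplicated(corr[, 1:2]))
```

Here the fourth row repeats the pair of the first row, so a file with these contents would violate the requirement.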
Interdependency requirements:
At least one code of classification A must appear in both the file of classification A and the file of correspondence table A:C_1.
At least one code of classification B must appear in both the file of classification B and the file of correspondence table B:C_k, where k, k \ge 1, is the number of pivot classifications.
If there is only one pivot classification, C_1, at least one code of it must appear in both the file of correspondence table A:C_1 and the file of correspondence table B:C_1.
If there are k pivot classifications with k \ge 2, then at least one code of C_1 must appear in both the file of correspondence table A:C_1 and the file of correspondence table C_1:C_2, at least one code of each of the C_i, i = 2, \ldots, k-1 (if k \ge 3), must appear in both the file of correspondence table C_{i-1}:C_i and the file of correspondence table C_i:C_{i+1}, and at least one code of C_k must appear in both the file of correspondence table C_{k-1}:C_k and the file of correspondence table B:C_k.
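For the case of a single pivot classification, the interdependency requirements amount to three non-empty intersections. The following sketch uses made-up codes; shares_code is a hypothetical helper, not a package function.

```r
# Sketch: interdependency checks for k = 1 with made-up codes.
# 'shares_code' is a hypothetical helper, not a package function.
shares_code <- function(x, y) length(intersect(x, y)) > 0

class_A <- c("a1", "a2")                      # codes in the file of A
class_B <- c("b1", "b2")                      # codes in the file of B
A_C1 <- data.frame(A = c("a1", "a3"), C1 = c("c1", "c2"),
                   stringsAsFactors = FALSE)  # correspondence table A:C_1
B_C1 <- data.frame(B = "b1", C1 = "c2",
                   stringsAsFactors = FALSE)  # correspondence table B:C_1

checks <- c(
  A_in_table = shares_code(class_A, A_C1$A),  # A and A:C_1 share a code
  B_in_table = shares_code(class_B, B_C1$B),  # B and B:C_1 share a code
  pivot_link = shares_code(A_C1$C1, B_C1$C1)  # A:C_1 and B:C_1 share a C_1 code
)
```

For k \ge 2 the same helper would be applied to each pair of consecutive correspondence tables.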
Mismatch tolerance:
The ratio that is compared with MismatchTolerance has as numerator the number of rows in the candidate correspondence table which contain no code for classification A or no code for classification B and as denominator the total number of rows of this table. If the ratio exceeds MismatchTolerance, the execution of the function is halted.
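The ratio can be reproduced by hand on a small table. The sketch below uses made-up codes and assumes, for illustration only, that a missing code is stored as an empty string.

```r
# Sketch: mismatch ratio on made-up data. Empty strings stand for
# missing codes here, which is an assumption of this example.
candidate <- data.frame(A = c("a1", "a2", "",   "a3", "a4"),
                        B = c("b1", "",   "b2", "b3", "b4"),
                        stringsAsFactors = FALSE)
MismatchTolerance <- 0.2

# Numerator: rows with no code for A or no code for B.
# Denominator: total number of rows.
missing_code <- candidate$A == "" | candidate$B == ""
ratio <- sum(missing_code) / nrow(candidate)

# TRUE means the tolerance is exceeded.
exceeds <- ratio > MismatchTolerance
```

In this example two of the five rows lack a code, so the ratio exceeds the tolerance of 0.2.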
If any of the conditions required from the arguments is violated an error message is produced and execution is stopped.
Value
newCorrespondenceTable() returns a list with two elements, both of which are data frames.
The first element is the candidate correspondence table A:B, including the codes of all "pivot" classifications, augmented with flags "Review" (if applicable), "Redundancy", "Unmatched", "NoMatchFromA", "NoMatchFromB" and with all the additional columns of the classification and intermediate correspondence table files.
The second element contains the names of classification A, the "pivot" classifications and classification B as read from the top left-hand side cell of the respective input files.
If the value of argument CSVout is a string of type character, the elements of the list are exported into files of csv format. The name of the file for the first element is the value of argument CSVout and the name of the file for the second element is classificationNames_CSVout. For example, if CSVout = "newCorrespondenceTable.csv", the elements of the list are exported into "newCorrespondenceTable.csv" and "classificationNames_newCorrespondenceTable.csv" respectively.
Explanation of the flags
The "Review" flag is produced only if argument Reference has been set equal to "A" or "B". For each row of the candidate correspondence table, if Reference = "A" the value of "Review" is equal to 1 if the code of B maps to more than one code of A, and 0 otherwise. If Reference = "B" the value of "Review" is equal to 1 if the code of A maps to more than one code of B, and 0 otherwise. The value of the flag is empty if the row does not contain a code of A or a code of B.
For each row of the candidate correspondence table, the value of "Redundancy" is equal to 1 if the row contains a combination of codes of A and B that also appears in at least one other row of the candidate correspondence table.
For each row of the candidate correspondence table, the value of "Unmatched" is equal to 1 if the row contains a code of A but no code of B or if it contains a code of B but no code of A. The value of the flag is 0 if the row contains codes for both A and B.
For each row of the candidate correspondence table, the value of "NoMatchFromA" is equal to 1 if the row contains a code of A that appears in the table of classification A but not in correspondence table A:C_1. The value of the flag is 0 if the row contains a code of A that appears in both the table of classification A and correspondence table A:C_1. Finally, the value of the flag is empty if the row contains no code of A or if it contains a code of A that appears in correspondence table A:C_1 but not in the table of classification A.
For each row of the candidate correspondence table, the value of "NoMatchFromB" is equal to 1 if the row contains a code of B that appears in the table of classification B but not in correspondence table B:C_k. The value of the flag is 0 if the row contains a code of B that appears in both the table of classification B and correspondence table B:C_k. Finally, the value of the flag is empty if the row contains no code of B or if it contains a code of B that appears in correspondence table B:C_k but not in the table of classification B.
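The definitions of "Unmatched", "Redundancy" and "Review" can be traced on a toy table. The sketch below is not the package's implementation: the codes are made up, empty strings stand for missing codes, NA stands for an empty flag, and setting "Redundancy" to 0 for non-repeated rows is an assumption of this example.

```r
# Sketch: three of the flags on made-up data, for Reference = "A".
tab <- data.frame(A = c("a1", "a2", "a1", "a3", ""),
                  B = c("b1", "b1", "b1", "",   "b2"),
                  stringsAsFactors = FALSE)
has_both <- tab$A != "" & tab$B != ""

# Unmatched: 1 if only one of the two codes is present, 0 if both are
# (rows with neither code do not occur in this toy table).
tab$Unmatched <- as.integer(!has_both)

# Redundancy: 1 if the same (A, B) combination appears in another row.
pairs <- tab[, c("A", "B")]
tab$Redundancy <- as.integer(duplicated(pairs) |
                             duplicated(pairs, fromLast = TRUE))

# Review with Reference = "A": 1 if the code of B maps to more than one
# distinct code of A; left as NA when the row lacks a code of A or B.
complete  <- unique(tab[has_both, c("A", "B")])
n_A_per_B <- table(complete$B)
tab$Review <- NA_integer_
tab$Review[has_both] <- as.integer(n_A_per_B[tab$B[has_both]] > 1)
```

In this table "b1" maps to both "a1" and "a2", so the three complete rows are flagged for review, and the pair ("a1", "b1") is redundant because it appears twice.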
Sample datasets included in the package
Running browseVignettes("correspondenceTables") in the console opens an html page in the user's default browser. Selecting HTML from the menu, users can read information about the use of the sample datasets that are included in the package.
If they wish to access the csv files with the sample data, users have two options:
Option 1: Unpack the tar.gz file in which the package was distributed into any folder of their choice. All sample datasets may be found in the "inst/extdata" subfolder of this folder.
Option 2: Go to the "extdata" subfolder of the folder in which the package has been installed in their PC's R library. All sample datasets may be found there.
Examples
{
  ## Application of function newCorrespondenceTable() with "example.csv" being the file
  ## that includes the names of the files of the classifications and of the intermediate
  ## correspondence tables in a sparse square matrix. The sample files contain the first
  ## 100 rows of the classifications (from ISIC v4 to CPA v2.1 through CPC v2.1).
  ## The desired name for the csv file that will contain the candidate correspondence
  ## table is "newCorrespondenceTable.csv", the reference classification is ISIC v4 ("A")
  ## and the maximum acceptable proportion of unmatched codes between ISIC v4 and
  ## CPC v2.1 is 0.56 (this is the minimum mismatch tolerance for the first 100 rows,
  ## as 55.5% of the codes of ISIC v4 are unmatched).

  tmp_dir <- tempdir()

  ## Read the table of filenames shipped with the package.
  A <- read.csv(system.file("extdata", "example.csv", package = "correspondenceTables"),
                header = FALSE,
                sep = ",")

  ## Replace every non-empty cell by the full path of the corresponding sample file.
  for (i in 1:nrow(A)) {
    for (j in 1:ncol(A)) {
      if (A[i, j] != "") {
        A[i, j] <- system.file("extdata", A[i, j], package = "correspondenceTables")
      }
    }
  }

  ## Write the table of full paths to the temporary directory.
  write.table(x = A,
              file = file.path(tmp_dir, "example.csv"),
              row.names = FALSE,
              col.names = FALSE,
              sep = ",")

  NCT <- newCorrespondenceTable(file.path(tmp_dir, "example.csv"),
                                file.path(tmp_dir, "newCorrespondenceTable.csv"),
                                "A",
                                0.56)

  summary(NCT)
  head(NCT$newCorrespondenceTable)
  NCT$classificationNames

  ## Clean up: full.names = TRUE is needed so that unlink() receives paths inside
  ## tmp_dir rather than bare filenames relative to the working directory.
  csv_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
  unlink(csv_files)
}