zoologThesaurus {zoolog} | R Documentation
Thesaurus Set for zoolog
Description
The thesaurus set defined for the package zoolog.
This is used to make the methods robust to different nomenclatures used
in datasets created by different authors. The user can also use other
thesaurus sets, or can modify the provided thesaurus set (see
ThesaurusManagement and ThesaurusReaderWriter).
Usage
zoologThesaurus
Format
A thesaurus set is a list of thesauri with additional attributes:
- names
Character vector with the name of each thesaurus.
- applyToColNames
Logical vector indicating whether each thesaurus should be applied to the column names of the data frame.
- applyToColValues
Logical vector indicating whether each thesaurus should be applied to the values in the corresponding column of the data frame.
- filename
Character vector with the source file of each thesaurus.
The Examples section lists the four thesauri included in the provided
zoologThesaurus.
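The structure described above can be mimicked with base R alone. The following is a minimal sketch, not the package's own construction code: the object toySet, its two tiny thesauri, the flag values, and the file names are all made up for illustration.

```r
# Toy thesaurus set: a list of data frames (hypothetical content).
toySet <- list(
  identifier = data.frame(Taxon   = c("taxon", "species"),
                          Element = c("element", "bone"),
                          stringsAsFactors = FALSE),
  taxon      = data.frame(Sheep = c("sheep", "ovis"),
                          Goat  = c("goat", "capra"),
                          stringsAsFactors = FALSE)
)

# Additional attributes, one entry per thesaurus (values are assumptions).
attr(toySet, "applyToColNames")  <- c(TRUE, FALSE)
attr(toySet, "applyToColValues") <- c(FALSE, TRUE)
attr(toySet, "filename")         <- c("toyIdentifier.csv", "toyTaxon.csv")

# Thesauri that would be applied to the column names of a data frame:
names(toySet)[attr(toySet, "applyToColNames")]
```

Because the flags are plain logical vectors aligned with names, they can be used directly to index the list.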
Each thesaurus is a data frame also with additional attributes. Each column
of the data frame is a category of names with equivalent meaning in the
intended application. The column name identifies the category and is used
as the standard when applying StandardizeNomenclature.
The names in each column (category) must not be included in any other
column, since this would make the thesaurus ambiguous (see
ThesaurusAmbiguity).
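The uniqueness requirement can be checked by hand with base R. The sketch below uses a made-up data frame (toyThesaurus) and compares names exactly, ignoring the case, accent, and punctuation settings. It is only an illustration of the rule and is not the check documented under ThesaurusAmbiguity.

```r
# Toy thesaurus: each column is a category, each cell a name (hypothetical).
toyThesaurus <- data.frame(
  humerus = c("humerus", "hum", "HU"),
  femur   = c("femur", "fem", "hum"),   # "hum" is deliberately repeated
  stringsAsFactors = FALSE
)

# Flatten column by column and record the category of every name.
allNames <- unlist(toyThesaurus, use.names = FALSE)
category <- rep(names(toyThesaurus), each = nrow(toyThesaurus))
keep     <- !is.na(allNames) & allNames != ""

# A name is ambiguous if it occurs in more than one category.
categoriesPerName <- tapply(category[keep], allNames[keep],
                            function(x) length(unique(x)))
ambiguousNames <- names(categoriesPerName)[categoriesPerName > 1]
ambiguousNames
```

Here only "hum" is reported, since it appears under both humerus and femur.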
Each thesaurus has the following attributes:
- names
The standard name for the categories.
- class
"data.frame"
- row.names
Irrelevant
- caseSensitive
Logical indicating whether the names in the thesaurus should be considered case-sensitive.
- accentSensitive
Logical indicating whether the names in the thesaurus should be differentiated by the presence of accent marks.
- punctuationSensitive
Logical indicating whether the names in the thesaurus should be differentiated by the presence of punctuation marks.
The Examples section shows the content and characteristics of the first
thesaurus in zoologThesaurus.
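To illustrate how the column names act as the standard and how a flag such as caseSensitive could influence matching, here is a minimal sketch. The function standardizeToy and the data frame toyThesaurus are hypothetical; this is not how StandardizeNomenclature is implemented, and accent and punctuation handling are left out.

```r
# Toy thesaurus with a case-insensitive setting (hypothetical content).
toyThesaurus <- data.frame(
  Sheep = c("sheep", "ovis aries", "ova"),
  Goat  = c("goat", "capra hircus", "cah"),
  stringsAsFactors = FALSE
)
attr(toyThesaurus, "caseSensitive") <- FALSE

# Replace every name found in a category by the category (column) name.
standardizeToy <- function(x, thesaurus) {
  # Fold case only when the thesaurus is not case-sensitive.
  fold <- if (isTRUE(attr(thesaurus, "caseSensitive"))) identity else tolower
  result <- x
  for (category in names(thesaurus)) {
    synonyms <- fold(c(category, thesaurus[[category]]))
    result[fold(x) %in% synonyms] <- category
  }
  result
}

standardizeToy(c("SHEEP", "cah", "cattle"), toyThesaurus)
# [1] "Sheep"  "Goat"   "cattle"
```

Names that do not appear in any category, such as "cattle" here, are returned unchanged in this sketch.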
File Structure
zoologThesaurus is an exported variable automatically loaded in
memory. In addition, the source files generating it are included in the
zoolog extdata folder. There is one file for the thesaurus set
main structure and one file for each included thesaurus. All of them are in
semicolon separated format. Thus, they can be examined in any text editor
or imported into any spreadsheet application. The files are:
zoologThesaurusSet.csv
Defines the main structure of the thesaurus set. It has a row for each thesaurus and seven columns (ThesaurusName, FileName, CaseSensitive, AccentSensitive, PunctuationSensitive, ApplyToColNames, and ApplyToColValues). Their meaning coincides with the description above. Note that the case, accent, and punctuation sensitivity settings are stored here rather than in each thesaurus file.
identifierThesaurus.csv
Thesaurus for the identifiers used in LogRatios to identify the bone types and the measure names in the data and the references. It has four columns: Taxon, Element, Measure, and Standard.
taxonThesaurus.csv
Thesaurus for the taxa. There is one column for each category of taxon considered.
elementThesaurus.csv
Thesaurus for the skeletal elements. One column for each category.
measureThesaurus.csv
Thesaurus for the measure names. One column for each category.
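The source files can also be inspected from within R. The sketch below assumes that zoolog is installed and that the file keeps the name listed above. It reads the raw table with base R only; the functions documented under ThesaurusReaderWriter remain the intended way to load a thesaurus set for use with the package.

```r
# Locate the thesaurus set definition inside the installed zoolog package.
# system.file() returns "" when the package or the file is not found.
setPath <- system.file("extdata", "zoologThesaurusSet.csv", package = "zoolog")

if (nzchar(setPath)) {
  # The files are semicolon separated, so the separator is set explicitly.
  thesaurusSetTable <- utils::read.csv(setPath, sep = ";",
                                       stringsAsFactors = FALSE)
  print(thesaurusSetTable)
} else {
  message("zoologThesaurusSet.csv not found; is zoolog installed?")
}
```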
Examples
## List of thesaurus names and characteristics in the thesaurus set:
attributes(zoologThesaurus)
## Content of the first thesaurus:
zoologThesaurus$identifier
attributes(zoologThesaurus$identifier)