| fxml_toDataFrame {flatxml} | R Documentation |
Converting between XML and dataframes
Description
Converts an XML document to a dataframe.
Usage
fxml_toDataFrame(
xmlflat.df,
siblings.of,
same.tag = TRUE,
attr.only = NULL,
attr.not = NULL,
elem.or.attr = "elem",
col.attr = "",
include.fields = NULL,
exclude.fields = NULL
)
Arguments
xmlflat.df |
A flat XML dataframe created with |
siblings.of |
ID of one of the XML elements that contain the data records. All data records need to be on the same hierarchical level as the XML element with this ID. |
same.tag |
If |
attr.only |
A list of named vectors representing attribute/value combinations the data records must match.
The name of an element in the list is the XML element name to which the attribute belongs. The list element itself is a named vector.
The vector's elements represent different attributes (= the names of the vector elements) and their values (= vector elements).
Example: |
attr.not |
A list of vectors representing attribute/value combinations the XML elements must not match to be considered as data records. See argument |
elem.or.attr |
Either |
col.attr |
If |
include.fields |
A character vector with the names of the fields that are to be included in the result dataframe. By default, all fields from the XML document are included. |
exclude.fields |
A character vector with the names of the fields that should be excluded from the result dataframe. By default, no fields from the XML document are excluded. |
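The structure expected by attr.only (a list of named vectors, keyed by XML element name) may be easier to see in code. The following is a minimal base R sketch; the element name "record" and the attributes "type" and "lang" are hypothetical and chosen only for illustration:

```r
# Hypothetical filter: consider only <record> elements whose "type" attribute
# is "person" and whose "lang" attribute is "en".
# List element name  = XML element the attributes belong to.
# Vector names       = attribute names.
# Vector values      = attribute values the records must match.
attr.filter <- list(
  record = c(type = "person", lang = "en")
)

names(attr.filter)           # XML element name(s) the filter applies to
names(attr.filter$record)    # attribute names
unname(attr.filter$record)   # required attribute values
```

A list built this way would then be passed as attr.only = attr.filter.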
Details
The data to be read can be laid out in one of two ways. In the first, the field names are the names of the child elements:
<record>
<field1>Value of field1</field1>
<field2>Value of field2</field2>
<field3>Value of field3</field3>
</record>
...
In this case elem.or.attr would need to be "elem" because the field names of the data records (field1, field2, field3) are the names of the elements.
Or, the XML data could also look like this:
<record>
<column name="field1">Value of field1</column>
<column name="field2">Value of field2</column>
<column name="field3">Value of field3</column>
</record>
...
Here, the names of the fields are attributes, so elem.or.attr would need to be "attr" and col.attr would be set to "name", so that fxml_toDataFrame() knows where to look for the field/column names.
In either case, siblings.of would be the ID (xmlflat.df$elemid.) of one of the <record> elements.
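The two layouts described above can be tried side by side with a small sketch. It writes two tiny, made-up XML documents to temporary files and reads each one with the matching elem.or.attr setting. The root element <data>, the helper read.records() and the field values are assumptions made for this illustration, as is looking up the first <record> through an elem. column in the flat dataframe; the block is skipped if flatxml is not installed.

```r
if (requireNamespace("flatxml", quietly = TRUE)) {
  library(flatxml)

  # Layout 1: field names are element names (hypothetical sample data)
  xml.elem <- paste0(
    "<data>",
    "<record><field1>a1</field1><field2>a2</field2></record>",
    "<record><field1>b1</field1><field2>b2</field2></record>",
    "</data>"
  )

  # Layout 2: field names are stored in the "name" attribute
  xml.attr <- paste0(
    "<data>",
    "<record><column name=\"field1\">a1</column><column name=\"field2\">a2</column></record>",
    "<record><column name=\"field1\">b1</column><column name=\"field2\">b2</column></record>",
    "</data>"
  )

  # Hypothetical helper: write the XML to a temporary file, flatten it,
  # and convert the <record> siblings to a dataframe.
  read.records <- function(xml.text, ...) {
    path <- tempfile(fileext = ".xml")
    writeLines(xml.text, path)
    flat <- fxml_importXMLFlat(path)
    # Assumption: the flat dataframe has an elem. column with element names;
    # take the ID of the first <record> element as siblings.of.
    first.record <- flat$elemid.[which(flat$elem. == "record")[1]]
    fxml_toDataFrame(flat, siblings.of = first.record, ...)
  }

  df.elem <- read.records(xml.elem, elem.or.attr = "elem")
  df.attr <- read.records(xml.attr, elem.or.attr = "attr", col.attr = "name")

  print(df.elem)
  print(df.attr)
}
```

Both calls are expected to yield one row per <record>; the only difference is where the column names are taken from.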
Value
A dataframe with the data read in from the XML document.
Author(s)
Joachim Zuckarelli joachim@zuckarelli.de
See Also
fxml_importXMLFlat, fxml_toXML
Examples
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)
# Extract the data out of the XML document. The data records are on the same hierarchical level
# as element with ID 3 (xml.dataframe$elemid. == 3).
# The field names are given in the "name" attribute of the children elements of element no. 3
# and its siblings
population.df <- fxml_toDataFrame(xml.dataframe, siblings.of=3, elem.or.attr="attr",
col.attr="name")
# Exclude the "Value Footnote" field from the returned dataframe
population.df <- fxml_toDataFrame(xml.dataframe, siblings.of=3, elem.or.attr="attr",
col.attr="name", exclude.fields=c("Value Footnote"))
# Load example file with soccer world cup data (data from
# https://www.fifa.com/fifa-tournaments/statistics-and-records/worldcup/index.html)
# and create flat dataframe
example2 <- system.file("soccer.xml", package="flatxml")
xml.dataframe2 <- fxml_importXMLFlat(example2)
# Extract the data out of the XML document. The data records are on the same hierarchical level
# as element with ID 3 (xml.dataframe2$elemid. == 3).
# The field names are given as the names of the children elements of element no. 3
# and its siblings.
worldcups.df <- fxml_toDataFrame(xml.dataframe2, siblings.of=3, elem.or.attr="elem")