fxml_toDataFrame {flatxml} | R Documentation |
Converting between XML and dataframes
Description
Converts an XML document to a dataframe.
Usage
fxml_toDataFrame(
xmlflat.df,
siblings.of,
same.tag = TRUE,
attr.only = NULL,
attr.not = NULL,
elem.or.attr = "elem",
col.attr = "",
include.fields = NULL,
exclude.fields = NULL
)
Arguments
xmlflat.df |
A flat XML dataframe created with |
siblings.of |
ID of one of the XML elements that contain the data records. All data records need to be on the same hierarchical level as the XML element with this ID. |
same.tag |
If |
attr.only |
A list of named vectors representing attribute/value combinations the data records must match.
The name of an element in the list is the XML element name to which the attribute belongs. The list element itself is a named vector.
The vector's elements represent different attributes (= the names of the vector elements) and their values (= vector elements).
Example: |
attr.not |
A list of vectors representing attribute/value combinations the XML elements must not match to be considered as data records. See argument |
elem.or.attr |
Either |
col.attr |
If |
include.fields |
A character vector with the names of the fields that are to be included in the result dataframe. By default, all fields from the XML document are included. |
exclude.fields |
A character vector with the names of the fields that should be excluded from the result dataframe. By default, no fields from the XML document are excluded. |
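The nested structure described for attr.only can be sketched in base R. This is only an illustration of the list shape: the element name "record" and the attributes "type" and "status" are hypothetical and do not come from any file shipped with the package.

```r
# Hypothetical filter: keep only <record> elements whose "type" attribute
# is "main" and whose "status" attribute is "final".
attr.filter <- list(
  record = c(type = "main", status = "final")
)

# Name of the list element = XML element the attributes belong to
names(attr.filter)

# Names of the vector elements = attribute names
names(attr.filter$record)

# Vector elements themselves = required attribute values
unname(attr.filter$record)
```

A list built this way would then be passed as attr.only = attr.filter in the call to fxml_toDataFrame().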
Details
The data to be read in can be represented in one of two ways. The first looks like this:
<record>
<field1>Value of field1</field1>
<field2>Value of field2</field2>
<field3>Value of field3</field3>
</record>
...
In this case, elem.or.attr would need to be "elem" because the field names of the data records (field1, field2, field3) are the names of the elements.
Or, the XML data could also look like this:
<record>
<column name="field1">Value of field1</column>
<column name="field2">Value of field2</column>
<column name="field3">Value of field3</column>
</record>
...
Here, the names of the fields are attributes, so elem.or.attr would need to be "attr" and col.attr would be set to "name", so that fxml_toDataFrame() knows where to look for the field/column names.
In any case, siblings.of would be the ID (xmlflat.df$elemid.) of one of the <record> elements.
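The two layouts above can be tried out on small inline documents. The following is a minimal sketch, assuming that fxml_importXMLFlat() accepts a local file path and that the flat dataframe has an elem. column holding the element names next to elemid.; the sample records, the <data> root element and the helper import_inline() are hypothetical.

```r
library(flatxml)

# Hypothetical helper: write an XML string to a temporary file and import it
import_inline <- function(xml.text) {
  path <- tempfile(fileext = ".xml")
  writeLines(xml.text, path)
  fxml_importXMLFlat(path)
}

# Layout 1: the field names are the element names
flat.elem <- import_inline('<?xml version="1.0" encoding="UTF-8"?>
<data>
  <record><field1>a</field1><field2>1</field2></record>
  <record><field1>b</field1><field2>2</field2></record>
</data>')

# Layout 2: the field names are given in the "name" attribute
flat.attr <- import_inline('<?xml version="1.0" encoding="UTF-8"?>
<data>
  <record><column name="field1">a</column><column name="field2">1</column></record>
  <record><column name="field1">b</column><column name="field2">2</column></record>
</data>')

# Look up the ID of the first <record> element instead of hard-coding it
id.elem <- flat.elem$elemid.[flat.elem$elem. == "record"][1]
id.attr <- flat.attr$elemid.[flat.attr$elem. == "record"][1]

elem.df <- fxml_toDataFrame(flat.elem, siblings.of = id.elem, elem.or.attr = "elem")
attr.df <- fxml_toDataFrame(flat.attr, siblings.of = id.attr, elem.or.attr = "attr",
                            col.attr = "name")
```

Looking the ID up by element name avoids having to count elements by hand; inspecting the flat dataframe directly, as in the Examples section, works just as well.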
Value
A dataframe with the data read in from the XML document.
Author(s)
Joachim Zuckarelli joachim@zuckarelli.de
See Also
fxml_importXMLFlat, fxml_toXML
Examples
# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)
# Extract the data out of the XML document. The data records are on the same hierarchical level
# as element with ID 3 (xml.dataframe$elemid. == 3).
# The field names are given in the "name" attribute of the children elements of element no. 3
# and its siblings
population.df <- fxml_toDataFrame(xml.dataframe, siblings.of=3, elem.or.attr="attr",
col.attr="name")
# Exclude the "Value Footnote" field from the returned dataframe
population.df <- fxml_toDataFrame(xml.dataframe, siblings.of=3, elem.or.attr="attr",
col.attr="name", exclude.fields=c("Value Footnote"))
# Load example file with soccer world cup data (data from
# https://www.fifa.com/fifa-tournaments/statistics-and-records/worldcup/index.html)
# and create flat dataframe
example2 <- system.file("soccer.xml", package="flatxml")
xml.dataframe2 <- fxml_importXMLFlat(example2)
# Extract the data out of the XML document. The data records are on the same hierarchical level
# as element with ID 3 (xml.dataframe2$elemid. == 3). The field names are given as the names
# of the children elements of element no. 3 and its siblings.
worldcups.df <- fxml_toDataFrame(xml.dataframe2, siblings.of=3, elem.or.attr="elem")