R: Converting between XML and dataframes

fxml_toDataFrame {flatxml}

R Documentation

Converting between XML and dataframes

Description

Converts an XML document to a dataframe.

Usage

fxml_toDataFrame(
  xmlflat.df,
  siblings.of,
  same.tag = TRUE,
  attr.only = NULL,
  attr.not = NULL,
  elem.or.attr = "elem",
  col.attr = "",
  include.fields = NULL,
  exclude.fields = NULL
)

Arguments

`xmlflat.df`	A flat XML dataframe created with `fxml_importXMLFlat`.
`siblings.of`	ID of one of the XML elements that contain the data records. All data records need to be on the same hierarchical level as the XML element with this ID.
`same.tag`	If `TRUE`, only elements of the same type (`xmlflat.df$elem.`) as the element `siblings.of` are considered data records. If `FALSE`, all elements on the same hierarchical level as the element `siblings.of` are considered data records.
`attr.only`	A list of named vectors representing attribute/value combinations the data records must match. The name of an element in the list is the XML element name to which the attribute belongs. The list element itself is a named vector. The vector's elements represent different attributes (= the names of the vector elements) and their values (= vector elements). Example: `attr.only = list(tag1 = c(attrib1 = "Value 1", attrib2 = "Value 2"), tag2 = c(attrib3 = "Value 3"))` will only include `tag1` elements of the form `<tag1 attrib1 = "Value 1" attrib2 = "Value 2">` and `tag2` elements of the form `<tag2 attrib3 = "Value 3">` as data records.
`attr.not`	A list of vectors representing attribute/value combinations the XML elements must not match to be considered as data records. See argument `attr.only` for details.
`elem.or.attr`	Either `"elem"` or `"attr"`. Determines whether the names of the record fields (columns in the dataframe) are taken from the names (tags) of the respective XML elements, i.e. the children of the elements on the same level as `siblings.of` (`"elem"`), or from an attribute of those elements (`"attr"`).
`col.attr`	If `elem.or.attr` is `"attr"` then `col.attr` specifies the name of the attribute that gives the record field / column names.
`include.fields`	A character vector with the names of the fields that are to be included in the result dataframe. By default, all fields from the XML document are included.
`exclude.fields`	A character vector with the names of the fields that should be excluded from the result dataframe. By default, no fields from the XML document are excluded.
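
The filtering arguments can be combined in a single call. The following is a minimal sketch, not taken from the package: the XML content, the `type` attribute and the field names `name` and `score` are made up for illustration, and the record ID is looked up from the flat dataframe rather than hard-coded. It assumes that `attr.only` behaves as described above.

```r
library(flatxml)

# Hypothetical XML document, written to a temporary file for illustration
xml.text <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<data>',
  '  <record type="final"><name>Alpha</name><score>1</score></record>',
  '  <record type="draft"><name>Beta</name><score>2</score></record>',
  '  <record type="final"><name>Gamma</name><score>3</score></record>',
  '</data>'
)
xml.file <- tempfile(fileext = ".xml")
writeLines(xml.text, xml.file)

# Create the flat dataframe
flat.df <- fxml_importXMLFlat(xml.file)

# Look up the ID of the first <record> element instead of hard-coding it
first.record <- flat.df$elemid.[which(flat.df$elem. == "record")[1]]

# Request only <record type="final"> elements as data records
# and restrict the result to the "name" field
final.df <- fxml_toDataFrame(
  flat.df,
  siblings.of = first.record,
  attr.only = list(record = c(type = "final")),
  include.fields = c("name")
)
```

Looking up the ID via `flat.df$elem.` avoids relying on a particular numbering of the elements in the flat dataframe.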

Details

The function can read data records that are laid out in one of two ways. In the first layout, each field is an element of its own:

<record>
<field1>Value of field1</field1>
<field2>Value of field2</field2>
<field3>Value of field3</field3>
</record>
...

In this case `elem.or.attr` needs to be `"elem"` because the field names of the data records (`field1`, `field2`, `field3`) are the names of the elements.

In the second layout, the field names are given by an attribute:

<record>
<column name="field1">Value of field1</column>
<column name="field2">Value of field2</column>
<column name="field3">Value of field3</column>
</record>
...

Here, the names of the fields are attribute values, so `elem.or.attr` needs to be `"attr"` and `col.attr` needs to be set to `"name"`, so that `fxml_toDataFrame()` knows where to look for the field/column names.

In both cases, `siblings.of` is the ID (`xmlflat.df$elemid.`) of one of the `<record>` elements.
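
The two layouts can be tried out side by side. The sketch below is illustrative only: both XML documents are made up, the helper `import.flat()` is a hypothetical convenience function that is not part of flatxml, and the ID of the first `<record>` element is looked up from the flat dataframe rather than assumed.

```r
library(flatxml)

# Hypothetical helper (not part of flatxml): write XML text to a
# temporary file and import it as a flat dataframe
import.flat <- function(xml.text) {
  xml.file <- tempfile(fileext = ".xml")
  writeLines(xml.text, xml.file)
  fxml_importXMLFlat(xml.file)
}

# Layout 1: field names are element names
flat.elem <- import.flat(c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<data>',
  '  <record><field1>a</field1><field2>1</field2></record>',
  '  <record><field1>b</field1><field2>2</field2></record>',
  '</data>'
))
id.elem <- flat.elem$elemid.[which(flat.elem$elem. == "record")[1]]
df.elem <- fxml_toDataFrame(flat.elem, siblings.of = id.elem,
                            elem.or.attr = "elem")

# Layout 2: field names are given by the "name" attribute
flat.attr <- import.flat(c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<data>',
  '  <record><column name="field1">a</column><column name="field2">1</column></record>',
  '  <record><column name="field1">b</column><column name="field2">2</column></record>',
  '</data>'
))
id.attr <- flat.attr$elemid.[which(flat.attr$elem. == "record")[1]]
df.attr <- fxml_toDataFrame(flat.attr, siblings.of = id.attr,
                            elem.or.attr = "attr", col.attr = "name")
```

Both calls are meant to produce one row per `<record>` element; only the source of the column names differs.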

Value

A dataframe with the data read in from the XML document.

Author(s)

Joachim Zuckarelli joachim@zuckarelli.de

Examples

# Load example file with population data from United Nations Statistics Division
# and create flat dataframe
example <- system.file("worldpopulation.xml", package="flatxml")
xml.dataframe <- fxml_importXMLFlat(example)

# Extract the data from the XML document. The data records are on the same hierarchical level
# as the element with ID 3 (xml.dataframe$elemid. == 3).
# The field names are given in the "name" attribute of the child elements of element no. 3
# and its siblings.
population.df <- fxml_toDataFrame(xml.dataframe, siblings.of=3, elem.or.attr="attr",
                                  col.attr="name")
# Exclude the "Value Footnote" field from the returned dataframe
population.df <- fxml_toDataFrame(xml.dataframe, siblings.of=3, elem.or.attr="attr",
                                  col.attr="name", exclude.fields=c("Value Footnote"))


# Load example file with soccer world cup data (data from
# https://www.fifa.com/fifa-tournaments/statistics-and-records/worldcup/index.html)
# and create flat dataframe
example2 <- system.file("soccer.xml", package="flatxml")
xml.dataframe2 <- fxml_importXMLFlat(example2)

# Extract the data from the XML document. The data records are on the same hierarchical level
# as the element with ID 3 (xml.dataframe2$elemid. == 3).
# The field names are given by the names of the child elements of element no. 3
# and its siblings.
worldcups.df <- fxml_toDataFrame(xml.dataframe2, siblings.of=3, elem.or.attr="elem")

[Package flatxml version 0.1.1 Index]