R: Remove html elements

fhir_rm_div {fhircrackr}

R Documentation

Remove html elements

Description

This function is a convenience wrapper for fhir_rm_tag() that removes all <div></div> elements, together with everything they enclose, from XML. In FHIR resources, div elements contain HTML code, which is often server-generated and in most cases neither relevant nor usable for data analysis.
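
The following base-R sketch makes the behaviour on nested elements concrete for plain strings. It is not fhircrackr's implementation: remove_div_elements() is a hypothetical helper that uses regular expressions and assumes every div is written with separate opening and closing tags. It deletes the innermost div elements first and repeats until a pass changes nothing, so nested divs disappear along with their content.

```r
# Hypothetical helper, NOT part of fhircrackr: a regex-based illustration of
# removing <div> elements (and everything inside them) from XML strings.
# Assumes divs are never self-closing (<div/>).
remove_div_elements <- function(x) {
  # an innermost div: opening tag with optional attributes, content that
  # contains no further opening div tag, then the closing tag
  innermost_div <- "<div\\b[^>]*>(?:(?!<div\\b)[\\s\\S])*?</div>"
  repeat {
    cleaned <- gsub(innermost_div, "", x, perl = TRUE)
    # nested divs need several passes; stop once a pass removes nothing
    if (identical(cleaned, x)) break
    x <- cleaned
  }
  x
}

strings <- c("Hallo<div>Please<p>Remove Me</p></div> World!",
             "A<div><div><p>B</p></div>C</div>D")
remove_div_elements(strings)
# [1] "Hallo World!" "AD"
```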

Usage

fhir_rm_div(x)

Arguments

`x`	A fhir_bundle_xml or fhir_bundle_list object, or a character vector containing XML strings.
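
For a fhir_bundle_xml, the input is a parsed XML document rather than a string. One way to drop div elements from such a document is sketched below with the xml2 package; the Patient resource is made up for illustration, and fhircrackr may do this differently internally. Because FHIR places the narrative div in the XHTML namespace, the XPath query matches on local-name() instead of a plain //div.

```r
library(xml2)

# a small, made-up FHIR Patient resource whose narrative sits in a div
doc <- read_xml(paste0(
  '<Patient xmlns="http://hl7.org/fhir">',
  '<id value="1"/>',
  '<text><status value="generated"/>',
  '<div xmlns="http://www.w3.org/1999/xhtml"><p>John Doe</p></div>',
  '</text>',
  '<gender value="male"/>',
  '</Patient>'
))

# the div is in the XHTML namespace, so select it by local name
divs <- xml_find_all(doc, "//*[local-name() = 'div']")

# unlink the div nodes (and their children) from the document in place
xml_remove(divs)

cat(as.character(doc))
```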

Value

An object of the same class as x in which all div elements, including their contents, have been removed.

Examples


library(fhircrackr)

#Example 1: Remove div tags from XML strings in a character vector
string <- c("Hallo<div>Please<p>Remove Me</p></div> World!",
            "A<div><div><p>B</p></div>C</div>D")

fhir_rm_div(x = string)

#Example 2: Remove div tags in a single FHIR bundle
bundle <- fhir_unserialize(patient_bundles)[[1]]

#the example bundle contains HTML parts in div tags:
cat(toString(bundle))

#remove HTML parts
bundle_cleaned <- fhir_rm_div(x = bundle)

#have a look at the result
cat(toString(bundle_cleaned))

#Example 3: Remove div tags in a list of FHIR bundles
bundle_list <- fhir_unserialize(patient_bundles)

#remove HTML parts
bundle_list_cleaned <- fhir_rm_div(x = bundle_list)

#check how much removing the HTML reduces the total size of the bundle list
size_with_html <- sum(sapply(bundle_list, function(x) object.size(toString(x))))
size_without_html <- sum(sapply(bundle_list_cleaned, function(x) object.size(toString(x))))

#ratio of cleaned size to original size (values below 1 mean a reduction)
size_without_html / size_with_html