html_getTables {MazamaCoreUtils} | R Documentation |
Find all tables in an html page
Description
Parses an html page to extract all <table> elements and returns them as a
list of dataframes, one per table. The columns and rows of each dataframe
match those of the table it represents. A single table can be extracted as a
dataframe by passing the index of the table, in addition to the url, to
html_getTable().
Usage
html_getTables(url = NULL, header = NA)
html_getTable(url = NULL, header = NA, index = 1)
Arguments
url | URL or file path of an html page. |
header | Use first row as header? If NA, will use first row if it consists of <th> tags. |
index | Index identifying which table to return. |
Value
html_getTables() returns a list of dataframes, one for each table on an html page. html_getTable() returns the single dataframe identified by index.
Examples
library(MazamaCoreUtils)
# Fail gracefully if the resource is not available
try({
  # Wikipedia's list of timezones
  url <- "http://en.wikipedia.org/wiki/List_of_tz_database_time_zones"

  # Extract tables
  tables <- html_getTables(url)

  # Extract the first table
  # NOTE: Analogous to firstTable <- html_getTable(url, index = 1)
  firstTable <- tables[[1]]

  head(firstTable)
  nrow(firstTable)
}, silent = FALSE)
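
The url argument may also be a file path, so the header and index arguments can be tried without network access. The following is a minimal sketch, assuming the package reads a local file the same way it reads a URL. The two-table page, the temporary file, and the names localTables, cityTable and plainTable are made up for illustration.

```r
library(MazamaCoreUtils)

# Hypothetical page with two tables: the first has a <th> header row,
# the second has only <td> cells
html <- c(
  "<html><body>",
  "<table>",
  "<tr><th>city</th><th>zone</th></tr>",
  "<tr><td>Seattle</td><td>America/Los_Angeles</td></tr>",
  "<tr><td>Denver</td><td>America/Denver</td></tr>",
  "</table>",
  "<table>",
  "<tr><td>a</td><td>1</td></tr>",
  "<tr><td>b</td><td>2</td></tr>",
  "</table>",
  "</body></html>"
)

# Write the page to a temporary file (assumption: a file path is accepted
# wherever a URL is, as the url argument description states)
path <- tempfile(fileext = ".html")
writeLines(html, path)

# All tables, as a list of dataframes
localTables <- html_getTables(path)
length(localTables)

# First table only; with header = NA the <th> row should become column names
cityTable <- html_getTable(path, index = 1)
names(cityTable)

# Second table, explicitly keeping the first row as data
plainTable <- html_getTable(path, header = FALSE, index = 2)
nrow(plainTable)
```

Setting header explicitly is mainly useful for tables like the second one, where the first row contains no <th> tags and the intent would otherwise be inferred.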
[Package MazamaCoreUtils version 0.5.2 Index]