| parse_text_filing {edgarWebR} | R Documentation | 
Parse Text Filing
Description
Given a link to a filing document (e.g. a 10-K or 8-K) in TXT format, split the file into parts and items. This enables follow-up processing of a single section, e.g. just the Risk Factors. 'item.name' and 'part.name' are taken directly from the document, with no attempt to normalize them.
Usage
parse_text_filing(x, strip = TRUE, include.raw = FALSE, fix.errors = TRUE)
Arguments
x | 
 - URL of a filing text document, or the text of the filing itself  | 
strip | 
 - Should non-text elements be removed? Default: TRUE  | 
include.raw | 
 - Include unprocessed nodes in result? Default: FALSE  | 
fix.errors | 
 - Try to fix document errors (e.g. missing part labels). Work in progress. Default: TRUE  | 
Details
NOTE: This has been tested on a range of documents, but formatting differences could cause failures. Please report an issue for any document that isn't parsed correctly.
FURTHER NOTE: Not all filings are well formed. Missing headings, inconsistent spacing, and similar defects can all throw the parsing off.
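Because 'item.name' is returned verbatim, the same item can appear under differently spaced or capitalized headings across filings. The following is a minimal sketch of one way to derive a comparable key from a heading. The helper item_key() is hypothetical and not part of edgarWebR, and the sample headings are invented for illustration.

```r
# Hypothetical helper (not part of edgarWebR): reduce a heading such as
# "ITEM  1A.  RISK FACTORS" to a compact key such as "1A".
item_key <- function(item.name) {
  # Capture the item number and optional letter that follow the word "item".
  m <- regmatches(
    item.name,
    regexec("item\\s*([0-9]+[a-z]?)", item.name, ignore.case = TRUE)
  )
  # Headings that do not match the pattern yield NA rather than an error.
  vapply(
    m,
    function(x) if (length(x) >= 2) toupper(x[2]) else NA_character_,
    character(1),
    USE.NAMES = FALSE
  )
}

# Invented sample headings, including one that is not an item heading.
keys <- item_key(c("ITEM  1A.  RISK FACTORS", "Item 7. MD&A", "Signatures"))
```

A key like this only smooths over spacing and capitalization. It cannot recover a heading that is missing from the filing altogether.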
Value
A data frame with one row per paragraph and the following columns:
- part.name
 Detected name of the Part
- item.name
 Detected name of the Item
- text
 Text of the paragraph / node
- raw*
 Raw HTML of the node, included only if include.raw = TRUE
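As a rough illustration of the follow-up processing this structure allows, the sketch below selects a single section from a result. To stay runnable without network access, it uses a small hand-built data frame with the documented columns in place of a real parse_text_filing() result. All rows are invented.

```r
# Hypothetical stand-in for a parse_text_filing() result; the rows are
# invented for illustration and do not come from a real filing.
filing <- data.frame(
  part.name = c("PART I", "PART I", "PART I", "PART II"),
  item.name = c("Item 1. Business", "Item 1A. Risk Factors",
                "Item 1A. Risk Factors", "Item 7. Management's Discussion"),
  text = c("We make widgets.", "Demand may fall.",
           "Costs may rise.", "Revenue grew."),
  stringsAsFactors = FALSE
)

# Keep only paragraphs whose item heading mentions "risk factors".
# Matching is case-insensitive because headings are not normalized.
risk <- filing[grepl("risk factors", filing$item.name, ignore.case = TRUE), ]

# Collapse the section into a single string for downstream text processing.
risk_text <- paste(risk$text, collapse = "\n")
```

With a real filing, the pattern passed to grepl() may need adjusting to match how that document words its headings.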
Examples
try(head(parse_text_filing(
  "https://www.sec.gov/Archives/edgar/data/37996/000003799602000015/v7.txt"
)))