getSentiment {edgar}    R Documentation

Provides sentiment measures of EDGAR filings


getSentiment computes sentiment measures of EDGAR filings.


getSentiment(cik.no, form.type, filing.year, useragent)


cik.no: vector of firms' CIK numbers in integer format, with leading zeroes removed. Use cik.no = 'ALL' to download filings for all CIKs.


form.type: character vector of the form types to be downloaded. Use form.type = 'ALL' to download all form types.


filing.year: vector of four-digit numeric years.


useragent: user agent string declared to SEC EDGAR. Should be in the form of "Your Name" followed by a contact email address.
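CIKs are often copied from EDGAR with leading zeroes (ten-digit strings), while the argument description above asks for integers without them. A minimal sketch of how the input could be prepared is shown here; normalize_cik is a hypothetical helper written for illustration and is not part of the edgar package.

```r
# Hypothetical helper (not part of the edgar package): convert CIKs given as
# zero-padded strings or as numbers into integers without leading zeroes.
normalize_cik <- function(cik) {
  if (is.numeric(cik)) {
    return(as.integer(cik))
  }
  cik <- trimws(as.character(cik))
  if (!all(grepl("^[0-9]+$", cik))) {
    stop("CIKs must contain digits only")
  }
  # as.integer() drops the zero padding when parsing the string
  as.integer(cik)
}

ciks <- normalize_cik(c("0001000180", "0000038079"))
```

The resulting vector could then be passed as the CIK argument of getSentiment. The special value 'ALL' is deliberately not handled by this sketch.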


The getSentiment function takes CIK(s), form type(s), and year(s) as input parameters. The function first imports any filings already downloaded to the local working directory 'Edgar filings_full text', which is created by the getFilings function; it automatically downloads any requested filings that have not already been downloaded. It then reads and cleans these filings and computes sentiment measures for them. The function returns a dataframe with filing information and sentiment measures. In accordance with SEC EDGAR's guidelines, the user must also declare a user agent.


The function returns a dataframe containing the CIK number, company name, date of filing, accession number, and various sentiment measures. It uses the Loughran-McDonald (LM) sentiment dictionaries to compute the sentiment measures of an EDGAR filing. The text characteristics and sentiment measures are defined as follows:

file.size = The size of the complete filing on the EDGAR server, in kilobytes (KB).

word.count = The total number of words in a filing text, excluding HTML tags and exhibits text.

unique.word.count = The total number of unique words in a filing text, excluding HTML tags and exhibits text.

stopword.count = The total number of stop words in a filing text, excluding exhibits text.

char.count = The total number of characters in a filing text, excluding HTML tags and exhibits text.

complex.word.count = The total number of complex words in the filing text. A word is identified as complex when vowels (a, e, i, o, u) occur in it more than three times.

lm.dictionary.count = The number of words in the filing text that occur in the Loughran-McDonald (LM) master dictionary.

lm.negative.count = The number of LM financial-negative words in the document.

lm.positive.count = The number of LM financial-positive words in the document.

lm.strong.modal.count = The number of LM financial-strong modal words in the document.

lm.moderate.modal.count = The number of LM financial-moderate modal words in the document.

lm.weak.modal.count = The number of LM financial-weak modal words in the document.

lm.uncertainty.count = The number of LM financial-uncertainty words in the document.

lm.litigious.count = The number of LM financial-litigious words in the document.

hv.negative.count = The number of words in the document that occur in the 'Harvard General Inquirer' Negative word list, as defined by LM.
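The word-level definitions above can be illustrated with a small sketch. It applies the stated rule for complex words (more than three vowels) to a toy sentence. The tokenisation used here (lower-casing and splitting on non-letters) is an assumption made for illustration; the package's own cleaning steps may differ, and count_vowels and text_stats are hypothetical names.

```r
# Illustrative only: hypothetical helpers, not functions from the edgar package.

# Number of vowels (a, e, i, o, u) in each word.
count_vowels <- function(words) {
  vapply(
    gregexpr("[aeiou]", tolower(words)),
    function(m) sum(m > 0),   # gregexpr returns -1 when there is no match
    integer(1)
  )
}

# Word count, unique word count, and complex word count for one text.
text_stats <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  words <- words[nzchar(words)]
  list(
    word.count         = length(words),
    unique.word.count  = length(unique(words)),
    complex.word.count = sum(count_vowels(words) > 3)
  )
}

stats <- text_stats(
  "The company reported an extraordinary deterioration in the liquidity"
)
```

In this toy sentence, "extraordinary", "deterioration", and "liquidity" each contain more than three vowels, so they are the words flagged as complex.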


## Not run: 

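## Placeholder user agent; replace with your own name and contact email
useragent <- "Your Name Contact@domain.com"

senti.df <- getSentiment(cik.no = c('1000180', '38079'), 
                         form.type = '10-K', filing.year = 2006, useragent) 
## Returns dataframe with sentiment measures of firms with CIKs 
## 1000180 and 38079 filed in year 2006 for form type '10-K'.

senti.df <- getSentiment(cik.no = '38079', form.type = c('10-K', '10-Q'), 
                         filing.year = c(2005, 2006), useragent)
## Returns dataframe with sentiment measures of the firm with CIK 38079 
## for form types '10-K' and '10-Q' filed in years 2005 and 2006.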

## End(Not run)
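The returned counts are raw totals, so a common follow-up step is to scale them by document length. The sketch below does this on a mock dataframe that stands in for the function's output: the numeric values are invented, the cik column name is a placeholder, and the derived columns negative.share and net.tone are not produced by the package.

```r
# Mock data standing in for the output of getSentiment(); values are invented
# and only the three count columns follow the names documented above.
mock.df <- data.frame(
  cik               = c(1000180, 38079),   # placeholder column name
  word.count        = c(50000, 40000),
  lm.negative.count = c(1000, 600),
  lm.positive.count = c(500, 400)
)

# Share of LM negative words in each filing.
mock.df$negative.share <- mock.df$lm.negative.count / mock.df$word.count

# Net tone: (positive - negative) words, scaled by total words.
mock.df$net.tone <-
  (mock.df$lm.positive.count - mock.df$lm.negative.count) / mock.df$word.count
```

Scaling by word.count makes filings of different lengths comparable; other denominators, such as lm.dictionary.count, could be substituted in the same way.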

[Package edgar version 2.0.5 Index]