read.transcript {qdap}    R Documentation
Read Transcripts Into R
Description
Read .docx, .csv or .xlsx files into R.
Usage
read.transcript(
file,
col.names = NULL,
text.var = NULL,
merge.broke.tot = TRUE,
header = FALSE,
dash = "",
ellipsis = "...",
quote2bracket = FALSE,
rm.empty.rows = TRUE,
na.strings = c("999", "NA", "", " "),
sep = NULL,
skip = 0,
nontext2factor = TRUE,
text,
comment.char = "",
...
)
Arguments
file
The name of the file which the data are to be read from. Each row of the table appears as one line of the file. If it does not contain an absolute path, the file name is relative to the current working directory, getwd().
col.names
A character vector specifying the column names of the transcript columns.
text.var
A character string specifying the name of the text variable; supplying it ensures that variable is classed as character. If NULL, read.transcript attempts to guess the text variable (dialogue).
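merge.broke.tot
logical. If TRUE and the file being read in is a .docx in which a single turn of talk is broken across lines, read.transcript attempts to merge these pieces back into a single turn of talk.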
header
logical. If TRUE, the file contains the names of the variables as its first line.
dash
A character string to replace the en and em dash special characters (default is to remove them).
ellipsis
A character string to replace the ellipsis special character (default is the text "...").
quote2bracket
logical. If TRUE, replaces curly quotes with curly braces (default is FALSE). If FALSE, curly quotes are removed.
rm.empty.rows
logical. If TRUE, read.transcript attempts to remove empty rows.
na.strings
A vector of character strings which are to be interpreted as NA values.
sep
The field separator character. Values on each line of the file are separated by this character. The default of NULL instructs read.transcript to use a separator suitable for the file type being read in.
skip
Integer; the number of lines of the data file to skip before beginning to read data.
nontext2factor
logical. If TRUE, attempts to convert any non-text column to a factor.
text
Character string: if file is not supplied but text is, data are read from the value of text. A literal string can therefore be used to include (small) data sets within R code.
comment.char
A character vector of length one containing a single character or an empty string. Use "" to turn off the interpretation of comments altogether.
...
Further arguments to be passed to read.table.
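The dash and ellipsis arguments describe a plain character substitution. The sketch below approximates that step in base R; clean_special() is a hypothetical helper, not a qdap function, and it assumes the en dash, em dash and ellipsis are the Unicode characters U+2013, U+2014 and U+2026.

```r
# Hypothetical helper, not part of qdap: approximates the dash/ellipsis
# replacement described for read.transcript() using base R only.
clean_special <- function(x, dash = "", ellipsis = "...") {
  x <- gsub("\u2013", dash, x, fixed = TRUE)   # en dash (U+2013)
  x <- gsub("\u2014", dash, x, fixed = TRUE)   # em dash (U+2014)
  gsub("\u2026", ellipsis, x, fixed = TRUE)    # horizontal ellipsis (U+2026)
}

x <- c("It was fun\u2014mostly.", "Well\u2026 maybe.")
clean_special(x)                 # dashes removed, ellipsis spelled out as "..."
clean_special(x, dash = " - ")   # dashes replaced by " - " instead
```

With the default dash = "" the dash is simply dropped, so adjacent words run together; passing a space or " - " keeps them apart.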
Value
Returns a data frame of dialogue and people.
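As a rough illustration of how text.var and nontext2factor shape that data frame, the hypothetical helper below (not part of qdap) keeps the named text column as character and converts every other column to a factor. Whether read.transcript performs exactly these steps internally is an assumption.

```r
# Hypothetical helper, not part of qdap: keep the text column as character
# and turn every other column into a factor.
nontext_to_factor <- function(df, text.var) {
  for (nm in names(df)) {
    if (nm == text.var) {
      df[[nm]] <- as.character(df[[nm]])   # dialogue stays character
    } else {
      df[[nm]] <- factor(df[[nm]])          # e.g. person becomes a factor
    }
  }
  df
}

dat <- data.frame(person = c("sam", "greg", "sam"),
                  dialogue = c("Computer is fun.", "No, it is dumb.", "It stinks!"),
                  stringsAsFactors = FALSE)
str(nontext_to_factor(dat, text.var = "dialogue"))
```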
Warning
read.transcript may produce errors when the file being read in is .docx. The researcher should carefully inspect each transcript for errors before further parsing the data.
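One way to start that inspection is to list rows whose speaker or dialogue is missing or blank, and to tabulate the speaker labels so that mis-parsed names stand out. flag_suspect_rows() is a hypothetical helper, and the column names person and dialogue are assumed rather than guaranteed.

```r
# Hypothetical helper, not part of qdap: return the row numbers whose speaker
# or dialogue is missing or blank, a common symptom of a mis-read .docx line.
flag_suspect_rows <- function(df, person = "person", dialogue = "dialogue") {
  speaker <- trimws(as.character(df[[person]]))
  text <- trimws(as.character(df[[dialogue]]))
  which(is.na(speaker) | speaker == "" | is.na(text) | text == "")
}

dat <- data.frame(person = c("sam", "greg", NA, "teacher"),
                  dialogue = c("Computer is fun.", "", "stray line", "What should we do?"),
                  stringsAsFactors = FALSE)
flag_suspect_rows(dat)                 # rows to look at by hand
table(dat$person, useNA = "ifany")     # odd speaker labels stand out here
```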
Note
If a transcript is a .docx file, read.transcript expects two columns (generally person and dialogue) with some sort of separator (default is a colon separator). .doc files must be converted to .docx before reading in.
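When a single turn of talk in a .docx file is split across several lines, only the first line carries the speaker and separator. The sketch below shows one way to fold such continuation lines back into the preceding turn, the kind of repair the merge.broke.tot argument appears to target. merge_broken_turns() is hypothetical and assumes that every new turn, and only a new turn, contains the separator.

```r
# Hypothetical helper, not part of qdap: fold lines that lack the speaker
# separator back into the turn of talk that precedes them.
merge_broken_turns <- function(lines, sep = ":") {
  lines <- trimws(lines)
  lines <- lines[lines != ""]                      # drop blank lines
  starts_turn <- grepl(sep, lines, fixed = TRUE)   # separator marks a new turn
  turn_id <- cumsum(starts_turn)
  if (any(turn_id == 0)) stop("the first line has no speaker separator")
  unname(vapply(split(lines, turn_id), paste, character(1), collapse = " "))
}

raw <- c("sam: Computer is fun.",
         "Not too fun.",
         "",
         "greg: No, it is dumb.")
merge_broken_turns(raw)
```

A continuation line that itself contains a colon would be misread as a new turn, which is one more reason to inspect the result by hand.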
Author(s)
Bryan Goodrich and Tyler Rinker <tyler.rinker@gmail.com>.
References
https://github.com/trinker/qdap/wiki/Reading-.docx-%5BMS-Word%5D-Transcripts-into-R
See Also
Examples
## Not run:
# Note: to view the documents below, look in the directory:
system.file("extdata/transcripts/", package = "qdap")
(doc1 <- system.file("extdata/transcripts/trans1.docx", package = "qdap"))
(doc2 <- system.file("extdata/transcripts/trans2.docx", package = "qdap"))
(doc3 <- system.file("extdata/transcripts/trans3.docx", package = "qdap"))
(doc4 <- system.file("extdata/transcripts/trans4.xlsx", package = "qdap"))
dat1 <- read.transcript(doc1)
truncdf(dat1, 40)
dat2 <- read.transcript(doc1, col.names = c("person", "dialogue"))
truncdf(dat2, 40)
dat2b <- rm_row(dat2, "person", "[C") #remove bracket row
truncdf(dat2b, 40)
## read.transcript(doc2) #throws an error (need skip)
dat3 <- read.transcript(doc2, skip = 1); truncdf(dat3, 40)
## read.transcript(doc3, skip = 1) #incorrect read; wrong sep
dat4 <- read.transcript(doc3, sep = "-", skip = 1); truncdf(dat4, 40)
dat5 <- read.transcript(doc4); truncdf(dat5, 40) #an .xlsx file
trans <- "sam: Computer is fun. Not too fun.
greg: No it's not, it's dumb.
teacher: What should we do?
sam: You liar, it stinks!"
read.transcript(text=trans)
## Read in text, specifying a space as sep
## EXAMPLE 1
read.transcript(text="34 The New York Times reports a lot of words here.
12 Greenwire reports a lot of words.
31 Only three words.
2 The Financial Times reports a lot of words.
9 Greenwire short.
13 The New York Times reports a lot of words again.",
col.names=qcv(NO, ARTICLE), sep=" ")
## EXAMPLE 2
read.transcript(text="34.. The New York Times reports a lot of words here.
12.. Greenwire reports a lot of words.
31.. Only three words.
2.. The Financial Times reports a lot of words.
9.. Greenwire short.
13.. The New York Times reports a lot of words again.",
col.names=qcv(NO, ARTICLE), sep="\\.\\.")
## End(Not run)
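In EXAMPLE 1 and EXAMPLE 2 the ARTICLE column still contains spaces, so each line appears to be split at the first occurrence of sep only, and sep = "\\.\\." suggests sep is treated as a regular expression. A base-R sketch of that splitting rule follows; split_first() is a hypothetical helper, not qdap's own code.

```r
# Hypothetical helper, not part of qdap: split each line at the FIRST match of
# a regular-expression separator; later matches stay in the second column.
split_first <- function(lines, sep, col.names = c("V1", "V2")) {
  m <- regexpr(sep, lines)                     # first match per line, -1 if none
  pos <- as.integer(m)                         # drop regexpr's attributes
  len <- as.integer(attr(m, "match.length"))
  found <- pos > 0
  left <- rep(NA_character_, length(lines))
  right <- lines
  left[found] <- substr(lines[found], 1, pos[found] - 1)
  right[found] <- substring(lines[found], pos[found] + len[found])
  out <- data.frame(trimws(left), trimws(right), stringsAsFactors = FALSE)
  names(out) <- col.names
  out
}

split_first(c("34 The New York Times reports a lot of words here.",
              "31 Only three words."),
            sep = " ", col.names = c("NO", "ARTICLE"))
split_first(c("34.. The New York Times reports a lot of words here.",
              "9.. Greenwire short."),
            sep = "\\.\\.", col.names = c("NO", "ARTICLE"))
```

Unlike read.transcript, this sketch does no type conversion, so NO stays character.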