Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning



Documentation for package ‘text’ version 1.2.1

Help Pages

centrality_data_harmony Example data for plotting a Semantic Centrality Plot.
DP_projections_HILS_SWLS_100 Data for plotting a Dot Product Projection Plot.
Language_based_assessment_data_3_100 Example text and numeric data.
Language_based_assessment_data_8 Text and numeric data for 10 participants.
PC_projections_satisfactionwords_40 Example data for plotting a Principal Component Projection Plot.
raw_embeddings_1 Word embeddings from the textEmbedRawLayers function.
textCentrality Compute semantic similarity score between single words' word embeddings and the aggregated word embedding of all words.
textCentralityPlot Plot words according to semantic similarity to the aggregated word embedding.
textClassify Predict label and probability of a text using a pretrained classifier language model. (experimental)
textDescriptives Compute descriptive statistics of character variables.
textDimName Change the names of the dimensions in the word embeddings.
textDistance Compute the semantic distance between two text variables.
textDistanceMatrix Compute semantic distance scores between all pairwise combinations of word embeddings.
textDistanceNorm Compute the semantic distance between a text variable and a word norm (i.e., a text represented by one word embedding that represents a construct/concept).
textEmbed Extract layers and aggregate them into word embeddings for all character variables in a given dataframe (see the usage sketch after this index).
textEmbedLayerAggregation Select and aggregate layers of hidden states to form a word embedding.
textEmbedRawLayers Extract layers of hidden states (word embeddings) for all character variables in a given dataframe.
textEmbedReduce Pre-trained dimension reduction (experimental)
textEmbedStatic Apply word embeddings from a given decontextualized static space (such as from Latent Semantic Analysis) to all character variables.
textFineTuneDomain Domain Adapted Pre-Training (EXPERIMENTAL - under development)
textFineTuneTask Task Adapted Pre-Training (EXPERIMENTAL - under development)
textGeneration Predicts the words that will follow a specified text prompt. (experimental)
textModelLayers Get the number of layers in a given model.
textModels Check downloaded, available models.
textModelsRemove Delete a specified model and its associated files.
textNER Named Entity Recognition. (experimental)
textPCA Compute 2 PCA dimensions of the word embeddings for individual words.
textPCAPlot Plot words in a 2-D plot based on 2 PCA components.
textPlot Plot words from textProjection() or textWordPrediction().
textPredict Use trained models (e.g., created by textTrain() or stored on GitHub) to predict new scores or classes from embeddings or text.
textPredictAll Predict from several models, selecting the correct input for each.
textPredictTest Significance testing of correlations. If only y1 is provided, a t-test is computed between the absolute errors from yhat1-y1 and yhat2-y1.
textProjection Compute Supervised Dimension Projection and related variables for plotting words.
textProjectionPlot Plot words according to Supervised Dimension Projection.
textQA Question Answering. (experimental)
textrpp_initialize Initialize the Python packages required by the text package.
textrpp_install Install the Python packages required by the text package in a conda or virtualenv environment.
textrpp_install_virtualenv Install the Python packages required by the text package in a conda or virtualenv environment.
textrpp_uninstall Uninstall the textrpp conda environment.
textSimilarity Compute the semantic similarity between two text variables.
textSimilarityMatrix Compute semantic similarity scores between all pairwise combinations of word embeddings.
textSimilarityNorm Compute the semantic similarity between a text variable and a word norm (i.e., a text represented by one word embedding that represents a construct).
textSum Summarize texts. (experimental)
textTokenize Tokenize text according to different HuggingFace transformer models.
textTopics Create and train a BERTopic model (based on the bertopic Python package) on a text variable in a tibble/data.frame. (EXPERIMENTAL)
textTopicsReduce Reduce the number of topics in a trained BERTopic model. (EXPERIMENTAL)
textTopicsTest This function tests the relationship between a single topic or all topics and a variable of interest. Available tests include correlation, t-test, linear regression, binary regression, and ridge regression. (EXPERIMENTAL - under development)
textTopicsTree Get the hierarchical topic tree. (EXPERIMENTAL)
textTopicsWordcloud Plot word clouds of topics from a topic model, based on their significance as determined by a linear or binary regression.
textTrain Train word embeddings to a numeric (ridge regression) or categorical (random forest) variable.
textTrainLists Individually trains word embeddings from several text variables to several numeric or categorical variables.
textTrainN (experimental) Compute cross-validated correlations for different sample sizes of a data set. The cross-validation process can be repeated several times to enhance the reliability of the evaluation.
textTrainNPlot (experimental) Plot cross-validated correlation coefficients across different sample sizes from the object returned by the textTrainN function. If the number of cross-validations exceeds one, error bars are included in the plot.
textTrainRandomForest Train word embeddings to a categorical variable using random forest.
textTrainRegression Train word embeddings to a numeric variable.
textTranslate Translation. (experimental)
textWordPrediction Compute predictions based on single words for plotting words. The word embeddings of single words are trained to predict the mean value associated with that word. P-values do NOT work yet. (experimental)
textZeroShot Zero Shot Classification (Experimental)
word_embeddings_4 Word embeddings for 4 text variables for 40 participants (used in the similarity sketch below).
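
Usage sketch: many of the functions above form one pipeline: embed text, train the embeddings against an outcome, and predict. The following is a minimal, illustrative sketch rather than an excerpt from the package manual. It assumes the bundled example data Language_based_assessment_data_8 contains the columns harmonytexts and hilstotal, that the Python backend has already been installed, and that the textPredict() argument names (model_info, word_embeddings) match the 1.2.x interface; these assumptions may not hold in other versions.

library(text)

# One-time setup of the required Python packages (conda or virtualenv),
# then initialize them for the current session.
# textrpp_install()
# textrpp_initialize()

# 1. Extract hidden-state layers and aggregate them into word embeddings
#    for the character variable(s) in the data frame.
embeddings <- textEmbed(Language_based_assessment_data_8["harmonytexts"])

# 2. Train the embeddings against a numeric outcome with ridge regression.
#    Column names (harmonytexts, hilstotal) are assumed from the example data.
fit <- textTrainRegression(
  x = embeddings$texts$harmonytexts,
  y = Language_based_assessment_data_8$hilstotal
)
fit$results  # cross-validated performance estimates (stored in $results in recent versions)

# 3. Predict scores from embeddings with the trained model
#    (argument names assumed; check ?textPredict for your installed version).
predictions <- textPredict(
  model_info = fit,
  word_embeddings = embeddings$texts
)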
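
For quick semantic comparisons without training, the bundled word_embeddings_4 object can be passed directly to textSimilarity() and textDistance(). A short sketch follows; the element names used below (harmonytexts, satisfactiontexts) are assumptions based on the example data, so check names(word_embeddings_4$texts) first and adjust accordingly.

library(text)

# Inspect which text variables the bundled embeddings cover.
names(word_embeddings_4$texts)

# Semantic similarity (cosine, by default) between two paired text variables
# (element names are assumed; adjust them to the names printed above).
similarity_scores <- textSimilarity(
  word_embeddings_4$texts$harmonytexts,
  word_embeddings_4$texts$satisfactiontexts
)
summary(similarity_scores)

# The corresponding distance scores are available via textDistance().
distance_scores <- textDistance(
  word_embeddings_4$texts$harmonytexts,
  word_embeddings_4$texts$satisfactiontexts
)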