bind_clinspacy_embeddings {clinspacy}    R Documentation

This function binds columns containing entity or concept embeddings to a data frame. The entity embeddings are derived from the scispacy package, and the concept embeddings are derived from the dataset_cui2vec_embeddings dataset included with this package.


The cui2vec concept embeddings are derived from Andrew Beam's cui2vec R package.


bind_clinspacy_embeddings(
  clinspacy_output,
  df,
  type = "scispacy",
  df_id = NULL,
  subset = "is_negated == FALSE"
)



clinspacy_output
A data.frame or file name containing the output from clinspacy. For scispacy embeddings to be available to bind_clinspacy_embeddings, you must set return_scispacy_embeddings to TRUE when running clinspacy so that the embeddings are included within clinspacy_output.


df
The data.frame to which you would like to bind the output of clinspacy.


type
The type of embeddings to return. One of "scispacy" or "cui2vec". Whereas cui2vec embeddings require the UMLS linker to be enabled, the scispacy embeddings do not. Defaults to "scispacy".


df_id
The name of the id column in the data frame with which the id column in clinspacy_output will be joined. If you supplied a df_id in clinspacy, then you must also supply it here. If you did not supply it in clinspacy, then it will default to the row number (the same behavior as in clinspacy).
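The row-number default described above can be pictured with a small base R sketch. This is not the package's implementation: the data frames are mock data, the column names clinspacy_id, emb_001 and emb_002 are assumptions made for illustration, and the real function may treat unmatched rows differently.

```r
# Mock data frame of documents (stand-in for the data frame being bound to).
df <- data.frame(
  note = c("first note", "second note", "third note"),
  stringsAsFactors = FALSE
)

# Mock per-document embeddings keyed by row number; document 2 has none.
# The column names here are hypothetical.
embeddings <- data.frame(
  clinspacy_id = c(1L, 3L),
  emb_001 = c(0.12, -0.40),
  emb_002 = c(0.05, 0.33)
)

# With no id column supplied, fall back to the row number as the join key.
df$clinspacy_id <- seq_len(nrow(df))

# Left join: every original row is kept, and rows without a match get NA.
joined <- merge(df, embeddings, by = "clinspacy_id", all.x = TRUE)
print(joined)
```

If an id column was supplied when the text was processed, the same column has to serve as the key here, otherwise the row numbers and the ids would not line up.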


subset
Logical criteria, represented as a string, by which clinspacy_output will be subset prior to building the output data frame. Defaults to "is_negated == FALSE", which removes negated concepts prior to generating the output. Any column in clinspacy_output may be referenced here. To avoid any subsetting, set this to NULL.
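To make the idea of string-based logical criteria concrete, the following base R sketch filters a mock data frame with a criteria string. The helper filter_by_string is hypothetical and only illustrates one way such a string could be evaluated; the mock columns clinspacy_id and entity are assumptions, and the package may implement subsetting differently.

```r
# Mock stand-in for clinspacy output; only is_negated is taken from the text
# above, the other columns are assumed for illustration.
clinspacy_output <- data.frame(
  clinspacy_id = c(1, 1, 2),
  entity = c("chest pain", "fever", "cough"),
  is_negated = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: evaluate a string of logical criteria against the
# columns of a data frame and keep the rows where it is TRUE.
filter_by_string <- function(data, criteria) {
  if (is.null(criteria)) {
    return(data)  # NULL means no subsetting
  }
  keep <- eval(parse(text = criteria), envir = data, enclos = parent.frame())
  data[!is.na(keep) & keep, , drop = FALSE]
}

kept <- filter_by_string(clinspacy_output, "is_negated == FALSE")
all_rows <- filter_by_string(clinspacy_output, NULL)
print(kept)
```

Because the string is evaluated against the data frame's columns, any column can appear in the criteria, for example "is_negated == FALSE & entity != 'cough'" on this mock data.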



Beam, A.L., Kompa, B., Schmaltz, A., Fried, I., Griffin, W., Palmer, N.P., Shi, X., Cai, T., and Kohane, I.S., 2019. Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data. arXiv preprint arXiv:1804.01486.


The cui2vec data is made available under a CC BY 4.0 license. The only change made to the original dataset is the renaming of columns.


A data frame containing the original data frame as well as the concept embeddings. For scispacy embeddings, this returns 200 columns of embeddings. For cui2vec embeddings, this returns 500 columns of embeddings. The resulting data frame can be used to train a machine learning model.
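As a rough illustration of using such a data frame for modeling, the sketch below fits a linear model on embedding columns. Everything in it is mock data: the outcome column label and the emb_ column prefix are assumptions, and only three embedding columns are simulated instead of 200 or 500.

```r
set.seed(1)
n <- 20

# Mock stand-in for a data frame with bound embeddings. The "emb_" prefix
# and the "label" outcome column are hypothetical.
bound <- data.frame(
  label = rnorm(n),
  emb_001 = rnorm(n),
  emb_002 = rnorm(n),
  emb_003 = rnorm(n)
)

# Pick out the embedding columns by name and build a model formula from them.
emb_cols <- grep("^emb_", names(bound), value = TRUE)
model_formula <- reformulate(emb_cols, response = "label")

# Fit an ordinary linear model using the embedding columns as predictors.
fit <- lm(model_formula, data = bound)
print(coef(fit))
```

With hundreds of embedding columns and few rows, a regularized model is usually a better fit for the problem than plain lm; the linear model is used here only to keep the sketch dependency-free.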


## Not run: 
mtsamples <- dataset_mtsamples()
mtsamples[1:5,] %>%
  clinspacy(df_col = 'description', return_scispacy_embeddings = TRUE) %>%
  bind_clinspacy_embeddings(mtsamples[1:5,])
## End(Not run)

[Package clinspacy version 1.0.2 Index]