clinspacy {clinspacy}    R Documentation

This is the primary function for processing both data frames and character vectors in the clinspacy package.

Description

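This is the primary function for processing both data frames and character vectors in the clinspacy package. It extracts UMLS concepts from each document of text and annotates them with negation status, other CyContext contexts, and the section title in which they appear.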

Usage

clinspacy(
  x,
  df_col = NULL,
  df_id = NULL,
  threshold = 0.99,
  semantic_types = c(NA, "Acquired Abnormality", "Activity", "Age Group",
    "Amino Acid Sequence", "Amino Acid, Peptide, or Protein", "Amphibian",
    "Anatomical Abnormality", "Anatomical Structure", "Animal", "Antibiotic", "Archaeon",
    "Bacterium", "Behavior", "Biologic Function", "Biologically Active Substance",
    "Biomedical Occupation or Discipline", "Biomedical or Dental Material", "Bird",
    "Body Location or Region", "Body Part, Organ, or Organ Component",
    "Body Space or Junction", "Body Substance", "Body System", "Carbohydrate Sequence",
    "Cell", "Cell Component", "Cell Function", "Cell or Molecular Dysfunction",
    "Chemical", "Chemical Viewed Functionally", "Chemical Viewed Structurally",
    "Classification", "Clinical Attribute", "Clinical Drug", "Conceptual Entity",
    "Congenital Abnormality", "Daily or Recreational Activity", "Diagnostic Procedure",
    "Disease or Syndrome", "Drug Delivery Device", "Educational Activity",
    "Element, Ion, or Isotope", "Embryonic Structure", "Entity",
    "Environmental Effect of Humans", "Enzyme", "Eukaryote", "Event",
    "Experimental Model of Disease", "Family Group", "Finding", "Fish", "Food",
    "Fully Formed Anatomical Structure", "Functional Concept", "Fungus",
    "Gene or Genome", "Genetic Function", "Geographic Area",
    "Governmental or Regulatory Activity", "Group", "Group Attribute",
    "Hazardous or Poisonous Substance", "Health Care Activity",
    "Health Care Related Organization", "Hormone", "Human",
    "Human-caused Phenomenon or Process", "Idea or Concept", "Immunologic Factor",
    "Indicator, Reagent, or Diagnostic Aid", "Individual Behavior",
    "Injury or Poisoning", "Inorganic Chemical", "Intellectual Product",
    "Laboratory or Test Result", "Laboratory Procedure", "Language", "Machine Activity",
    "Mammal", "Manufactured Object", "Medical Device",
    "Mental or Behavioral Dysfunction", "Mental Process",
    "Molecular Biology Research Technique", "Molecular Function", "Molecular Sequence",
    "Natural Phenomenon or Process", "Neoplastic Process",
    "Nucleic Acid, Nucleoside, or Nucleotide", "Nucleotide Sequence",
    "Occupation or Discipline", "Occupational Activity", "Organ or Tissue Function",
    "Organic Chemical", "Organism", "Organism Attribute", "Organism Function",
    "Organization", "Pathologic Function", "Patient or Disabled Group",
    "Pharmacologic Substance", "Phenomenon or Process", "Physical Object",
    "Physiologic Function", "Plant", "Population Group",
    "Professional or Occupational Group", "Professional Society", "Qualitative Concept",
    "Quantitative Concept", "Receptor", "Regulation or Law", "Reptile",
    "Research Activity", "Research Device", "Self-help or Relief Organization",
    "Sign or Symptom", "Social Behavior", "Spatial Concept", "Substance",
    "Temporal Concept", "Therapeutic or Preventive Procedure", "Tissue", "Vertebrate",
    "Virus", "Vitamin"),
  return_scispacy_embeddings = FALSE,
  verbose = TRUE,
  output_file = NULL,
  overwrite = FALSE
)

Arguments

x

Either a data.frame or a character vector.

df_col

If x is a data.frame, you must specify the name of the column containing the text, as a string.

df_id

If x is a data.frame, you may *optionally* specify an id column to match each row of text in the original data frame with the resulting output. If you do not specify an id, the output will instead contain the row number from the original data.frame.
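
The id column is what lets you reattach extracted concepts to the rows they came from. The sketch below is a minimal illustration that runs without a spaCy installation: the `concepts` data frame is a hand-built, hypothetical stand-in for clinspacy output, and it assumes the output carries the id in a column with the same name as df_id. Check the column names of your real output before adapting it.

```r
# Original notes, each with its own identifier to pass as df_id
notes <- data.frame(
  note_id = c(101, 102),
  text = c("This pt has CKD and HTN", "Diabetes is present"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for clinspacy(notes, df_col = "text", df_id = "note_id"):
# one row per extracted entity; the output column name "note_id" is an assumption
concepts <- data.frame(
  note_id = c(101, 101, 102),
  entity = c("CKD", "HTN", "Diabetes"),
  stringsAsFactors = FALSE
)

# Reattach each entity to the text it was extracted from
joined <- merge(notes, concepts, by = "note_id")
joined[, c("note_id", "text", "entity")]
```

Without df_id, the same join would use the original row numbers as the key instead.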

threshold

Defaults to 0.99. The confidence threshold used by clinspacy; it can be set higher than the linker_threshold passed to clinspacy_init. Whereas linker_threshold can be set only once per session, this threshold can be changed on every call during the R session.
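
One way to picture the two thresholds is as successive filters: the linker drops candidates below linker_threshold once per session, and threshold filters the survivors again on each call. The sketch below is a conceptual illustration only; the candidate scores, the `score` column, the inclusive `>=` comparison, and the helper `filter_by_threshold` are hypothetical and are not taken from clinspacy or scispacy internals.

```r
# Hypothetical candidate links with confidence scores (illustrative data only)
candidates <- data.frame(
  entity = c("CKD", "HTN", "stage"),
  score  = c(0.995, 0.97, 0.88),
  stringsAsFactors = FALSE
)

linker_threshold <- 0.85  # fixed once per session via clinspacy_init

# Stage 1: the linker keeps links at or above linker_threshold (>= is assumed)
linked <- candidates[candidates$score >= linker_threshold, ]

# Stage 2: a per-call threshold filters the already-linked concepts again
filter_by_threshold <- function(linked, threshold) {
  linked[linked$score >= threshold, ]
}

strict <- filter_by_threshold(linked, 0.99)  # keeps CKD only
loose  <- filter_by_threshold(linked, 0.90)  # keeps CKD and HTN
```

In this sketch, a per-call threshold below linker_threshold returns the same rows as `linked`, which is why only values at or above linker_threshold change the result.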

semantic_types

Character vector containing any combination of the following UMLS semantic types; the default is the complete list: c(NA, "Acquired Abnormality", "Activity", "Age Group", "Amino Acid Sequence", "Amino Acid, Peptide, or Protein", "Amphibian", "Anatomical Abnormality", "Anatomical Structure", "Animal", "Antibiotic", "Archaeon", "Bacterium", "Behavior", "Biologic Function", "Biologically Active Substance", "Biomedical Occupation or Discipline", "Biomedical or Dental Material", "Bird", "Body Location or Region", "Body Part, Organ, or Organ Component", "Body Space or Junction", "Body Substance", "Body System", "Carbohydrate Sequence", "Cell", "Cell Component", "Cell Function", "Cell or Molecular Dysfunction", "Chemical", "Chemical Viewed Functionally", "Chemical Viewed Structurally", "Classification", "Clinical Attribute", "Clinical Drug", "Conceptual Entity", "Congenital Abnormality", "Daily or Recreational Activity", "Diagnostic Procedure", "Disease or Syndrome", "Drug Delivery Device", "Educational Activity", "Element, Ion, or Isotope", "Embryonic Structure", "Entity", "Environmental Effect of Humans", "Enzyme", "Eukaryote", "Event", "Experimental Model of Disease", "Family Group", "Finding", "Fish", "Food", "Fully Formed Anatomical Structure", "Functional Concept", "Fungus", "Gene or Genome", "Genetic Function", "Geographic Area", "Governmental or Regulatory Activity", "Group", "Group Attribute", "Hazardous or Poisonous Substance", "Health Care Activity", "Health Care Related Organization", "Hormone", "Human", "Human-caused Phenomenon or Process", "Idea or Concept", "Immunologic Factor", "Indicator, Reagent, or Diagnostic Aid", "Individual Behavior", "Injury or Poisoning", "Inorganic Chemical", "Intellectual Product", "Laboratory or Test Result", "Laboratory Procedure", "Language", "Machine Activity", "Mammal", "Manufactured Object", "Medical Device", "Mental or Behavioral Dysfunction", "Mental Process", "Molecular Biology Research Technique", "Molecular Function", "Molecular Sequence", "Natural Phenomenon or Process", "Neoplastic Process", "Nucleic Acid, Nucleoside, or Nucleotide", "Nucleotide Sequence", "Occupation or Discipline", "Occupational Activity", "Organ or Tissue Function", "Organic Chemical", "Organism", "Organism Attribute", "Organism Function", "Organization", "Pathologic Function", "Patient or Disabled Group", "Pharmacologic Substance", "Phenomenon or Process", "Physical Object", "Physiologic Function", "Plant", "Population Group", "Professional or Occupational Group", "Professional Society", "Qualitative Concept", "Quantitative Concept", "Receptor", "Regulation or Law", "Reptile", "Research Activity", "Research Device", "Self-help or Relief Organization", "Sign or Symptom", "Social Behavior", "Spatial Concept", "Substance", "Temporal Concept", "Therapeutic or Preventive Procedure", "Tissue", "Vertebrate", "Virus", "Vitamin")

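To restrict extraction, pass a subset of these strings, for example `semantic_types = c("Disease or Syndrome", "Sign or Symptom")`. Because the values must match the list exactly, it can help to check a hand-typed subset before processing a large corpus. The sketch below uses a short excerpt of the allowed values and a hypothetical helper, `check_semantic_types`, which is not part of clinspacy; it makes no claim about how clinspacy itself validates the argument.

```r
# Excerpt of the allowed values listed above (the full default has many more)
allowed_types <- c(NA, "Disease or Syndrome", "Sign or Symptom",
                   "Pharmacologic Substance", "Clinical Drug")

# Hypothetical helper: stop early if a requested type is not an allowed value
check_semantic_types <- function(requested, allowed) {
  unknown <- setdiff(requested, allowed)
  if (length(unknown) > 0) {
    stop("Unknown semantic type(s): ", paste(unknown, collapse = ", "))
  }
  invisible(requested)
}

# Restrict extraction to conditions and symptoms
conditions <- c("Disease or Syndrome", "Sign or Symptom")
check_semantic_types(conditions, allowed_types)

# A typo ("Syndrom") is caught before any text is processed
typo_caught <- tryCatch({
  check_semantic_types("Disease or Syndrom", allowed_types)
  FALSE
}, error = function(e) TRUE)
```

The validated vector can then be passed as `clinspacy(x, semantic_types = conditions)`.
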
return_scispacy_embeddings

Defaults to FALSE. Set this to TRUE to make scispacy embeddings available to the bind_clinspacy_embeddings function; this argument is primarily intended for that use.

verbose

Defaults to TRUE. Set to FALSE to suppress informational messages.

output_file

Defaults to NULL. If supplied, the output is written to this file as comma-separated values (CSV), and the file name is returned instead of a data frame.

overwrite

Defaults to FALSE. If output_file already exists and overwrite is FALSE, you will be prompted to confirm whether to overwrite the file. If TRUE, an existing output_file is overwritten without prompting.
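
The documented output_file/overwrite rules amount to a small decision: write freely if the file is new, write without asking if overwrite is TRUE, and otherwise ask first. The sketch below restates those rules as a hypothetical helper, `write_csv_guarded`, which is not clinspacy's implementation. Its `confirm` argument is an assumption added so the behavior can be exercised non-interactively; by default it asks via `utils::askYesNo`.

```r
# Hypothetical helper mirroring the documented output_file/overwrite rules
write_csv_guarded <- function(df, output_file, overwrite = FALSE,
                              confirm = function(path) {
                                isTRUE(utils::askYesNo(paste0("Overwrite ", path, "?")))
                              }) {
  # Only ask when the file exists and overwrite is FALSE
  if (file.exists(output_file) && !overwrite && !confirm(output_file)) {
    stop("Not overwriting existing file: ", output_file)
  }
  utils::write.csv(df, output_file, row.names = FALSE)
  output_file  # like clinspacy, return the file name rather than the data
}

out <- tempfile(fileext = ".csv")
res <- data.frame(entity = "HTN", stringsAsFactors = FALSE)

write_csv_guarded(res, out)                    # new file: written, no prompt
write_csv_guarded(res, out, overwrite = TRUE)  # existing file: overwritten, no prompt

# Existing file, overwrite = FALSE, user declines: nothing is written
declined <- tryCatch(
  write_csv_guarded(res, out, confirm = function(path) FALSE),
  error = function(e) "declined"
)

round_trip <- utils::read.csv(out, stringsAsFactors = FALSE)
```

Because the return value is a path, a typical follow-up is to read the file back with `read.csv`.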

Value

If output_file is NULL (the default), this function returns a data frame containing the UMLS concept unique identifiers (cui), entities, lemmatized entities, CyContext negation status (TRUE means negated, FALSE means *not* negated), other CyContext contexts, and the section title from the clinical sectionizer. If output_file is a file name, the output is written to that file and the file name is returned.
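
A common next step is to drop negated concepts before summarizing what each document mentions. The sketch below runs on a hand-built, hypothetical stand-in for the output of the first example ('This patient has diabetes and CKD stage 3 but no HTN.'). The column names `clinspacy_id`, `entity`, and `is_negated`, and the rows themselves, are assumptions for illustration; check `names()` on your own output before adapting it.

```r
# Hypothetical stand-in for clinspacy output; columns and rows are assumed
res <- data.frame(
  clinspacy_id = c(1, 1, 1),
  entity = c("diabetes", "CKD stage 3", "HTN"),
  is_negated = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only affirmed (non-negated) concepts
affirmed <- res[!res$is_negated, ]

# Count affirmed concepts per document
counts <- tapply(affirmed$entity, affirmed$clinspacy_id, length)
```

Filtering on the negation flag before counting keeps a phrase such as "no HTN" from being tallied as a diagnosis.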

Examples

## Not run: 
clinspacy('This patient has diabetes and CKD stage 3 but no HTN.')

clinspacy(c('This pt has CKD and HTN', 'Pt only has CKD but no HTN'))

library(magrittr)  # provides %>% if it is not already attached
data.frame(text = c('This pt has CKD and HTN', 'Diabetes is present'),
           stringsAsFactors = FALSE) %>%
  clinspacy(df_col = 'text')

if (!dir.exists(rappdirs::user_data_dir('clinspacy'))) {
  dir.create(rappdirs::user_data_dir('clinspacy'), recursive = TRUE)
}

clinspacy(c('This pt has CKD and HTN', 'Has CKD but no HTN'),
  output_file = file.path(rappdirs::user_data_dir('clinspacy'),
                          'output.csv'),
  overwrite = TRUE)

## End(Not run)


[Package clinspacy version 1.0.2 Index]