create_img_db {parseRPDR}    R Documentation

Create a database of DICOM headers.

Description

The function creates a database of DICOM headers present in a folder structure. Each series should be in its own folder, but the series folders can sit inside a nested folder structure. Files that have folders next to them at the same level will not be parsed; that is, the folder structure needs to comply with the DICOM standard. Be aware that the function requires python and pydicom to be installed!

The function recursively cycles through every folder and subfolder under the provided path and extracts the DICOM header information from the files using the dcmread function of the pydicom package. The extensions of the files can be provided with the ext argument, as DICOM files may have extensions other than .dcm. Using the boolean all argument, you can specify whether the function provides output for every file or only for the first file of each series, which is beneficial if you are analyzing multi-slice series, as all instances share almost all of their header information. Furthermore, using the keywords argument you can manually specify which DICOM keywords you wish to extract. Each of these needs to be a valid keyword specified in the DICOM standard.
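
The folder rule and the extension matching described above can be illustrated with a rough base R sketch. This is not the package's implementation: leaf_folders() and match_ext() are hypothetical helpers written for this illustration, the folder tree is a throwaway created in tempdir(), and the suffix-based match is a simplifying assumption about how extensions are compared.

```r
# Build a tiny throwaway folder tree to illustrate the layout rule.
root <- file.path(tempdir(), "dicom_demo")
unlink(root, recursive = TRUE)
dir.create(file.path(root, "patient1", "series1"), recursive = TRUE)
dir.create(file.path(root, "patient1", "series2"), recursive = TRUE)
file.create(file.path(root, "patient1", "series1",
                      c("IM1.dcm", "IM2.DCM", "notes.txt")))
file.create(file.path(root, "patient1", "series2", "IM1"))
file.create(file.path(root, "patient1", "stray.dcm"))  # sits next to subfolders

# Hypothetical helper: keep only folders that contain no subfolders.
leaf_folders <- function(path) {
  dirs <- list.dirs(path, recursive = TRUE, full.names = TRUE)
  has_subdir <- vapply(
    dirs,
    function(d) length(list.dirs(d, recursive = FALSE)) > 0,
    logical(1)
  )
  dirs[!has_subdir]
}

# Hypothetical helper: case-insensitive suffix match on the extensions;
# "" in ext additionally keeps files that have no extension at all.
match_ext <- function(files, ext = c(".dcm", "")) {
  f <- tolower(basename(files))
  ext <- tolower(ext)
  with_ext <- ext[nzchar(ext)]
  hit <- vapply(f, function(x) any(endsWith(x, with_ext)), logical(1))
  if ("" %in% ext) {
    hit <- hit | !grepl(".", f, fixed = TRUE)
  }
  files[hit]
}

series <- leaf_folders(root)
files <- unlist(lapply(series, list.files, full.names = TRUE))
dicom_candidates <- match_ext(files, ext = c(".dcm", ""))
basename(dicom_candidates)
```

In this sketch stray.dcm is never considered because its folder also contains subfolders, and notes.txt is dropped by the extension filter.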

Usage

create_img_db(
  path,
  ext = c(".dcm", ".dicom", ".ima", ".tmp", ""),
  all = TRUE,
  keywords = c("StudyDate", "StudyTime", "SeriesDate", "SeriesTime", "AcquisitionDate",
    "AcquisitionTime", "ConversionType", "Manufacturer", "InstitutionName",
    "InstitutionalDepartmentName", "ReferringPhysicianName", "Modality",
    "ManufacturerModelName", "StudyDescription", "SeriesDescription", "StudyComments",
    "ProtocolName", "RequestedProcedureID", "ViewPosition", "StudyInstanceUID",
    "SeriesInstanceUID", "SOPInstanceUID", "AccessionNumber", "PatientName", "PatientID",
    "IssuerOfPatientID", "PatientBirthDate", "PatientSex", "PatientAge",
    "PatientSize", "PatientWeight", "StudyID", "SeriesNumber", "AcquisitionNumber",
    "InstanceNumber", "BodyPartExamined", "SliceThickness", "SpacingBetweenSlices",
    "PixelSpacing", "PixelAspectRatio", "Rows", "Columns", "FieldOfViewDimensions",
    "RescaleIntercept", "RescaleSlope", "WindowCenter", "WindowWidth", "BitsAllocated",
    "BitsStored", "PhotometricInterpretation", "KVP", "ExposureTime", "XRayTubeCurrent",
    "ExposureInuAs", "ImageAndFluoroscopyAreaDoseProduct", "FilterType",
    "ConvolutionKernel", "CTDIvol", "ReconstructionFieldOfView"),
  nThread = parallel::detectCores() - 1,
  na = TRUE,
  identical = TRUE
)

Arguments

path

string vector, full path to the folder that contains the images.

ext

string array, possible file extensions to parse. It is advised to add . before the extensions, as the given character patterns may otherwise be present elsewhere in the file names. If DICOM files without an extension should also be parsed, add "" to the extensions, and the script will then try to read all files without an extension. The file names and the extensions are converted to lower case before matching to avoid mismatches due to capitalization.

all

boolean, whether all files in a series should be parsed, or only the first one.

keywords

string array of valid DICOM keywords.

nThread

integer, number of threads to use for parsing data.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.
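
As a hedged illustration of what the na and identical arguments describe, the base R sketch below drops all-NA columns and then duplicated columns from a small made-up table. It is not the package's code: drop_all_na() and drop_duplicate_columns() are hypothetical helpers, the data (including the PatientID_copy column) is invented for the example, a plain data.frame stands in for the data.table the function returns, and reading "identical values" as "columns that duplicate another column" is an assumption; it could also mean columns holding a single repeated value.

```r
# Made-up header table; PatientID_copy is a hypothetical duplicate column.
headers <- data.frame(
  PatientID      = c("A", "A", "B"),
  PatientID_copy = c("A", "A", "B"),
  StudyComments  = c(NA_character_, NA_character_, NA_character_),
  SliceThickness = c(1, 3, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop columns in which every value is NA.
drop_all_na <- function(df) {
  keep <- vapply(df, function(col) !all(is.na(col)), logical(1))
  df[, keep, drop = FALSE]
}

# Hypothetical helper: drop columns that exactly repeat an earlier column
# (assumed reading of "identical values").
drop_duplicate_columns <- function(df) {
  df[, !duplicated(as.list(df)), drop = FALSE]
}

cleaned <- drop_duplicate_columns(drop_all_na(headers))
names(cleaned)
```

Columns with some missing values, such as SliceThickness here, are kept; only columns that are entirely NA are removed by the first step.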

Value

data.table, with the DICOM header information returned unchanged. The function also adds new columns that help with further data manipulation:

time_study

POSIXct, StudyDate and StudyTime concatenated and converted to POSIXct.

time_series

POSIXct, SeriesDate and SeriesTime concatenated and converted to POSIXct.

time_acquisition

POSIXct, AcquisitionDate and AcquisitionTime concatenated and converted to POSIXct.

name_img

string, PatientName with special characters removed.

time_date_of_birth_img

POSIXct, PatientBirthDate as POSIXct.

img_pixel_spacing

numeric, the first element of the PixelSpacing array, returned as a numeric value.
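
To make the derived columns more concrete, here is a minimal base R sketch of how a date and time pair could be combined into POSIXct and how the first PixelSpacing element could be extracted. It is only an approximation under stated assumptions: the input values are invented, dates are assumed to be in the DICOM YYYYMMDD form and times in HHMMSS form with optional fractional seconds, the UTC time zone is an arbitrary choice, PixelSpacing is assumed to arrive as a bracketed, comma-separated string, and first_pixel_spacing() is a hypothetical helper rather than a package function.

```r
# Invented example values; NA marks a missing header field.
study_date <- c("20200131", "20191224", NA)
study_time <- c("134501.250000", "073000", NA)

# Concatenate date and the first six time digits (HHMMSS), then parse.
# Fractional seconds are discarded in this sketch; UTC is an assumption.
time_study <- as.POSIXct(
  paste0(study_date, substr(study_time, 1, 6)),
  format = "%Y%m%d%H%M%S",
  tz = "UTC"
)

# Hypothetical helper: take the first element of a string such as
# "[0.488281, 0.488281]" and return it as a number.
first_pixel_spacing <- function(x) {
  no_brackets <- gsub("]", "", gsub("[", "", x, fixed = TRUE), fixed = TRUE)
  parts <- strsplit(no_brackets, ",", fixed = TRUE)
  first <- vapply(parts, function(p) trimws(p[1]), character(1))
  as.numeric(first)
}

pixel_spacing <- c("[0.488281, 0.488281]", "[1.25, 1.25]", NA)
img_pixel_spacing <- first_pixel_spacing(pixel_spacing)

format(time_study, "%Y-%m-%d %H:%M:%S", tz = "UTC")
img_pixel_spacing
```

Rows with a missing date or time simply yield NA in the derived column, so they can be filtered later with is.na().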

Examples

## Not run: 
# Create a database with DICOM header information
all_dicom_headers <- create_img_db(path = "/Users/Test/Data/DICOM/")
# Create a database parsing only files with the given extensions
all_dicom_headers <- create_img_db(path = "/Users/Test/Data/DICOM/", ext = c(".dcm", ".DICOM"))
# Create a database with DICOM header information for only IDs and accession numbers
all_dicom_headers <- create_img_db(path = "/Users/Test/Data/DICOM/",
                                   keywords = c("PatientID", "AccessionNumber"))

## End(Not run)

[Package parseRPDR version 1.1.1 Index]