install_pyspark {pysparklyr}R Documentation

Installs PySpark and Python dependencies

Description

Installs PySpark and Python dependencies

Installs Databricks Connect and Python dependencies

Usage

install_pyspark(
  version = NULL,
  envname = NULL,
  python_version = NULL,
  new_env = TRUE,
  method = c("auto", "virtualenv", "conda"),
  as_job = TRUE,
  install_ml = FALSE,
  ...
)

install_databricks(
  version = NULL,
  cluster_id = NULL,
  envname = NULL,
  python_version = NULL,
  new_env = TRUE,
  method = c("auto", "virtualenv", "conda"),
  as_job = TRUE,
  install_ml = FALSE,
  ...
)

Arguments

version

Version of 'databricks.connect' to install. Defaults to NULL. If NULL, it will check against PyPi to get the current library version.

envname

The name of the Python Environment to use to install the Python libraries. Defaults to NULL. If NULL, a name will automatically be assigned based on the version that will be installed

python_version

The minimum required version of Python to use to create the Python environment. Defaults to NULL. If NULL, it will check against PyPi to get the minimum required Python version.

new_env

If TRUE, any existing Python virtual environment and/or Conda environment specified by envname is deleted first.

method

The installation method to use. If creating a new environment, "auto" (the default) is equivalent to "virtualenv". Otherwise "auto" infers the installation method based on the type of Python environment specified by envname.

as_job

Runs the installation if using this function within the RStudio IDE.

install_ml

Installs ML related Python libraries. Defaults to TRUE. This is mainly for machines with limited storage to avoid installing the rather large 'torch' library if the ML features are not going to be used. This will apply to any environment backed by 'Spark' version 3.5 or above.

...

Passed on to reticulate::py_install()

cluster_id

Target of the cluster ID that will be used with. If provided, this value will be used to extract the cluster's version

Value

It returns no value to the R session. This function purpose is to create the 'Python' environment, and install the appropriate set of 'Python' libraries inside the new environment. During runtime, this function will send messages to the console describing the steps that the function is taking. For example, it will let the user know if it is getting the latest version of the Python library from 'PyPi.org', and the result of such query.


[Package pysparklyr version 0.1.4 Index]