Databricks Connector 1.0.0.0

The Databricks component allows you to list, import, and automate Databricks jobs.

Prerequisites

Contents of the Component

| Object Type | Name | Description |
| --- | --- | --- |
| Application | GLOBAL.Redwood.REDWOOD.Databricks | Integration connector with the Databricks system |
| ConstraintDefinition | REDWOOD.Redwood_DatabricksConnectionConstraint | Constraint for Databricks Connection fields |
| ExtensionPoint | REDWOOD.Redwood_DatabricksConnection | Databricks Connector |
| Process Definition | REDWOOD.Redwood_Databricks_ImportJob | Import a job from Databricks |
| Process Definition | REDWOOD.Redwood_Databricks_RunJob | Run a job in Databricks |
| Process Definition | REDWOOD.Redwood_Databricks_RunJob_Template | Template definition to run a job in Databricks |
| Process Definition | REDWOOD.Redwood_Databricks_ShowJobs | List all existing jobs in Databricks |
| Process Definition Type | REDWOOD.Redwood_Databricks | Databricks Connector |
| Library | REDWOOD.Redwood_Databricks | Library for the Databricks connector |

Process Definitions

Redwood_Databricks_ImportJob

Imports one or more Databricks jobs as RunMyJobs Process Definitions. Specify a Job Name Filter to control which jobs are imported, and Generation Settings to control the attributes of the imported definitions. A sketch of the filter and naming behavior follows the parameters table below.

Parameters

| Tab | Name | Description | Documentation | Data Type | Direction | Default Expression | Values |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Parameters | connection | Connection | The Connection object that defines the connection to the Databricks application. | String | In | | |
| Parameters | filter | Job Name Filter | Limits the jobs imported to those whose names match the filter. Wildcards * and ? are allowed. | String | In | | |
| Parameters | overwrite | Overwrite Existing Definition | When set to Yes, if a definition already exists with the same name as the name generated for the imported object, it is overwritten with the new import. When set to No, the import of that job is skipped if a definition with the same name already exists. | String | In | N | Y,N |
| Generation Settings | targetPartition | Partition | The Partition in which to create the new definitions. | String | In | | |
| Generation Settings | targetApplication | Application | The Application in which to create the new definitions. | String | In | | |
| Generation Settings | targetQueue | Default Queue | The default Queue to assign to the generated definitions. | String | In | | |
| Generation Settings | targetPrefix | Definition Name Prefix | The prefix prepended to the name of the imported Databricks job to form the definition name. | String | In | CUS_DBCKS_ | |
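
The filter and prefix combine as follows: each Databricks job whose name matches the wildcard filter is imported as a definition named prefix plus job name. The snippet below is a rough illustration only; the job names are hypothetical, and using Python's fnmatch for the * and ? wildcards is an assumption about the matching semantics, not the connector's actual implementation.

```python
from fnmatch import fnmatchcase

# Hypothetical Databricks job names; the real list comes from the connector.
job_names = ["nightly-etl", "nightly-report", "adhoc-cleanup"]

name_filter = "nightly-*"   # Job Name Filter: * and ? wildcards
prefix = "CUS_DBCKS_"       # Definition Name Prefix (the default)

for name in job_names:
    if fnmatchcase(name, name_filter):
        # With Overwrite Existing Definition = N, a definition that already
        # has this generated name would be skipped instead of replaced.
        print(prefix + name)

# Prints: CUS_DBCKS_nightly-etl and CUS_DBCKS_nightly-report
```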

Redwood_Databricks_RunJob

Runs a Databricks job and monitors it until completion. The RunMyJobs Process will remain in a Running state until the Databricks job completes. If the Databricks Job succeeds, the RunMyJobs process will complete successfully. If the Databricks Job fails, the RunMyJobs process will complete in Error, and any available error information is written to the stdout.log file. Parameters are available on the definition to pass input parameters for the different types of Databricks tasks. For example, adding a value to the Python Parameters parameter will make that parameter available to all Python tasks in the Databricks Job. If the job does not require parameters for a certain task type, leave that parameter empty. See the parameters table below for more information.
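
Under the hood this corresponds to triggering a run and polling its state until it terminates. The sketch below is a hedged approximation that assumes the connector uses the standard Databricks Jobs REST API; the workspace URL, token, job ID, and parameter values are placeholders, not the connector's internals.

```python
import time

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder workspace URL
HEADERS = {"Authorization": "Bearer dapi..."}           # placeholder access token

# Trigger the job; pass only the parameter fields your tasks actually use.
resp = requests.post(
    f"{HOST}/api/2.1/jobs/run-now",
    headers=HEADERS,
    json={"job_id": 123, "notebook_params": {"env": "prod"}},
)
resp.raise_for_status()
run_id = resp.json()["run_id"]  # surfaced by the connector as the runId Out parameter

# Poll until the run reaches a terminal state; this is what keeps the
# RunMyJobs process in a Running state while the Databricks job executes.
while True:
    run = requests.get(
        f"{HOST}/api/2.1/jobs/runs/get",
        headers=HEADERS,
        params={"run_id": run_id},
    ).json()
    state = run["state"]
    if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
        break
    time.sleep(30)

# A SUCCESS result maps to a successful process; anything else maps to Error,
# with state_message as the kind of detail written to stdout.log.
print(state.get("result_state"), state.get("state_message", ""))
```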

Parameters

| Name | Description | Documentation | Data Type | Direction | Values |
| --- | --- | --- | --- | --- | --- |
| connection | Connection | The Connection object that defines the connection to the Databricks application. | String | In | |
| jobId | Job ID to run | The ID of the Databricks job to execute. | String | In | |
| sparkJarParameters | Spark Jar Parameters | An array of Spark Jar parameters to be used on the Databricks job. | String | In | |
| sparkSubmitParameters | Spark Submit Parameters | An array of Spark Submit parameters to be used on the Databricks job. | String | In | |
| notebookParameters | Notebook Parameters | An array of key=value pairs of Notebook parameters to be used on the Databricks job. | String | In | |
| pythonParameters | Python Parameters | An array of Python parameters to be used on the Databricks job. | String | In | |
| pythonNamedParameters | Python Named Parameters | An array of key=value pairs of Python Named parameters to be used on the Databricks job. | String | In | |
| sqlParameters | SQL Parameters | An array of key=value pairs of SQL parameters to be used on the Databricks job. | String | In | |
| dbtParameters | DBT Parameters | An array of DBT parameters to be used on the Databricks job. | String | In | |
| pipelineFullRefresh | Pipeline Full Refresh | Whether a full refresh should be performed on the Databricks pipeline job. | String | In | Y=Yes, N=No |
| runId | Databricks Run ID | The Run ID of the executed job on the Databricks side. | String | Out | |
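
For the parameters documented as arrays of key=value pairs (Notebook, Python Named, and SQL parameters), a reasonable mental model is that each pair is split on the first = into a name/value mapping, since that is the shape the Databricks API expects for these task types. The helper below is illustrative only, not the connector's code:

```python
def pairs_to_dict(pairs):
    # Split each "key=value" string on the first "=" only,
    # so values themselves may contain "=".
    return dict(p.split("=", 1) for p in pairs)

# Hypothetical values as you might enter them on the definition:
notebook_parameters = ["env=prod", "run_date=2024-06-01"]

print(pairs_to_dict(notebook_parameters))
# {'env': 'prod', 'run_date': '2024-06-01'}
```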

Redwood_Databricks_RunJob_Template

This template definition is provided to facilitate creating definitions that run specific Databricks jobs. Its functionality and parameters are the same as those of the Redwood_Databricks_RunJob definition.

To create a definition, choose New (from template) from the context menu of Redwood_Databricks_RunJob_Template.

Note: To provide a default value for the Connection parameter of the template, you must use the full Business Key of the Connection: EXTConnection:<Partition>.<ConnectionName>. For example: EXTConnection:GLOBAL.MyDatabricksConnection.

Parameters

| Name | Description | Documentation | Data Type | Direction | Values |
| --- | --- | --- | --- | --- | --- |
| connection | Connection | The Connection object that defines the connection to the Databricks application. | String | In | |
| jobId | Job ID to run | The ID of the Databricks job to execute. | String | In | |
| sparkJarParameters | Spark Jar Parameters | An array of Spark Jar parameters to be used on the Databricks job. | String | In | |
| sparkSubmitParameters | Spark Submit Parameters | An array of Spark Submit parameters to be used on the Databricks job. | String | In | |
| notebookParameters | Notebook Parameters | An array of key=value pairs of Notebook parameters to be used on the Databricks job. | String | In | |
| pythonParameters | Python Parameters | An array of Python parameters to be used on the Databricks job. | String | In | |
| pythonNamedParameters | Python Named Parameters | An array of key=value pairs of Python Named parameters to be used on the Databricks job. | String | In | |
| sqlParameters | SQL Parameters | An array of key=value pairs of SQL parameters to be used on the Databricks job. | String | In | |
| dbtParameters | DBT Parameters | An array of DBT parameters to be used on the Databricks job. | String | In | |
| pipelineFullRefresh | Pipeline Full Refresh | Whether a full refresh should be performed on the Databricks pipeline job. | String | In | Y=Yes, N=No |
| runId | Databricks Run ID | The Run ID of the executed job on the Databricks side. | String | Out | |

Redwood_Databricks_ShowJobs

Lists all existing jobs in Databricks and fetches information about them. Job properties for the returned jobs are written to the stdout.log file, to a file named listing.rtx, and to the Job Listing Out parameter.

Parameters

| Name | Description | Documentation | Data Type | Direction |
| --- | --- | --- | --- | --- |
| connection | Connection | The Connection object that defines the connection to the Databricks application. | String | In |
| filter | Job Name Filter | Limits the jobs returned to those whose names match the filter. Wildcards * and ? are allowed. | String | In |
| listing | Job listing | The listing of all available jobs that match the input filter (or all jobs if no filter was provided). | Table | Out |
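
The equivalent raw operation is a paged listing of jobs filtered by name. The sketch below assumes the public Databricks Jobs REST API and uses Python's fnmatch for the * and ? wildcards; the workspace URL, token, and filter value are placeholders:

```python
from fnmatch import fnmatchcase

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder workspace URL
HEADERS = {"Authorization": "Bearer dapi..."}           # placeholder access token

name_filter = "nightly-*"  # optional Job Name Filter

# Page through all jobs in the workspace.
jobs, page_token = [], None
while True:
    params = {"limit": 100}
    if page_token:
        params["page_token"] = page_token
    page = requests.get(
        f"{HOST}/api/2.1/jobs/list", headers=HEADERS, params=params
    ).json()
    jobs.extend(page.get("jobs", []))
    page_token = page.get("next_page_token")
    if not page.get("has_more"):
        break

# Print the kind of properties a listing such as listing.rtx would contain.
for job in jobs:
    name = job["settings"]["name"]
    if fnmatchcase(name, name_filter):
        print(job["job_id"], name)
```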

Procedure

Create a Connection To Databricks

  1. Navigate to Custom > Connections.
  2. Click the button to create a new connection.

  3. Choose Databricks Connection under Select a Connection Type.
  4. Choose Next or Basic Properties, then create a Queue and Process Server for your Databricks connection. All required settings are configured automatically.

  5. Choose Next or Security. This screen is common to all components; use it to specify which roles can access the connection information.

  6. Choose Next or Databricks Connection Properties and choose between Basic and Personal Access Token authentication. (You can sanity-check the URL and token with the sketch after this list.)

    • For Basic authentication, you specify the URL, Username, and Password.

    • For Personal Access Token authentication, you specify the URL, Username, and your Access Token.

  7. Navigate to Environment > Process Server, locate your Databricks Process Server, start it, and ensure it reaches the status Running.
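
Before starting the Process Server, you can optionally confirm outside of RunMyJobs that the URL and Access Token from step 6 are valid. The snippet below is an assumption-level illustration using the public Databricks Jobs REST API; the workspace URL and token are placeholders:

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # the URL from step 6 (placeholder)
TOKEN = "dapi..."                                       # the Access Token from step 6 (placeholder)

# A cheap authenticated call: list at most one job.
resp = requests.get(
    f"{HOST}/api/2.1/jobs/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"limit": 1},
)

# 200 means the URL and token are usable; 401/403 points at the credentials.
print(resp.status_code)
```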

Listing Databricks Jobs

  1. Navigate to Definitions > Processes.
  2. Choose Submit from the context menu of Redwood_Databricks_ShowJobs.
  3. Select the connection in the Connection field, specify an optional name filter in the Job Name Filter parameter, and choose Submit to list all available jobs.
