Azure Data Factory Connector 1.0.0.0

Prerequisites

The Redwood Azure Data Factory Connector is packaged as a separate CAR file.

  • RunMyJobs 9.2.6 or later
  • data-factory.car file for import

Installation

You can install the CAR file using standard promotion.

Note: By default, the Data Factory CAR file requires the Partition DATAFACTORY to exist before it can be imported. You can either create this Partition before import, or select an existing Partition during import.

Contents of the CAR file

The CAR file contains the following objects.

Object Type         Name
Application         DATAFACTORY.Redwood_DataFactory
Process Definition  DATAFACTORY.DataFactory_ImportJobTemplate
Process Definition  DATAFACTORY.DataFactory_ShowPipelines
Process Definition  DATAFACTORY.DataFactory_RunPipeline
Process Definition  DATAFACTORY.DataFactory_Template
Library             DATAFACTORY.Redwood_DataFactory

Setup

Data Factory processes need their own Process Server and Queue, and this Process Server and Queue must not be in Partition GLOBAL. By default, all the Data Factory objects live in the Partition DATAFACTORY in on-premises environments, and in your Partition in cloud environments.

In order to connect to your Data Factory instance, you need to create an app registration with a service principal in Azure Active Directory (for more information, see https://docs.microsoft.com/en-gb/azure/active-directory/develop/howto-create-service-principal-portal#register-an-application-with-azure-ad-and-create-a-service-principal). This client application needs to be assigned the Data Factory Contributor role.

From the app registration, note the following settings:

  • Tenant ID
  • App/Client ID
  • Client Secret
  • Subscription ID

And from the Data Factory:

  • Resource Group Name
  • Factory Name

With this information, you can set up a Credential to connect to Data Factory:

  • Partition: Credentials generally need to be either in the Partition of your Process Server or in Partition GLOBAL. With this Connector, however, the Process Server must not reside in the GLOBAL Partition, so the Credential must not either.
  • Protocol: soap.
  • Real User: The App/Client ID.
  • Password: The Client Secret.
  • Endpoint: The tenant ID.
  • Virtual User: The name of the Data Factory Process Server.

Example of a credential:

Example Azure credential
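For reference, the following Python sketch (using the requests library) illustrates the OAuth2 client-credentials flow that such a service principal enables: the Tenant ID, App/Client ID, and Client Secret registered above are exchanged for a bearer token against the Azure Resource Manager API. This is only an illustration of the underlying Azure authentication, not the connector's actual implementation, and all values shown are placeholders.

    # Illustrative only: exchange the service principal settings for an Azure
    # Resource Manager bearer token (OAuth2 client-credentials grant).
    import requests

    TENANT_ID = "<tenant-id>"          # Credential Endpoint
    CLIENT_ID = "<app-client-id>"      # Credential Real User
    CLIENT_SECRET = "<client-secret>"  # Credential Password

    def get_management_token() -> str:
        """Request a bearer token for https://management.azure.com/."""
        response = requests.post(
            f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
            data={
                "grant_type": "client_credentials",
                "client_id": CLIENT_ID,
                "client_secret": CLIENT_SECRET,
                "scope": "https://management.azure.com/.default",
            },
            timeout=30,
        )
        response.raise_for_status()
        return response.json()["access_token"]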

The Process Server DataFactory_ProcessServer is used as an example to set up the connection to a Data Factory instance. To connect to more than one Data Factory system, duplicate the DataFactory_ProcessServer Process Server and create a new Queue and a Credential of type SOAP for that Data Factory system. Make sure that the ScriptService service with the RedwoodScript Definition Type and the JobChainService with the JobChain Definition Type are enabled for the Process Server.

Example of a Process Server:

Example Azure process server

Running Data Factory Processes

Finding Data Factory Pipelines

To retrieve the list of pipelines available for scheduling, navigate to Applications > Redwood_DataFactory > DataFactory_ShowPipelines and submit it.

DataFactory application

Specify the Azure Subscription ID, the Azure Resource Group Name and the Azure Factory Name you want to list the pipelines from. You can filter the list by adding a Process Name Filter.

Retrieve DataFactory pipelines

Select the correct Queue.

Specify the correct queue

Once the process has finished, click stderr.log to view output like the following:

Resulting pipeline list

Here you can find the value to use later as the pipeline name: the first element, directly after the index.
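Under the hood, listing pipelines corresponds to the Azure Data Factory REST API operation Pipelines - List By Factory. The sketch below (Python, requests, api-version 2018-06-01) shows an equivalent request; it reuses the get_management_token() helper from the earlier sketch, uses placeholder values, and is an illustration of the Azure API rather than the connector's own code.

    # Illustrative equivalent of DataFactory_ShowPipelines: list pipeline names
    # in one factory via the Azure Data Factory REST API.
    import requests

    SUBSCRIPTION_ID = "<subscription-id>"
    RESOURCE_GROUP = "<resource-group-name>"
    FACTORY_NAME = "<factory-name>"

    ADF_BASE = (
        f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
        f"/resourceGroups/{RESOURCE_GROUP}"
        f"/providers/Microsoft.DataFactory/factories/{FACTORY_NAME}"
    )

    def list_pipelines(token: str) -> list[str]:
        """Return the names of all pipelines in the factory."""
        response = requests.get(
            f"{ADF_BASE}/pipelines?api-version=2018-06-01",
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        response.raise_for_status()
        return [pipeline["name"] for pipeline in response.json()["value"]]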

Schedule a Data Factory pipeline

In the Redwood_DataFactory Application, submit DataFactory_RunPipeline.

Run a pipeline from the list

Again, specify the Azure Subscription ID, the Azure Resource Group Name, and the Azure Factory Name you want to run the pipeline from, as well as the Pipeline Name of the pipeline to execute.
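The corresponding Azure REST calls are creating a pipeline run and polling its status. The hypothetical sketch below shows this pattern, reusing the ADF_BASE constant and get_management_token() helper from the earlier sketches; it illustrates the Azure Data Factory REST API only and is not the connector's implementation.

    # Illustrative equivalent of DataFactory_RunPipeline: start a pipeline run
    # and poll until it reaches a terminal state. Reuses ADF_BASE and
    # get_management_token() from the sketches above.
    import time
    import requests

    def run_pipeline(token: str, pipeline_name: str) -> str:
        """Trigger a pipeline run and return its final status."""
        headers = {"Authorization": f"Bearer {token}"}

        # Create the run; the response carries the run identifier.
        run = requests.post(
            f"{ADF_BASE}/pipelines/{pipeline_name}/createRun?api-version=2018-06-01",
            headers=headers,
            timeout=30,
        )
        run.raise_for_status()
        run_id = run.json()["runId"]

        # Poll the run status until Azure reports a terminal state.
        while True:
            status = requests.get(
                f"{ADF_BASE}/pipelineRuns/{run_id}?api-version=2018-06-01",
                headers=headers,
                timeout=30,
            ).json()["status"]
            if status in ("Succeeded", "Failed", "Cancelled"):
                return status
            time.sleep(15)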

Import Pipelines as Process Definitions

Submit DataFactory_ImportJobTemplate to import a pipeline as a Process Definition.

Import pipeline as process definition

Here, the pipeline name can be used to import only a selection of pipelines. The Overwrite flag specifies whether existing definitions may be overwritten. The Target tab lets you specify a target Partition, Application, and prefix for the generated Process Definitions.

Customize the import

Troubleshooting

In the Control step of the Submit Wizard, where you select the Queue, you can add additional logging to the output files by selecting Debug in the Out Log and Error Log fields on the Advanced Options tab.

Troubleshooting the process by specifying advanced logging

Privileges Required

To use Azure Data Factory, you need one of the following:

  • scheduler-administrator or redwood-administrator role.
  • scheduler-user or redwood-login role in combination with the following system-wide, Partition-wide or object-level privileges.

Built-in Roles

  • The scheduler-administrator or redwood-administrator built-in role provides full control over all Azure Data Factory connections.

Creating, Modifying Azure Data Factory Connections

You need all of the following privilege ranks to be able to create and modify Azure Data Factory connections:

  • View - on Partitions GLOBAL and REDWOOD.
  • View - on Application REDWOOD.Redwood_DataFactory.
  • View - on library REDWOOD.Redwood_DataFactory.
  • View - on the existing Azure Data Factory Process Server and Queue, if applicable.
  • Create - on Process Server and Queue if a new Process Server and Queue are required.
  • Either:
    • Create - on credential in the REDWOOD Partition if a credential needs to be created.
    • Edit - on credential in the REDWOOD Partition if a credential needs to be modified.

Using the Azure Data Factory Component

You need all of the following privilege ranks to be able to use an Azure Data Factory connection:

  • View - on Partitions GLOBAL and REDWOOD.
  • View - on Application REDWOOD.Redwood_DataFactory.
  • View - on REDWOOD.Redwood_DataFactory library.
  • Submit - on Process Definitions REDWOOD.DataFactory_ImportJobTemplate, REDWOOD.DataFactory_ShowPipelines, REDWOOD.DataFactory_RunPipeline, and/or REDWOOD.DataFactory_Template (depending on the functionality you want).
  • View - on the Process Server of the Azure Data Factory connection.
  • JobAdministrator - on the Queue of the Azure Data Factory component.
  • View - on credential in the REDWOOD Partition or on the credential to use.

Deleting Azure Data Factory Connections

  • View - on Partition REDWOOD.
  • Delete - on credential in REDWOOD Partition.

See Also