data-platform-hq/terraform-azurerm-data-factory

Azure Data Factory Terraform module

Terraform module for creating Azure Data Factory and its components

Usage

This module currently provisions a Data Factory, an Integration Runtime within a managed virtual network, Diagnostic Settings, and managed private endpoints.

data "azurerm_databricks_workspace" "example" {
  name                = "example-adb-workspace"
  resource_group_name = "example-rg"
}

data "azurerm_log_analytics_workspace" "example" {
  name                = "example-law-workspace"
  resource_group_name = "example-rg"
}

module "data_factory" {
  source = "data-platform-hq/data-factory/azurerm"

  project        = "datahq"
  env            = "example"
  location       = "eastus"
  resource_group = "example-rg"

  # Target Log Analytics Workspace used by Diagnostic Settings for log/metrics storage
  log_analytics_workspace = {
    (data.azurerm_log_analytics_workspace.example.name) = data.azurerm_log_analytics_workspace.example.id
  }

  # Set of objects describing the managed private endpoints to create in the Integration Runtime managed network
  managed_private_endpoint = [{
    name               = "adb"
    target_resource_id = data.azurerm_databricks_workspace.example.id
    subresource_name   = "databricks_ui_api"
  }]
}

Note:

Each managed private endpoint connection is created in a Pending state and must be approved manually.

To finish the configuration, open the Azure Databricks service (or whichever Azure service you are connecting to) and select "Networking" in the Settings section. Switch to the "Private endpoint connections" tab, select the newly created connection (it should be in a Pending state), and press the "Approve" button.

If your deployment creates multiple managed private endpoints for different Azure services, you must approve each of them.

Managed private endpoint approve
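
As an illustration of a deployment with several managed private endpoints, the sketch below passes two entries to managed_private_endpoint. The Key Vault and Storage Account names are hypothetical placeholders, and the subresource names ("vault", "dfs") are the usual Private Link group IDs for those services; verify them for your target resources. Each resulting connection would need its own approval on the target service.

```hcl
# Hypothetical existing resources, looked up by name
data "azurerm_key_vault" "example" {
  name                = "example-key-vault"
  resource_group_name = "example-rg"
}

data "azurerm_storage_account" "example" {
  name                = "exampledatalake"
  resource_group_name = "example-rg"
}

module "data_factory" {
  source = "data-platform-hq/data-factory/azurerm"

  project        = "datahq"
  env            = "example"
  location       = "eastus"
  resource_group = "example-rg"

  # One object per target service; every connection starts in a Pending state
  managed_private_endpoint = [
    {
      name               = "kv"
      target_resource_id = data.azurerm_key_vault.example.id
      subresource_name   = "vault"
    },
    {
      name               = "adls"
      target_resource_id = data.azurerm_storage_account.example.id
      subresource_name   = "dfs"
    },
  ]
}
```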

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0.0 |
| azurerm | >= 4.0.1 |
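
A root configuration that satisfies these constraints might look like the following sketch. The subscription ID is a placeholder, and the provider settings are assumptions about the calling configuration rather than something this module dictates.

```hcl
terraform {
  required_version = ">= 1.0.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 4.0.1"
    }
  }
}

provider "azurerm" {
  features {}

  # azurerm 4.x needs a subscription ID, set here or through the
  # ARM_SUBSCRIPTION_ID environment variable. Placeholder value below.
  subscription_id = "00000000-0000-0000-0000-000000000000"
}
```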

Providers

| Name | Version |
|------|---------|
| azurerm | >= 4.0.1 |

Modules

No modules.

Resources

| Name | Type |
|------|------|
| azurerm_data_factory.this | resource |
| azurerm_data_factory_integration_runtime_azure.auto_resolve | resource |
| azurerm_data_factory_integration_runtime_self_hosted.this | resource |
| azurerm_data_factory_managed_private_endpoint.this | resource |
| azurerm_monitor_diagnostic_setting.this | resource |
| azurerm_role_assignment.data_factory | resource |
| azurerm_monitor_diagnostic_categories.this | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| analytics_destination_type | Log Analytics destination type | string | "Dedicated" | no |
| cleanup_enabled | If set to false, the cluster is not recycled and is reused by the next data flow activity run until the TTL (time to live) is reached | bool | true | no |
| compute_type | Compute type of the cluster that executes the data flow job: General, ComputeOptimized, or MemoryOptimized | string | "General" | no |
| core_count | Core count of the cluster that executes the data flow job: 8, 16, 32, 48, 144, or 272 | number | 8 | no |
| custom_adf_name | Specifies the name of the Data Factory | string | null | no |
| custom_default_ir_name | Specifies the name of the Managed Integration Runtime | string | null | no |
| custom_diagnostics_name | Specifies the name of the Diagnostic Settings that monitor ADF | string | null | no |
| custom_shir_name | Specifies the name of the Self Hosted Integration Runtime | string | null | no |
| env | Environment name | string | n/a | yes |
| global_parameter | Configuration of Data Factory global parameters | list(object({ name = string, type = optional(string, "String"), value = string })) | [] | no |
| location | Azure location | string | n/a | yes |
| log_analytics_workspace | Log Analytics Workspace name to ID map | map(string) | {} | no |
| managed_private_endpoint | The ID and subresource name of the Private Link enabled remote resource to which this Data Factory private endpoint should be connected | set(object({ name = string, target_resource_id = string, subresource_name = string })) | [] | no |
| managed_virtual_network_enabled | Is Managed Virtual Network enabled? | bool | true | no |
| permissions | Data Factory permission map | list(map(string)) | [{ "object_id": null, "role": null }] | no |
| project | Project name | string | n/a | yes |
| public_network_enabled | Is the Data Factory visible to the public network? | bool | false | no |
| resource_group | The name of the resource group in which to create the Data Factory | string | n/a | yes |
| self_hosted_integration_runtime_enabled | Whether to create a Self Hosted Integration Runtime | bool | false | no |
| tags | A mapping of tags to assign to the resource | map(any) | {} | no |
| time_to_live_min | TTL for the Integration Runtime, in minutes | string | 15 | no |
| virtual_network_enabled | Managed Virtual Network for the Integration Runtime | bool | true | no |
| vsts_configuration | Code storage configuration map | map(string) | {} | no |
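
The optional inputs above can be combined as in the sketch below. The parameter names, role name, and object ID are illustrative placeholders, not values required by the module. It also assumes that each permissions entry results in one role assignment for the given object_id, which the input table does not state explicitly.

```hcl
module "data_factory" {
  source = "data-platform-hq/data-factory/azurerm"

  # Required inputs
  project        = "datahq"
  env            = "example"
  location       = "eastus"
  resource_group = "example-rg"

  # Global parameters; "type" falls back to "String" when omitted
  global_parameter = [
    { name = "environment", value = "example" },
    { name = "retry_count", type = "Int", value = "3" },
  ]

  # Placeholder principal and role (assumed to map to a role assignment)
  permissions = [
    {
      object_id = "00000000-0000-0000-0000-000000000000"
      role      = "Data Factory Contributor"
    },
  ]

  # Data flow cluster sizing for the managed Integration Runtime
  compute_type = "MemoryOptimized"
  core_count   = 16

  # Also create a Self Hosted Integration Runtime with a custom name
  self_hosted_integration_runtime_enabled = true
  custom_shir_name                        = "example-shir"

  tags = {
    owner = "data-platform"
  }
}
```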

Outputs

| Name | Description |
|------|-------------|
| default_integration_runtime_name | Data Factory Default Integration Runtime Name |
| id | Data Factory ID |
| identity | Data Factory Managed Identity |
| name | Data Factory Name |
| self_hosted_integration_runtime_key | Self hosted integration runtime primary authorization key |
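
A calling configuration could re-export these values roughly as follows. This is a sketch that assumes the module is instantiated as module.data_factory; the output names on the left are arbitrary. The authorization key is marked sensitive here as a precaution, whether or not the module itself marks it so.

```hcl
output "data_factory_id" {
  value = module.data_factory.id
}

output "data_factory_name" {
  value = module.data_factory.name
}

# Passed through as-is; the exact shape of the identity value is not documented here
output "data_factory_identity" {
  value = module.data_factory.identity
}

output "shir_auth_key" {
  value     = module.data_factory.self_hosted_integration_runtime_key
  sensitive = true # keeps the key out of plan and apply console output
}
```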

License

Apache 2 Licensed. For more information please see LICENSE