
Service Principal Creation for Accessing ADLS Gen 2 in Azure Databricks

Introduction

In this blog post, we create a service principal through the Azure portal so that Azure Databricks can access Azure Data Lake Storage (ADLS) Gen 2. We start from scratch and create every resource along the way: a resource group, a Key Vault, an ADLS Gen 2 storage account, an Azure Databricks workspace, an app registration in Azure Active Directory, and an IAM role assignment.

Prerequisite

  • An active Azure subscription. If you don't have one, create a free account.
  • Contributor or Owner access at the subscription level, plus a basic understanding of Azure and its components. We can check our access under Subscriptions -> Access control (IAM) -> View my access.
Type Subscriptions in Global Search Resources Text Box
Check our Access at Subscription level

Resource Group

Create a separate resource group (RG) so that everything built in this post is easy to manage together. Select the Subscription, provide a Resource group name, choose a Region, and keep the rest of the settings at their defaults. Click Review + create and, once validation passes, click Create.

Key Vault

Create a Key Vault for storing credentials of all kinds. In our case, it stores the service principal's secrets. Select the Subscription and Resource group, choose a Region, provide a Key vault name, and keep the rest of the settings at their defaults. Click Review + create and, once validation passes, click Create.

Storage Account – ADLS Gen 2

Create a storage account of type Azure Data Lake Storage (ADLS) Gen 2 by enabling the Hierarchical namespace option on the Advanced tab. On the Basics tab, select the Subscription and Resource group, choose a Region, and provide a Storage account name (it must be globally unique), a Performance tier (Standard is enough for Dev/Test), and a Redundancy option (LRS is enough for Dev/Test). Then go to the Advanced tab, enable Hierarchical namespace, and keep the rest of the settings at their defaults. Click Review + create and, once validation passes, click Create.

Basics TAB
Advanced TAB

Azure Databricks Workspace

Create an Azure Databricks workspace by choosing the Subscription, Resource group, and Region, and by providing a Workspace name and a Pricing Tier (Trial is enough for Dev/Test).

Once the workspace is created, click 'Launch Workspace'.

In the COMPUTE blade, create a new cluster with the configuration shown in the picture below. The cluster lets us test the service principal from notebooks.

Workspace Creation
Click Launch Workspace
Cluster Creation under COMPUTE blade

1. App Registrations in Azure Active Directory

In Azure Active Directory -> App registrations, click + New registration, provide an app name, and register the app.
Once the app is created, its details include (1) the 'Application (client) ID' and (2) the 'Directory (tenant) ID', along with the 'Object ID' and other values.
Open the 'Certificates & secrets' blade and click + New client secret. Provide a name for the client secret and create it, then copy the secret value and keep it somewhere safe until we store it in the Key Vault. The value is shown only once.
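To see where these three values end up, here is a minimal Python sketch of the OAuth 2.0 client-credentials request that a client sends to Azure AD on behalf of the registered app. It only builds the URL and form body and sends nothing. The function name `client_credentials_request` and the placeholder values are illustrative, and the Azure Storage scope is an assumption about which resource the token is meant for.

```python
from urllib.parse import urlencode


def client_credentials_request(tenant_id: str, client_id: str, client_secret: str):
    """Build (url, form_body) for a client-credentials token request.

    Assumes the Azure public cloud and the v2.0 token endpoint.
    Nothing is sent over the network here.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,          # Application (client) ID
            "client_secret": client_secret,  # the one-time secret value
            # Assumed scope: a token for Azure Storage.
            "scope": "https://storage.azure.com/.default",
        }
    )
    return url, body


# Placeholder values only; never hard-code a real secret.
url, body = client_credentials_request(
    tenant_id="<directory-tenant-id>",
    client_id="<application-client-id>",
    client_secret="<secret-value>",
)
print(url)
```

In practice Azure Databricks makes this request for us, which is why the rest of the post only needs to store the three values and point Databricks at them.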

App registrations
Register an App by providing App Name
Create Client Secret
List Client Secret and Save Secret Value

2. Key vault – Storing Secrets

In the Key Vault, open the 'Secrets' blade and use the + Generate/Import option to create and store the following 3 secrets:

  • ApplicationClientID
  • DirectoryTenantID
  • ApplicationClientSecretValue

Then go to the 'Properties' blade and note two values:

  1. Vault URI (also known as DNS Name)
  2. Resource ID

We need both when creating the secret scope in the Azure Databricks workspace.
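Both values follow a fixed pattern, so they can be double-checked against what the Properties blade shows. The sketch below rebuilds them from the subscription ID, resource group name, and vault name. The helper name `key_vault_properties` and the sample values are made up for illustration, and the `vault.azure.net` suffix assumes the Azure public cloud.

```python
def key_vault_properties(subscription_id: str, resource_group: str, vault_name: str) -> dict:
    """Return the two Key Vault values the Databricks secret scope form asks for.

    Assumes the Azure public cloud DNS suffix (vault.azure.net).
    """
    return {
        "dns_name": f"https://{vault_name}.vault.azure.net/",
        "resource_id": (
            f"/subscriptions/{subscription_id}"
            f"/resourceGroups/{resource_group}"
            f"/providers/Microsoft.KeyVault/vaults/{vault_name}"
        ),
    }


# Hypothetical names, for illustration only.
props = key_vault_properties(
    subscription_id="00000000-0000-0000-0000-000000000000",
    resource_group="my-rg",
    vault_name="my-kv",
)
print(props["dns_name"])
print(props["resource_id"])
```

Note that the Resource ID contains the subscription ID (a GUID), not the subscription's display name.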

KV – Secrets blade – create
KV – Secrets blade – listing
KV – Properties blade

3. Access control (IAM)

At the resource group level of the ADLS Gen2 storage account, open the 'Access control (IAM)' blade, click + Add -> Add role assignment, and provide the following details:

  1. Role: Storage Blob Data Contributor (allows read, write, and delete access to Azure Storage blob containers and data).
  2. Assign access to (the type of security principal to assign the role to): choose 'User, group, or service principal'.
  3. Select (search for a security principal by name or email address): search for and select the service principal, that is, the app name registered in Azure AD in step 1.

Finally, click Save.
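The same assignment can also be scripted instead of clicked through. As a hedged sketch, the Python helper below only assembles the argument list for the Azure CLI command `az role assignment create` at resource group scope. It does not run anything, and the helper name `role_assignment_command` and the sample IDs are hypothetical.

```python
import shlex


def role_assignment_command(
    subscription_id: str,
    resource_group: str,
    client_id: str,
    role: str = "Storage Blob Data Contributor",
) -> list:
    """Build the argv for `az role assignment create` at resource group scope.

    `client_id` is the Application (client) ID of the registered app.
    """
    scope = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    return [
        "az", "role", "assignment", "create",
        "--assignee", client_id,
        "--role", role,
        "--scope", scope,
    ]


# Hypothetical IDs; print the command so it can be reviewed before running.
cmd = role_assignment_command(
    subscription_id="00000000-0000-0000-0000-000000000000",
    resource_group="my-rg",
    client_id="11111111-1111-1111-1111-111111111111",
)
print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere. Passing the list to `subprocess.run` would be the next step on a machine where the Azure CLI is installed and logged in.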

Resource Group’s Access Control IAM

4. Secret Scope in Azure Databricks

Go to the URL: https://<databricks-instance>#secrets/createScope

Here, <databricks-instance> is, for example, adb-<16-digit-id>.<2-digit-id>.azuredatabricks.net/?o=<16-digit-id>
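Since the page is reached only by editing the address bar, a tiny helper can make the pattern explicit. This is a sketch with a hypothetical function name, `create_scope_url`, and a made-up workspace host. It simply appends the `#secrets/createScope` fragment to the workspace address.

```python
def create_scope_url(databricks_instance: str) -> str:
    """Return the 'Create Secret Scope' page URL for a workspace.

    Accepts the instance with or without a leading 'https://'.
    """
    instance = databricks_instance.strip()
    if instance.startswith("https://"):
        instance = instance[len("https://"):]
    # The fragment is case-sensitive: createScope, with a capital S.
    return f"https://{instance}#secrets/createScope"


# Made-up workspace host, for illustration only.
print(create_scope_url("adb-1234567890123456.7.azuredatabricks.net/?o=1234567890123456"))
```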

Provide

  • Scope Name: a name, unique within the workspace, that identifies the secret scope. Scope names are readable by all users. A workspace is limited to a maximum of 100 scopes.
  • Manage Principal: defines who can manage the secret scope, either [All Users] (all users in the workspace) or [Creator] (only the creator of this secret scope, the default). The [Creator] option works only in the Premium tier.
  • Azure Key Vault: the DNS Name and Resource ID of the Key Vault that backs the scope. Use the values from the Key Vault's Properties blade in step 2: the Vault URI (DNS Name) and the Resource ID.

Then create the secret scope in Azure Databricks.
[Note down the secret scope name; we will use it in our sample scripts later.]
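Once the scope exists, a notebook can read the three secrets and hand them to Spark. The sketch below assumes the secret names used in step 2 and the OAuth settings of the ABFS driver. The function names `adls_oauth_conf` and `configure_session` are illustrative, and `configure_session` is meant to be called only inside a Databricks notebook, where `spark` and `dbutils` already exist.

```python
def adls_oauth_conf(storage_account: str, client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Build the Spark settings for service principal (OAuth) access to ADLS Gen 2."""
    host = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def configure_session(spark, dbutils, scope: str, storage_account: str) -> None:
    """Apply the settings in a notebook; secrets are read from the Key Vault-backed scope."""
    conf = adls_oauth_conf(
        storage_account,
        client_id=dbutils.secrets.get(scope=scope, key="ApplicationClientID"),
        client_secret=dbutils.secrets.get(scope=scope, key="ApplicationClientSecretValue"),
        tenant_id=dbutils.secrets.get(scope=scope, key="DirectoryTenantID"),
    )
    for key, value in conf.items():
        spark.conf.set(key, value)


# In a notebook (storage account name is hypothetical):
# configure_session(spark, dbutils, scope="ADLSGenTwoAccessKV", storage_account="mystorageaccount")
```

Keeping the dictionary builder separate from the notebook-only part makes the settings easy to inspect or test outside Databricks, and the secret values never appear in the notebook source.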

Important terms

  • AppRegistrationName : SAADLSGen2Access (any name you like)
  • ApplicationClientID : xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  • DirectoryTenantID : xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  • ApplicationClientSecretValue : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  • Vault URI aka DNS Name : https://<key-vault-name>.vault.azure.net/
  • Resource ID : /subscriptions/<subscription-id>/resourceGroups/<rg-name>/providers/Microsoft.KeyVault/vaults/<key-vault-name>
  • Secret Scope Name : ADLSGenTwoAccessKV (any name you like)

Unit Test the access

Unit Test the access(ADLS Gen 2-Azure Databricks) via Service Principal
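As a rough idea of what such a check involves, the sketch below builds an `abfss://` URI for a container and lists it with `dbutils.fs.ls`. The container and storage account names are hypothetical, the helper names are illustrative, and `smoke_test` only works inside a Databricks notebook whose session is already configured for the service principal.

```python
def abfss_uri(container: str, storage_account: str, path: str = "") -> str:
    """Return the abfss:// URI for a path inside an ADLS Gen 2 container."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"


def smoke_test(dbutils, container: str, storage_account: str) -> list:
    """List the container root; an authorization error here points to a role or secret problem."""
    return [info.path for info in dbutils.fs.ls(abfss_uri(container, storage_account))]


# Hypothetical names; runs anywhere because it only formats a string.
print(abfss_uri("raw", "mystorageaccount", "landing/2024/"))

# In a notebook:
# smoke_test(dbutils, "raw", "mystorageaccount")
```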


Azure Resource Lock

We can also apply an Azure resource lock to prevent accidental deletion or modification of these resources.

Summary

That completes the main purpose of this post: the service principal is created, and we can now access our ADLS Gen 2 blobs and files from Azure Databricks, for example by mounting the storage. Upcoming posts will cover how to mount and unmount securely and how to unit test the access.
