Azure Databricks is a fully managed Platform-as-a-Service (PaaS) offering, released on Feb 27, 2019. It leverages the Microsoft cloud to scale rapidly, host massive amounts of data effortlessly, and streamline workflows for better collaboration between business executives, data scientists, and engineers.
Azure Databricks is a “first party” Microsoft service, the result of a unique year-long collaboration between the Microsoft and Databricks teams to provide Databricks’ Apache Spark-based analytics service as an integral part of the Microsoft Azure platform.
Azure Databricks uses the Azure Active Directory (AAD) security framework. Existing credentials, along with their corresponding security settings, can be reused, and access and identity control are managed through the same environment. Using AAD allows easy integration with the entire Azure stack, including Data Lake Storage (as a data source or an output), Data Warehouse, Blob Storage, and Azure Event Hubs.
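For example, data in Azure Data Lake Storage Gen2 can be read from a Databricks notebook using an AAD service principal. The snippet below is only a minimal sketch: the application (client) ID, client secret, tenant ID, storage account, container, and file path are placeholders you would replace with values from your own AAD app registration and storage account.

// Minimal sketch: access Azure Data Lake Storage Gen2 with an AAD service principal (OAuth).
// All <...> values are placeholders from your own AAD app registration and storage account.
val storageAccount = "<storage-account>"

spark.conf.set(s"fs.azure.account.auth.type.${storageAccount}.dfs.core.windows.net", "OAuth")
spark.conf.set(s"fs.azure.account.oauth.provider.type.${storageAccount}.dfs.core.windows.net",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(s"fs.azure.account.oauth2.client.id.${storageAccount}.dfs.core.windows.net", "<application-id>")
spark.conf.set(s"fs.azure.account.oauth2.client.secret.${storageAccount}.dfs.core.windows.net", "<client-secret>")
spark.conf.set(s"fs.azure.account.oauth2.client.endpoint.${storageAccount}.dfs.core.windows.net",
  "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

// Read a CSV file from the lake using the abfss:// scheme.
val lakeDf = spark.read.option("header", "true")
  .csv(s"abfss://<container>@${storageAccount}.dfs.core.windows.net/<path-to-file>.csv")
display(lakeDf)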
You can use Blob storage to expose data publicly to the world or to store application data privately. For those of you familiar with Azure, Databricks is a premier alternative to Azure HDInsight and Azure Data Lake Analytics.
Connecting Azure Databricks to the Azure Storage Account
- Create a Storage Account, create a private container in it, and upload a blob file to the container.
- You can download the sample blob file from the given link: https://csg10032000aeaa88a0.blob.core.windows.net/datafile/employe_data.csv
- Open the blob's context menu, click Generate SAS, copy the Blob SAS token, and store it somewhere safe; we will use it later.
- Create an Azure Databricks workspace from the Azure portal.
- Now click on Create, select your subscription (if you have more than one), select or create the resource group, choose the location where you want to create the Databricks workspace, and finally select the pricing tier.
- Leave the remaining settings unchanged, click on Review + Create, and wait for the validation.
- Click on Create once the validation completes.
- Click on the Go to resource button once your deployment completes.
- Click on Launch Workspace; it will redirect you to the Azure Databricks page.
- Now click on Clusters in the left pane, click on Create Cluster, provide the cluster name, set Cluster Mode to Standard, select the configuration details as shown below, and create the cluster.
- Now start your cluster and make sure it is in the Running state.
- Now click on Workspace in the left pane, right-click on the workspace, and select Create -> Notebook.
- Give the notebook a name, select Scala as the Default Language, select the cluster that you created earlier, and click on Create.
- Now paste the below code into the notebook to connect to your storage account.
// Connection details: replace the placeholders with your own values.
val containerName = "<Container Name>"
val storageAccountName = "<Storage Account Name>"
val sas = "<Generated SAS Token>"
val config = "fs.azure.sas." + containerName + "." + storageAccountName + ".blob.core.windows.net"

// Mount the container to DBFS using the SAS token.
dbutils.fs.mount(
  source = "wasbs://" + containerName + "@" + storageAccountName + ".blob.core.windows.net/",
  mountPoint = "/mnt/myfile",
  extraConfigs = Map(config -> sas))

// Read the uploaded CSV file from the mount point and display it.
val mydf = spark.read.option("header", "true").option("inferSchema", "true").csv("/mnt/myfile/employe_data.csv")
display(mydf)
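Mounting is optional. If you only need to read the file once, a minimal alternative (assuming the same containerName, storageAccountName, sas, and config values defined above) is to set the SAS configuration on the Spark session and read the blob directly over wasbs:// without creating a mount point:

// Alternative sketch (no mount): pass the SAS token through the Spark session
// configuration and read the blob directly with a wasbs:// path.
spark.conf.set(config, sas)
val directDf = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("wasbs://" + containerName + "@" + storageAccountName + ".blob.core.windows.net/employe_data.csv")
display(directDf)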
- If you can fetch the data as shown below, then you have successfully connected your Azure Databricks workspace to your Storage Account.
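When you are done with the data, the mount can be verified and cleaned up with dbutils. The sketch below assumes the /mnt/myfile mount point used above:

// Verify the mount by listing the files visible under the mount point.
display(dbutils.fs.ls("/mnt/myfile"))

// Unmount once the data is no longer needed.
dbutils.fs.unmount("/mnt/myfile")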
Conclusion:
So far, we have covered creating an Azure Databricks workspace, creating a cluster and a notebook, and connecting our storage account with Databricks to access the data using Scala. Engineers who collaborate with business stakeholders to identify and meet data requirements, while designing and implementing the management, monitoring, security, and privacy of data using the full stack of Azure services, will benefit extensively from understanding Databricks.
Join our online forum discussions and study groups to enrich your knowledge in pursuit of becoming a Data Science expert with DP-200 Exam: Implementing an Azure Data Solution. Here is a comprehensive study guide to help you crack the exam along with sample questions.
WRITTEN BY Shaik Munwar Basha