Revolutionize Data Sharing: Delta Sharing by Databricks for Optimal Data Utilization

Introduction

Delta Sharing, introduced by Databricks, is an open protocol for secure data sharing across clouds, and Databricks describes it as the first of its kind. It works regardless of the computing platforms that providers and recipients use.

Delta Sharing helps organizations set up cross-organization data access and securely exchange large datasets in real time across cloud products.

Delta Sharing also grants read-only access to large datasets by leveraging modern cloud storage systems such as Amazon S3, Google Cloud Storage (GCS), and Azure Data Lake Storage (ADLS).

Terminologies in Delta Sharing

To understand how Databricks Delta Sharing works, let’s start with some terminology that is useful when working with it:

1. Share:

  • A share is a set of tables or table partitions shared with one or more recipients with read-only permission.
  • A share must be registered as a securable object in the Unity Catalog metastore.
  • A share must contain tables from a single Unity Catalog metastore. You can add or remove tables from a share at any time, and you can likewise grant or revoke recipients’ access to it at any time.
  • If you delete a share from Unity Catalog, all of its recipients lose access to the shared data.
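
As a rough sketch of the share lifecycle described above, the helpers below only build Databricks SQL statements as strings; in a Databricks notebook each one could be passed to spark.sql. The share name sales_share and the table name main.sales.orders are hypothetical placeholders, and the name check is an illustrative safeguard, not part of Databricks.

```python
import re

# Accepts one-, two-, or three-part names such as "sales_share" or "main.sales.orders".
_NAME = re.compile(r"[A-Za-z_]\w*(\.[A-Za-z_]\w*){0,2}")


def _checked(name: str) -> str:
    """Reject names that are not plain identifiers, to avoid malformed SQL."""
    if not _NAME.fullmatch(name):
        raise ValueError(f"unexpected object name: {name!r}")
    return name


def create_share(share: str) -> str:
    # Registers the share as an object in the Unity Catalog metastore.
    return f"CREATE SHARE IF NOT EXISTS {_checked(share)}"


def add_table(share: str, table: str) -> str:
    return f"ALTER SHARE {_checked(share)} ADD TABLE {_checked(table)}"


def remove_table(share: str, table: str) -> str:
    return f"ALTER SHARE {_checked(share)} REMOVE TABLE {_checked(table)}"


def drop_share(share: str) -> str:
    # Dropping the share cuts off every recipient that was granted access to it.
    return f"DROP SHARE IF EXISTS {_checked(share)}"


# Hypothetical names, for illustration only.
statements = [
    create_share("sales_share"),
    add_table("sales_share", "main.sales.orders"),
    remove_table("sales_share", "main.sales.orders"),
]
for statement in statements:
    print(statement)  # in a Databricks notebook: spark.sql(statement)
```

Because tables can be added or removed at any time, the ALTER SHARE statements can be re-run as the set of shared tables changes, without recreating the share.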

2. Recipient:

  • A recipient is an organization that holds valid credentials or a secure sharing identifier, which it uses to access one or more data shares.
  • To access a share, a recipient must be registered as an object in the Unity Catalog metastore.
  • If a data provider wants to share data from multiple Unity Catalog metastores, the recipient must be registered in each of them.
  • A recipient loses access to a share when the recipient object is deleted from the Unity Catalog metastore.
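
The recipient rules above can be sketched in the same way. The helpers below build the SQL a provider might run to register a recipient and to grant or revoke its access to shares; the names partner_co, sales_share, and inventory_share are hypothetical.

```python
from typing import List


def create_recipient(recipient: str) -> str:
    # Registers the recipient as an object in the current Unity Catalog metastore.
    return f"CREATE RECIPIENT IF NOT EXISTS {recipient}"


def grant_share(share: str, recipient: str) -> str:
    # Shares are read-only, so SELECT is the privilege that is granted.
    return f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}"


def revoke_share(share: str, recipient: str) -> str:
    return f"REVOKE SELECT ON SHARE {share} FROM RECIPIENT {recipient}"


def onboard(recipient: str, shares: List[str]) -> List[str]:
    """Statements to register a recipient and give it access to each share."""
    return [create_recipient(recipient)] + [grant_share(s, recipient) for s in shares]


# Hypothetical names, for illustration only.
plan = onboard("partner_co", ["sales_share", "inventory_share"])
for statement in plan:
    print(statement)  # in a Databricks notebook: spark.sql(statement)
```

If the provider shares from several metastores, the same statements would need to be run once per metastore, since the recipient has to be registered in each of them.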

3. Open Delta Sharing

  • Open Delta Sharing lets a data provider share data whether or not the recipient has access to Databricks.
  • The data provider is responsible for generating a token and sharing it securely with the recipient, who then uses it to authenticate and access the tables included in the data share.
  • Recipients can access the shared data using many computing tools, such as Azure Databricks, Apache Spark, pandas, and Power BI.
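
On the recipient side, the token usually arrives inside a small JSON profile file. The sketch below shows the shape of such a file and how a table could then be read with the open-source delta-sharing Python connector (pip install delta-sharing). The endpoint, token, and the share, schema, and table names are placeholders; the connector is imported lazily so the helper functions also run where it is not installed.

```python
import json
import tempfile
from pathlib import Path


def write_profile(path: str, endpoint: str, bearer_token: str) -> dict:
    """Write a profile file; normally the provider supplies this file."""
    profile = {
        "shareCredentialsVersion": 1,
        "endpoint": endpoint,
        "bearerToken": bearer_token,
    }
    Path(path).write_text(json.dumps(profile, indent=2))
    return profile


def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    # The connector addresses a table as <profile-file>#<share>.<schema>.<table>.
    return f"{profile_path}#{share}.{schema}.{table}"


def list_tables(profile_path: str):
    import delta_sharing  # lazy import: requires the delta-sharing package
    return delta_sharing.SharingClient(profile_path).list_all_tables()


def load_shared_table(url: str, limit=None):
    import delta_sharing  # lazy import: requires the delta-sharing package
    return delta_sharing.load_as_pandas(url, limit=limit)


# Placeholder values, for illustration only.
profile_path = str(Path(tempfile.mkdtemp()) / "config.share")
write_profile(profile_path, "https://sharing.example.com/delta-sharing/", "<token-from-provider>")
url = table_url(profile_path, "sales_share", "sales", "orders")
print(url)
# df = load_shared_table(url, limit=10)  # needs a real endpoint and a valid token
```

The profile file contains the bearer token, so it should be stored and transmitted as carefully as any other credential.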

4. Databricks-to-Databricks Delta Sharing

  • Databricks-to-Databricks Delta Sharing enables a data provider to share data with existing Databricks users. The recipient’s Databricks access permissions may differ from the provider’s; sharing works as long as the recipient has access to a Databricks workspace that is enabled for Unity Catalog.
  • Databricks-to-Databricks Delta Sharing works regardless of which cloud the Databricks account is hosted in, so a provider can share data with a recipient on AWS, Azure, or GCP.
  • It also allows providers to securely share data across multiple Unity Catalog metastores in the same Databricks account.
  • It removes a burden from both sides: recipients do not need a token to access the share, and data providers do not have to manage recipient tokens.
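
Since no token is involved, the provider instead registers the recipient by its sharing identifier, and the recipient mounts the share as a catalog. The sketch below builds both SQL statements. It assumes the sharing identifier has the form <cloud>:<region>:<uuid>; the identifier shown and the names partner_co, partner_sales, provider_org, and sales_share are placeholders.

```python
import re

# Assumed identifier shape: <cloud>:<region>:<uuid>, e.g. "aws:us-west-2:<36-char uuid>".
_SHARING_ID = re.compile(r"[a-z]+:[a-z0-9-]+:[0-9a-f-]{36}")


def create_d2d_recipient(recipient: str, sharing_id: str) -> str:
    """Provider side: register a recipient by sharing identifier, not by token."""
    if not _SHARING_ID.fullmatch(sharing_id):
        raise ValueError(f"unexpected sharing identifier: {sharing_id!r}")
    return f"CREATE RECIPIENT IF NOT EXISTS {recipient} USING ID '{sharing_id}'"


def mount_share(catalog: str, provider: str, share: str) -> str:
    """Recipient side: expose the provider's share as a read-only catalog."""
    return f"CREATE CATALOG IF NOT EXISTS {catalog} USING SHARE {provider}.{share}"


# Placeholder values, for illustration only.
placeholder_id = "aws:us-west-2:00000000-0000-0000-0000-000000000000"
provider_sql = create_d2d_recipient("partner_co", placeholder_id)
recipient_sql = mount_share("partner_sales", "provider_org", "sales_share")
print(provider_sql)
print(recipient_sql)
```

Once the catalog exists, the recipient queries the shared tables like any other Unity Catalog tables, subject to the permissions granted in its own workspace.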

How does Delta Sharing Work?

  1. The data provider decides what data to share and runs a sharing server in front of it. The server implements the Delta Sharing protocol and manages access for recipients. The recipient’s client authenticates to the sharing server and asks to query a table; it can also provide filters to read only a subset of the data.
  2. The server verifies that the client is allowed to access the data, logs the request, and determines which data to send back.
  3. To grant temporary access, the server generates short-lived pre-signed URLs that let the client read the underlying Parquet files directly from the cloud storage provider. Reads can therefore run in parallel at high bandwidth without streaming data through the sharing server.
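
The three steps above can be illustrated with a toy, in-memory model. This is not the real Delta Sharing server: the class, the token, the file paths, and the storage.example.com host are invented for illustration, and an HMAC signature stands in for the pre-signed URLs that a cloud object store would issue.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass, field
from urllib.parse import parse_qs, urlparse


@dataclass
class ToySharingServer:
    signing_key: bytes
    grants: dict              # token -> set of table names the token may read
    files: dict               # table name -> list of (partition value, file path)
    url_ttl_seconds: int = 900
    audit_log: list = field(default_factory=list)

    def _sign(self, path: str, expires: int) -> str:
        message = f"{path}:{expires}".encode()
        return hmac.new(self.signing_key, message, hashlib.sha256).hexdigest()

    def query(self, token, table, partition=None, now=None):
        """Check access, log the request, and return short-lived file URLs."""
        now = int(time.time()) if now is None else now
        allowed = table in self.grants.get(token, set())
        self.audit_log.append((now, table, allowed))
        if not allowed:
            raise PermissionError(f"no access to {table}")
        expires = now + self.url_ttl_seconds
        return [
            f"https://storage.example.com/{path}?expires={expires}&sig={self._sign(path, expires)}"
            for part, path in self.files[table]
            if partition is None or part == partition  # optional filter from the client
        ]

    def url_is_valid(self, url: str, now: int) -> bool:
        """What the storage layer would check before serving a file."""
        parsed = urlparse(url)
        params = parse_qs(parsed.query)
        expires = int(params["expires"][0])
        expected = self._sign(parsed.path.lstrip("/"), expires)
        return now < expires and hmac.compare_digest(params["sig"][0], expected)


# Invented data, for illustration only.
server = ToySharingServer(
    signing_key=b"demo-key",
    grants={"token-abc": {"sales.orders"}},
    files={"sales.orders": [("eu", "orders/region=eu/part-0.parquet"),
                            ("us", "orders/region=us/part-0.parquet")]},
)
urls = server.query("token-abc", "sales.orders", partition="eu", now=1000)
print(urls[0])
```

The client would fetch each returned URL directly from storage, which is why the sharing server never has to stream the data itself and why an expired or tampered URL stops working without any further server-side action.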

Key Benefits of Delta Sharing

  1. Open source: Delta Sharing is an open-source project built on the open Delta Lake and Apache Parquet formats. Data providers and recipients need not be on the same cloud, because Delta Sharing works across clouds and from cloud to on-premises setups.
  2. No need for data replication: Most enterprise data now lives in cloud data lakes. Existing data in the provider’s data lake can be shared without replicating or moving it.
  3. Centralized governance: With Databricks Delta Sharing, data providers can grant, track, audit, and revoke access to shared datasets from a single enforcement point to meet compliance and other regulatory requirements.
  4. Flexibility: Delta Sharing offers the flexibility to share non-tabular data and data derivatives such as data streams, ML models, SQL views, and arbitrary files. Data providers can build, package, and distribute data products, including datasets, ML models, and notebooks, which helps data recipients reach insights faster.
  5. Lower cost: Data providers can share data from their existing cloud object store without replicating it, which reduces storage costs, and they do not need to set up separate computing environments to share data. Data recipients can access the data with the tool of their choice without setting up a dedicated consumption ecosystem, which reduces costs further.

Conclusion

Delta Sharing is designed by Databricks to be a simple, scalable, non-proprietary, and cost-effective solution for organizations that want to get more value from their data. It is a strong option for reducing time-to-value, and the Delta Sharing ecosystem continues to grow.

About CloudThat

CloudThat is a leading provider of Cloud Training and Consulting services with a global presence in India, the USA, Asia, Europe, and Africa. Specializing in AWS, Microsoft Azure, GCP, VMware, Databricks, and more, the company serves mid-market and enterprise clients, offering comprehensive expertise in Cloud Migration, Data Platforms, DevOps, IoT, AI/ML, and more.

CloudThat is the first Indian company to win the prestigious Microsoft Partner 2024 Award and is recognized as a top-tier partner with AWS and Microsoft, including the prestigious ‘Think Big’ partner award from AWS and the Microsoft Superstars FY 2023 award in Asia & India. Having trained 850k+ professionals in 600+ cloud certifications and completed 500+ consulting projects globally, CloudThat is an official AWS Advanced Consulting Partner, Microsoft Gold Partner, AWS Training Partner, AWS Migration Partner, AWS Data and Analytics Partner, AWS DevOps Competency Partner, AWS GenAI Competency Partner, Amazon QuickSight Service Delivery Partner, Amazon EKS Service Delivery Partner, AWS Microsoft Workload Partners, Amazon EC2 Service Delivery Partner, Amazon ECS Service Delivery Partner, AWS Glue Service Delivery Partner, Amazon Redshift Service Delivery Partner, AWS Control Tower Service Delivery Partner, AWS WAF Service Delivery Partner, Amazon CloudFront Service Delivery Partner, Amazon OpenSearch Service Delivery Partner, AWS DMS Service Delivery Partner, AWS Systems Manager Service Delivery Partner, Amazon RDS Service Delivery Partner, AWS CloudFormation Service Delivery Partner, AWS Config, Amazon EMR, and many more.

FAQs

1. Are there any prerequisites one should meet before opting for Delta Sharing?

ANS: – The data to be shared must be registered in Unity Catalog to be available for secure sharing. Sharing views is not currently supported.

2. Is there any limitation users must consider while using Delta Sharing?

ANS: – Data must be stored in Delta table format to be shared through Delta Sharing.

3. Are there any limited quotas for Delta Sharing resources?

ANS: – Yes, Delta Sharing resources are subject to quotas. If you expect to exceed these limits, contact your Databricks account representative.

[Table: resource quotas for Delta Sharing]

WRITTEN BY Yaswanth Tippa

Yaswanth Tippa is working as a Research Associate - Data and AIoT at CloudThat. He is a highly passionate and self-motivated individual with experience in data engineering and cloud computing with substantial expertise in building solutions for complex business problems involving large-scale data warehousing and reporting.
