Revolutionize Data Sharing: Delta Sharing by Databricks for Optimal Data Utilization

Introduction

Delta Sharing, introduced by Databricks, is described as the first open protocol for secure data sharing over the cloud, regardless of the computing platforms that providers and recipients use.

Delta Sharing helps organizations set up cross-organization data access and securely exchange large datasets in real time across cloud products.

It also lets providers grant read-only access to large datasets by leveraging modern cloud storage systems such as Amazon S3, Google Cloud Storage (GCS), and Azure Data Lake Storage (ADLS).

Terminologies in Delta Sharing

To understand how Delta Sharing works, let’s start with some terminology you will meet when working with it:

1. Share:

  • A share is a set of tables or table partitions made available to one or more recipients with read-only permission.
  • A share must be registered as a securable object in the Unity Catalog metastore.
  • A share can only contain tables from a single Unity Catalog metastore. You can add or remove tables from a share at any time, and you can likewise grant or revoke recipients’ access to it at any time.
  • If you delete a share from Unity Catalog, all of its recipients lose access to the shared data.

2. Recipient:

  • A recipient is an organization that holds valid credentials or a secure sharing identifier, which it uses to access one or more shares.
  • To access a share, a recipient must be registered as a Unity Catalog metastore object.
  • If a data provider wants to share data from multiple Unity Catalog metastores, the recipient must be registered in each of those metastores.
  • If a recipient is deleted from a Unity Catalog metastore, it loses access to the shares in that metastore.
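As a rough illustration of how shares and recipients fit together, the sketch below assembles the Databricks SQL statements a provider might run to create a share, add a table to it, register a recipient, and grant access. The share, table, and recipient names are hypothetical, and the helper function is not part of any Databricks library; it only builds strings.

```python
def provider_setup_sql(share, table, recipient):
    """Return, in order, the SQL a provider might run to share one table."""
    return [
        f"CREATE SHARE IF NOT EXISTS {share}",
        f"ALTER SHARE {share} ADD TABLE {table}",
        f"CREATE RECIPIENT IF NOT EXISTS {recipient}",
        f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}",
    ]


# Hypothetical names, used only for illustration.
statements = provider_setup_sql("sales_share", "main.sales.orders", "partner_co")
for statement in statements:
    print(statement + ";")
```

In a Databricks notebook, each statement could be passed to `spark.sql(...)`, assuming the user has the required Unity Catalog privileges.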

3. Open Delta Sharing

  • Open Delta Sharing lets a data provider share data whether or not the recipient has access to Databricks.
  • The data provider generates a token and shares it securely with the recipient. The recipient then uses that token to authenticate and access the tables included in the share.
  • Recipients can access the shared data using many computing tools such as Azure Databricks, Apache Spark, Pandas, and Power BI.
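On the recipient side, the token usually arrives inside a small JSON profile file supplied by the provider. The sketch below shows the shape of that file and how a table is addressed with the open-source `delta-sharing` Python connector. The endpoint, token placeholder, file name, and share/schema/table names are all assumptions for illustration; in practice the recipient downloads the profile file rather than building it.

```python
import json


def build_profile(endpoint, bearer_token):
    """Illustrate the shape of the profile file a recipient receives."""
    return {
        "shareCredentialsVersion": 1,
        "endpoint": endpoint,
        "bearerToken": bearer_token,
    }


def table_url(profile_path, share, schema, table):
    """Address one shared table as <profile-file>#<share>.<schema>.<table>."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path, share, schema, table):
    """Read a shared table into a pandas DataFrame (needs network access)."""
    import delta_sharing  # pip install delta-sharing

    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table)
    )


# Hypothetical endpoint and names; the token is deliberately a placeholder.
profile = build_profile("https://sharing.example.com/delta-sharing/", "<token>")
print(json.dumps(profile, indent=2))

url = table_url("config.share", "sales_share", "sales", "orders")
print(url)
```

`load_shared_table` is not called here because it needs a real sharing server. With a valid profile file it should return the shared table as a pandas DataFrame.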

4. Databricks-to-Databricks Delta Sharing

  • Databricks-to-Databricks Delta Sharing enables a data provider to share data with existing Databricks users. The recipient’s Databricks setup and access permissions may differ from the provider’s; sharing works as long as the recipient has access to a Databricks workspace.
  • It works regardless of which cloud hosts each Databricks account: a provider can share data with a recipient whether they are on AWS, Azure, or GCP.
  • It also allows providers to securely share data across multiple Unity Catalog metastores in the same Databricks account.
  • It removes token handling on both sides: recipients do not need a token to access the share, and data providers do not have to manage recipient tokens.
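To make the token-free flow more concrete, the sketch below builds the SQL each side might run. The provider registers the recipient by its sharing identifier instead of issuing a token, and the recipient mounts the share as a catalog. All names and the identifier placeholder are hypothetical, and the exact privileges required are not covered here.

```python
def d2d_provider_sql(recipient, sharing_identifier, share):
    """SQL the provider might run; no token is generated in this flow."""
    return [
        f"CREATE RECIPIENT IF NOT EXISTS {recipient} USING ID '{sharing_identifier}'",
        f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}",
    ]


def d2d_recipient_sql(catalog, provider, share):
    """SQL the recipient might run to expose the share as a catalog."""
    return [f"CREATE CATALOG IF NOT EXISTS {catalog} USING SHARE {provider}.{share}"]


# Hypothetical names; the sharing identifier is left as a placeholder.
provider_side = d2d_provider_sql("partner_co", "<sharing-identifier>", "sales_share")
recipient_side = d2d_recipient_sql("partner_sales", "acme_provider", "sales_share")
for statement in provider_side + recipient_side:
    print(statement + ";")
```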

How Does Delta Sharing Work?

  1. The data provider decides what data to share and runs a sharing server in front of it. The server implements the Delta Sharing protocol and manages access for recipients.
  2. The recipient’s client authenticates to the sharing server and asks to query a table. The client can also provide filters to read only a subset of the data.
  3. The server verifies whether the client can access the data, logs the request, and determines which data to send back.
  4. To allow temporary access, the server generates short-lived pre-signed URLs that let the client read the table’s Parquet files directly from the cloud provider. Reads can therefore happen in parallel at massive bandwidth without streaming through the sharing server.
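The steps above can be sketched as a toy, in-memory sharing server. This is not the real Delta Sharing server or its REST API: the class, its fields, the storage host, and the URL format are all invented for illustration, and the signature is left as a placeholder rather than computed.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ToySharingServer:
    tokens: dict  # bearer token -> set of table names that token may read
    files: dict   # table name -> list of (parquet path, partition value)
    url_ttl_seconds: int = 900
    audit_log: list = field(default_factory=list)

    def query(self, token, table, partition=None, now=None):
        """Check access, log the request, and return short-lived file URLs."""
        now = time.time() if now is None else now
        allowed = self.tokens.get(token)
        if allowed is None or table not in allowed:
            self.audit_log.append((table, "denied"))
            raise PermissionError(f"no access to table {table!r}")
        self.audit_log.append((table, "granted"))
        expires = int(now) + self.url_ttl_seconds
        # The client reads these files straight from object storage,
        # so the data itself never streams through the sharing server.
        return [
            {
                "url": f"https://storage.example.com/{path}"
                       f"?expires={expires}&sig=<signature>",
                "expires": expires,
            }
            for path, part in self.files[table]
            if partition is None or part == partition
        ]


server = ToySharingServer(
    tokens={"tok-123": {"orders"}},
    files={"orders": [("orders/part-0.parquet", "2024"),
                      ("orders/part-1.parquet", "2025")]},
)
result = server.query("tok-123", "orders", partition="2025", now=1000)
for entry in result:
    print(entry["url"])
```

The `partition` argument stands in for the filters a client can send. A real server would evaluate them against table metadata and sign each URL with the cloud provider's own mechanism.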

Key Benefits of Delta Sharing

  1. Open-source project: Delta Sharing supports the open-source Delta Lake and Apache Parquet formats. Data providers and recipients need not be in the same cloud, as Delta Sharing works across clouds and from cloud to on-premises setups.
  2. No need for data replication: Most enterprise data is in cloud data lakes these days. Any of the existing data in the provider’s data lakes can be easily shared without the need for data replication or data movement.
  3. Centralized governance: With Databricks Delta Sharing, data providers can grant, track, audit, and even revoke access to shared data sets from a single enforcement point to meet compliance and other regulatory requirements.
  4. Flexibility: Delta Sharing offers the flexibility to share non-tabular data and data derivatives such as data streams, ML models, SQL views, and arbitrary files. Data providers can build, package, and distribute data products, including data sets, ML models, and notebooks, which helps data recipients reach insights faster.
  5. Lower cost: Data providers can share data from their existing cloud object store without replicating it, thereby reducing the storage cost. With delta sharing, data providers are not required to set up separate computing environments to share data. Data recipients can access the data with the tool of their choice without setting up specific consumption ecosystems, thereby reducing costs.

Conclusion

Delta Sharing is designed by Databricks to be a simple, scalable, non-proprietary, and cost-effective way for organizations to get more from their data. Because data is shared in place without replication, it can also shorten time-to-value. Moreover, the Delta Sharing ecosystem continues to grow.

FAQs

1. Are there any prerequisites one should meet before opting for Delta Sharing?

ANS: – The data to be shared must be registered in Unity Catalog to be available for secure sharing. Sharing views is not currently supported.

2. Is there any limitation users must consider while using Delta Sharing?

ANS: – Data must be in Delta table format to use Delta Sharing.

3. Are there any limited quotas for Delta Sharing resources?

ANS: – The values below indicate the resource quotas for Delta Sharing. If you expect to exceed these limits, contact a Databricks account representative.

[Table: Delta Sharing resource quotas]

WRITTEN BY Yaswanth Tippa

Yaswanth is a Data Engineer with over 4 years of experience in building scalable data pipelines, managing Azure and Databricks platforms, and leading data governance initiatives. He specializes in designing and optimizing enterprise analytics solutions, drawing on his experience supporting multiple clients across diverse industries. Passionate about knowledge sharing, Yaswanth writes about real-world challenges, architectural best practices, and lessons learned from delivering robust, data-driven products at scale.
