Delta Sharing, developed by Databricks, is described by the company as the first open protocol for secure data sharing over the cloud, regardless of the computing platforms that providers and recipients use.
Delta Sharing also makes it possible to grant read-only access to large data sets by leveraging modern cloud storage systems such as Amazon S3, Google Cloud Storage (GCS), and Azure Data Lake Storage (ADLS).
Terminologies in Delta Sharing
To understand how Delta Sharing works in Databricks, let’s start with some useful terminology:
1. Share
- A share is a set of tables or table partitions that is shared with one or more recipients with read-only permission.
- A share must be registered as a securable object in the Unity Catalog metastore.
- A share must contain tables from a single Unity Catalog metastore, and you can add or remove tables from a share at any time. Similarly, you can grant or revoke data permissions for share recipients at any time.
- If you delete a share from Unity Catalog, all of its recipients lose access to the shared data.
2. Recipient
- A recipient is an organization or user that holds valid credentials or a secure sharing identifier, which it uses to access one or more shares.
- To access a share, a recipient must be registered as a Unity Catalog metastore object.
- If a data provider wants to share data from multiple Unity Catalog metastores, the recipient must be registered in each of those metastores.
- If a recipient is deleted from a Unity Catalog metastore, that recipient loses access to the shares in that metastore.
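The share and recipient rules above can be sketched as a small in-memory model. This is only an illustration: the `Metastore` class and all of its method names are hypothetical and are not part of any Databricks or Unity Catalog API. It simply mirrors the rules listed above (a recipient must be registered before it can be granted a share, tables can be added or removed at any time, and dropping a share or a recipient removes access).

```python
from dataclasses import dataclass, field


@dataclass
class Metastore:
    """Toy stand-in for a Unity Catalog metastore (hypothetical, not a real API)."""
    shares: dict = field(default_factory=dict)    # share name -> set of table names
    recipients: set = field(default_factory=set)  # registered recipient names
    grants: dict = field(default_factory=dict)    # share name -> set of recipient names

    def create_share(self, share):
        self.shares[share] = set()
        self.grants[share] = set()

    def add_table(self, share, table):
        self.shares[share].add(table)

    def remove_table(self, share, table):
        self.shares[share].discard(table)

    def create_recipient(self, recipient):
        self.recipients.add(recipient)

    def grant(self, share, recipient):
        # A recipient must be registered in this metastore before it can be granted a share.
        if recipient not in self.recipients:
            raise ValueError("recipient must be registered in this metastore first")
        self.grants[share].add(recipient)

    def revoke(self, share, recipient):
        self.grants[share].discard(recipient)

    def drop_share(self, share):
        # Deleting a share removes access for every recipient of that share.
        del self.shares[share]
        del self.grants[share]

    def drop_recipient(self, recipient):
        # Deleting a recipient removes it from every share in this metastore.
        self.recipients.discard(recipient)
        for members in self.grants.values():
            members.discard(recipient)

    def readable_tables(self, recipient):
        """Return every table the recipient can currently read (read-only access)."""
        tables = set()
        for share, members in self.grants.items():
            if recipient in members:
                tables |= self.shares[share]
        return tables
```

For example, after creating a share, adding a table, and granting it to a registered recipient, `readable_tables` returns that table; after `drop_share`, it returns an empty set.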
3. Open Delta Sharing
- Open Delta Sharing lets a data provider share data whether or not the recipient has access to Databricks.
- The data provider is responsible for generating a token and sharing it securely with the recipient. The recipient uses that token to authenticate and access the tables included in the share.
- Recipients can access the shared data using many computing tools, such as Azure Databricks, Apache Spark, pandas, and Power BI.
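As a rough sketch of the recipient side of open sharing: in the open-source Delta Sharing protocol, the token is usually delivered in a small JSON profile file, and a table is addressed as `<profile-path>#<share>.<schema>.<table>`. The endpoint, token, and share, schema, and table names below are placeholders, and the helper function names are assumptions made for illustration. The last function assumes the open-source `delta-sharing` Python package is installed; the other helpers only use the standard library and never contact a server.

```python
import json
import urllib.request
from pathlib import Path


def write_profile(path, endpoint, bearer_token):
    """Write a Delta Sharing profile file holding the provider's endpoint and token."""
    profile = {
        "shareCredentialsVersion": 1,
        "endpoint": endpoint,
        "bearerToken": bearer_token,
    }
    path = Path(path)
    path.write_text(json.dumps(profile, indent=2))
    return path


def table_url(profile_path, share, schema, table):
    """Build the '<profile>#<share>.<schema>.<table>' address used by the connectors."""
    return f"{profile_path}#{share}.{schema}.{table}"


def build_list_shares_request(profile_path):
    """Prepare (but do not send) the REST call that lists the shares for this token."""
    profile = json.loads(Path(profile_path).read_text())
    return urllib.request.Request(
        profile["endpoint"].rstrip("/") + "/shares",
        headers={"Authorization": f"Bearer {profile['bearerToken']}"},
        method="GET",
    )


def load_shared_table(url):
    """Load a shared table into pandas; requires `pip install delta-sharing`."""
    import delta_sharing  # imported lazily so the helpers above work without it

    return delta_sharing.load_as_pandas(url)
```

A recipient would typically call `write_profile("config.share", ...)` once (or receive the file from the provider), then pass `table_url("config.share", "my_share", "my_schema", "my_table")` to `load_shared_table`.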
4. Databricks-to-Databricks Delta Sharing
- Databricks-to-Databricks Delta Sharing enables a data provider to share data with recipients who already use Databricks. The recipient’s Databricks access permissions might differ from the provider’s, but sharing works as long as the recipient has access to a Databricks workspace.
- Databricks-to-Databricks Delta Sharing works regardless of which cloud hosts each Databricks account, so a provider can share data with a recipient on AWS, Azure, or GCP.
- Another advantage of this type of sharing is that it allows providers to securely share data across multiple Unity Catalog metastores in the same Databricks account.
- This type of sharing reduces the burden on both sides: recipients do not need a token to access the share, and data providers do not have to manage recipient tokens.
How Does Delta Sharing Work?
- The data provider decides what data to share and runs a sharing server in front of it. The server implements the Delta Sharing protocol and manages access for recipients. The recipient’s client authenticates to the sharing server and asks to query a table. The client can also provide filters to read only a subset of the data.
- The server verifies whether the client is allowed to access the data, logs the request, and determines which data to send back.
- To grant temporary access, the server generates short-lived pre-signed URLs that let the client read the table’s Parquet files directly from the cloud provider. Reads can therefore happen in parallel at massive bandwidth without streaming through the sharing server.
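The request flow above can be sketched as a toy, in-memory sharing server. Everything here is an assumption made for illustration: the tokens, grants, file listing, storage host, and the HMAC-based signing scheme are stand-ins, and real cloud providers use their own pre-signed URL formats. The sketch only shows the sequence of steps: authenticate, authorize, log, filter, and return short-lived URLs.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlparse

SIGNING_KEY = b"demo-storage-signing-key"  # hypothetical secret held by the storage layer
TOKENS = {"token-abc": "acme"}             # bearer token -> recipient
GRANTS = {"acme": {"sales.orders"}}        # recipient -> tables it may read
TABLE_FILES = {                            # table -> Parquet files with partition values
    "sales.orders": [
        {"path": "orders/date=2024-01-01/part-0.parquet", "partition": {"date": "2024-01-01"}},
        {"path": "orders/date=2024-01-02/part-0.parquet", "partition": {"date": "2024-01-02"}},
    ]
}
REQUEST_LOG = []


def _sign(path, expires):
    return hmac.new(SIGNING_KEY, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()


def presign(path, now, ttl_seconds=900):
    """Return a short-lived URL that grants read access to a single file."""
    expires = int(now) + ttl_seconds
    return f"https://storage.example.com/{path}?expires={expires}&sig={_sign(path, expires)}"


def is_url_valid(url, now):
    """Check that a pre-signed URL is unexpired and carries a matching signature."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    expires = int(query["expires"][0])
    expected = _sign(parsed.path.lstrip("/"), expires)
    return now < expires and hmac.compare_digest(expected, query["sig"][0])


def query_table(token, table, partition_filter=None, now=None):
    """Authenticate, authorize, log, filter, and return pre-signed file URLs."""
    now = time.time() if now is None else now
    recipient = TOKENS.get(token)
    if recipient is None:
        raise PermissionError("unknown bearer token")
    if table not in GRANTS.get(recipient, set()):
        raise PermissionError(f"{recipient} cannot read {table}")
    REQUEST_LOG.append((recipient, table, partition_filter))
    files = TABLE_FILES[table]
    if partition_filter:
        # Keep only the files whose partition values match every requested filter.
        files = [f for f in files
                 if all(f["partition"].get(k) == v for k, v in partition_filter.items())]
    return [presign(f["path"], now) for f in files]
```

The client would then download each returned URL directly from storage, possibly in parallel, which is why the sharing server never has to stream the data itself.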
Key Benefits of Delta Sharing
- Open-source project: Delta Sharing supports the open-source Delta Lake and Apache Parquet formats. Data providers and recipients need not be in the same cloud, as Delta Sharing works across multiple clouds and from cloud to on-premises setups.
- No need for data replication: Most enterprise data is in cloud data lakes these days. Any of the existing data in the provider’s data lakes can be easily shared without the need for data replication or data movement.
- Centralized governance: With Databricks Delta Sharing, data providers can grant, track, audit, and even revoke access to shared data sets from a single enforcement point to meet compliance and other regulatory requirements.
- Flexibility: To meet today’s consumer demands, Delta Sharing offers the flexibility to share non-tabular data and data derivatives such as data streams, ML models, SQL views, and arbitrary files. Data providers can build, package, and distribute data products, including data sets, ML models, and notebooks, which helps data recipients get insights from the data faster.
- Lower cost: Data providers can share data from their existing cloud object store without replicating it, thereby reducing the storage cost. With delta sharing, data providers are not required to set up separate computing environments to share data. Data recipients can access the data with the tool of their choice without setting up specific consumption ecosystems, thereby reducing costs.
Delta Sharing is designed by Databricks to be a simple, scalable, non-proprietary, and cost-effective solution for organizations that are serious about getting more from their data. Because it requires neither data replication nor a dedicated consumption environment, it can shorten time-to-value. Moreover, the Delta Sharing ecosystem continues to grow.
CloudThat is an official AWS (Amazon Web Services) Advanced Consulting Partner and Training Partner and a Microsoft Gold Partner, helping people develop knowledge of the cloud and helping their businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.
Drop a query if you have any questions regarding Delta Sharing, and I will get back to you quickly.
1. Are there any prerequisites one should meet before opting for Delta Sharing?
ANS: – The data to be shared must be registered in Unity Catalog to be available for secure sharing. Sharing views is not currently supported.
2. Is there any limitation users must consider while using Delta Sharing?
ANS: – Data must be in Delta table format to be shared with Delta Sharing.
3. Are there any limited quotas for Delta Sharing resources?
ANS: – The values below indicate the resource quotas for delta sharing. If we expect to exceed these resource limits, we must contact a Databricks account representative.
WRITTEN BY Yaswanth Tippa
Yaswanth Tippa is working as a Research Associate - Data and AIoT at CloudThat. He is a highly passionate and self-motivated individual with experience in data engineering and cloud computing with substantial expertise in building solutions for complex business problems involving large-scale data warehousing and reporting.