Azure Data Factory (ADF) is a cloud-based ETL (Extract, Transform, and Load) and data integration service that orchestrates and automates the movement and transformation of data. It allows users to move data between on-premises systems and cloud systems.
ADF can also run SSIS (SQL Server Integration Services) packages, which makes it a natural path for moving existing on-premises SSIS data integration workloads to the cloud.
Why use Data Factory?
- Scheduling and Orchestration
- Continuous Integration and Delivery
Where to use Data Factory?
For example, a top-rated gaming company collects huge amounts of game logs, activity records, and user performance details in the cloud every day. The company wants to analyze these logs to predict future customer preferences and usage behavior, measure its growth, and provide a better customer experience.
ADF is a good fit for this scenario: as a cloud-based ETL and data integration service, it lets you create workflows that orchestrate and automate data movement and data transformation.
How does it work?
Azure Data Factory is a series of interconnected systems that together provide a complete end-to-end data integration platform.
Data Factory Components
Azure Data Factory is composed of the following key components:
Datasets – These represent the data structures within the data stores, which point to the data you want to use in your activities as inputs and outputs.
Activity – It includes data transfer and control flow operations. It can take more than one dataset as input and more than one as output. It mainly supports three different types of activities: data movement, data transformation, and control activities.
Pipeline – ADF can have one or more pipelines. A pipeline is a combination of activities that performs its relevant action.
Linked Services – These are much like connection strings: they define the connection information Data Factory needs to connect to external resources. A linked service can represent either a data store or a compute resource.
Triggers – These are pipeline-scheduling configurations that contain settings such as start/end dates and execution frequency. They are only required if you want pipelines to run automatically on a schedule.
Variables – These are used inside pipelines to store temporary values.
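To make the components above concrete: linked services and datasets are authored as JSON documents. The sketch below shows their typical shape as Python dicts; all names (AzureBlobStorageLS, GameLogsDataset, the container) are hypothetical placeholders, not values from this article.

```python
# Sketch of the JSON shapes ADF uses for a linked service and a dataset.
# All names here are hypothetical; real definitions are authored in
# ADF Studio or through the Data Factory APIs.

linked_service = {
    "name": "AzureBlobStorageLS",
    "properties": {
        "type": "AzureBlobStorage",  # the kind of external resource
        "typeProperties": {
            # connection details (e.g. a connection string or a
            # Key Vault reference) would go here -- omitted in this sketch
        },
    },
}

dataset = {
    "name": "GameLogsDataset",
    "properties": {
        "type": "DelimitedText",  # points at CSV data in the store
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLS",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "gamelogs",
            }
        },
    },
}

# A dataset references its linked service by name, not by value.
print(dataset["properties"]["linkedServiceName"]["referenceName"])
```

Note how the dataset only names the linked service; the connection details live in one place and can be reused by many datasets.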
Pipelines in Azure Data Factory
An ADF instance can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a specific task, letting you manage the activities as a set rather than individually.
Types of Scheduling Pipelines
A pipeline run is started by a trigger. There are two main types:
- Manual Trigger: runs the pipeline on demand.
- Schedule Trigger: runs the pipeline on a wall-clock schedule.
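A schedule trigger is also authored as JSON. Below is a minimal sketch of its shape as a Python dict; the trigger and pipeline names are invented for illustration.

```python
# Sketch of a schedule trigger definition in ADF's JSON shape.
# "DailyTrigger" and "CopyGameLogs" are hypothetical names.
schedule_trigger = {
    "name": "DailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",  # run once per day
                "interval": 1,
                "startTime": "2024-01-01T00:00:00Z",
                "timeZone": "UTC",
            }
        },
        # the pipeline(s) this trigger starts
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopyGameLogs",
                    "type": "PipelineReference",
                }
            }
        ],
    },
}

recurrence = schedule_trigger["properties"]["typeProperties"]["recurrence"]
print(recurrence["frequency"], recurrence["interval"])
```

The recurrence block is where the start/end dates and execution frequency mentioned earlier actually live.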
Activities in Azure Data Factory
An activity defines a specific action to be performed on your data. Activities fall into three groups: data movement activities, data transformation activities, and control activities. The diagram below illustrates the relationship between pipeline, activity, and dataset.
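As a sketch of that relationship, here is a pipeline containing a single data movement (Copy) activity that reads from one dataset and writes to another. The pipeline and dataset names are hypothetical placeholders.

```python
# Sketch of a pipeline with one Copy activity, in ADF's JSON shape.
# All names (CopyGameLogs, GameLogsDataset, SqlSinkDataset) are invented.
pipeline = {
    "name": "CopyGameLogs",
    "properties": {
        "activities": [
            {
                "name": "CopyLogsToSql",
                "type": "Copy",  # a data movement activity
                "inputs": [
                    {"referenceName": "GameLogsDataset",
                     "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "SqlSinkDataset",
                     "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "AzureSqlSink"},
                },
            }
        ]
    },
}

copy_activity = pipeline["properties"]["activities"][0]
print(copy_activity["type"])
```

The activity consumes datasets as inputs and produces datasets as outputs, which is exactly the pipeline/activity/dataset relationship described above.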
Data Factory Pricing
Pricing is calculated based on usage of the following:
- Pipeline execution and orchestration.
- Data flow execution and debugging.
- Data Factory operations such as pipeline creation and monitoring.
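Orchestration charges are metered per block of activity runs, so a rough estimate is simple arithmetic. The rate in this sketch is a placeholder, not a real price; always check the current Azure Data Factory pricing page.

```python
# Back-of-the-envelope orchestration cost: activity runs are billed per
# 1,000 runs. The rate used below is a PLACEHOLDER for illustration only.
def estimate_orchestration_cost(activity_runs: int,
                                rate_per_1000: float) -> float:
    """Cost of the orchestration meter for a given number of activity runs."""
    return (activity_runs / 1000) * rate_per_1000

# e.g. 50,000 monthly activity runs at a placeholder rate of $1 per 1,000
print(estimate_orchestration_cost(50_000, 1.0))  # -> 50.0
```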
Integration Runtime in ADF
ADF uses the Integration Runtime (IR) as its compute infrastructure to provide data integration capabilities across different network environments. An IR can perform four kinds of work:
- Data Flow: executes data flows in a managed compute environment.
- Data Movement: copies data across different stores located in either public or private networks.
- Activity Dispatch: dispatches and monitors activities running on compute services such as Azure HDInsight and SQL Server.
- SSIS Package Execution: executes SQL Server Integration Services (SSIS) packages in a managed compute environment.
ADF supports three types of IR; choose the one that matches your data integration capability and network environment requirements:
- Azure – supports data flow, data movement, and activity dispatch, in both public and private networks.
- Self-hosted – supports data movement and activity dispatch, in both public and private networks.
- Azure-SSIS – supports SSIS package execution only, over both public network and private link.
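The capability list above can be captured as a small lookup table, which makes it easy to check which IR types fit a given requirement:

```python
# Capability matrix for the three IR types, taken from the list above.
IR_CAPABILITIES = {
    "Azure": {"data flow", "data movement", "activity dispatch"},
    "Self-hosted": {"data movement", "activity dispatch"},
    "Azure-SSIS": {"SSIS package execution"},
}

def ir_types_supporting(capability: str) -> list[str]:
    """Return the IR types that offer the given capability."""
    return [ir for ir, caps in IR_CAPABILITIES.items() if capability in caps]

print(ir_types_supporting("data movement"))  # -> ['Azure', 'Self-hosted']
```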
ADF is a powerful Azure service that lets developers perform many kinds of operations on data. Each of its components plays a distinct role, and the cost of the service is based on usage: pipeline runs, data flow execution, and other Data Factory operations.
CloudThat is also an official AWS (Amazon Web Services) Advanced Consulting Partner and Training Partner and a Microsoft Gold Partner, helping people develop knowledge of the cloud and helping their businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.
Drop a query if you have any questions regarding Azure Data Factory (ADF), and I will get back to you quickly.
1. What is the use of the wait activity?
ANS: – The pipeline waits for the specified time before resuming the execution of an activity.
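For reference, a Wait activity is authored in the same JSON shape as other activities; the sketch below shows a pause of 30 seconds (the activity name is a hypothetical placeholder).

```python
# Sketch of a Wait activity in ADF's JSON shape: the pipeline pauses for
# waitTimeInSeconds before continuing to downstream activities.
wait_activity = {
    "name": "WaitThirtySeconds",  # hypothetical name
    "type": "Wait",
    "typeProperties": {"waitTimeInSeconds": 30},
}

print(wait_activity["typeProperties"]["waitTimeInSeconds"])
```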
2. Does Azure Synapse Analytics follow the concept of pipelines?
ANS: – Yes, both Azure Data Factory and Synapse Analytics use pipelines.
WRITTEN BY Sridhar Andavarapu
Sridhar works as a Research Associate at CloudThat. He is highly skilled in both frontend and backend development, with good practical knowledge of Python, Azure services, AWS services, and ReactJS. Sridhar enjoys sharing his knowledge to help others improve their skills.