ADF Interview Questions
1.Why do we need Azure Data Factory?
Azure Data Factory does not store any data itself; it lets you create workflows that orchestrate the movement of data between supported data stores and the processing of that data using compute services. You can monitor and manage your workflows using both programmatic and UI mechanisms. It is also one of the best tools available today for ETL processes, with an easy-to-use interface, which is why Azure Data Factory is needed.
2.What is Azure Data Factory?
Azure Data Factory is a cloud-based integration service offered by Microsoft that lets you create data-driven workflows for orchestrating and automating data movement and data transformation in the cloud. Data Factory also lets you create and run data pipelines that move and transform data, and then run those pipelines on a specified schedule.
3.What is Integration Runtime?
- Integration runtime is nothing but a compute infrastructure used by Azure Data Factory to provide integration capabilities across different network environments.
Types of Integration Runtimes:
- Azure Integration Runtime – It can copy data between cloud data stores and dispatch activities to a variety of compute services such as SQL Server or Azure HDInsight.
- Self-Hosted Integration Runtime – It is software with essentially the same code as the Azure Integration Runtime, but it is installed on on-premises systems or on virtual machines inside virtual networks.
- Azure SSIS Integration Runtime – It helps to execute SSIS packages in a managed environment. So when we lift and shift the SSIS packages to the data factory, we use Azure SSIS Integration Runtime.
4.How much is the limit on the number of integration runtimes?
There is no specific limit on the number of integration runtime instances. However, there is a limit on the number of VM cores the Azure-SSIS Integration Runtime can use per subscription for SSIS package execution.
5.What is the difference between Azure Data Lake and Azure Data Warehouse?
| Azure Data Lake | Data Warehouse |
| --- | --- |
| Data Lake is capable of storing data of any type, size, and shape. | A Data Warehouse acts as a repository for already filtered data from a specific source. |
| It is mainly used by data scientists. | It is more frequently used by business professionals. |
| It is highly accessible, with quicker updates. | Making changes in a Data Warehouse is a fairly rigid and costly task. |
| It defines the schema after the data is stored (schema-on-read). | A Data Warehouse defines the schema before the data is stored (schema-on-write). |
| It uses the ELT (Extract, Load and Transform) process. | It uses the ETL (Extract, Transform and Load) process. |
| It is an ideal platform for doing in-depth analysis. | It is the best platform for operational users. |
6.What is Blob Storage in Azure?
It helps store large amounts of unstructured data such as text, images, or binary data. It can be used to expose data publicly to the world. Blob storage is most commonly used for streaming audio or video, storing data for backup and disaster recovery, storing data for analysis, and so on. You can also build data lakes on blob storage to perform analytics.
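Below is a minimal sketch of storing and reading unstructured data in Blob Storage with the azure-storage-blob Python SDK; the connection string, container, and blob names are placeholders.

```python
from azure.storage.blob import BlobServiceClient

# Placeholder connection string -- substitute your storage account's values.
conn_str = ("DefaultEndpointsProtocol=https;AccountName=<account>;"
            "AccountKey=<key>;EndpointSuffix=core.windows.net")
service = BlobServiceClient.from_connection_string(conn_str)

container = service.get_container_client("backups")
# container.create_container()  # uncomment on first use

# Upload an arbitrary binary payload as a blob.
container.upload_blob(name="logs/app-2024-01-01.log",
                      data=b"unstructured bytes", overwrite=True)

# Read it back.
print(container.download_blob("logs/app-2024-01-01.log").readall())
```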
7.Difference between Data Lake Storage and Blob Storage
| Data Lake Storage | Blob Storage |
| --- | --- |
| It is an optimized storage solution for big data analytics workloads. | Blob Storage is general-purpose storage for a wide variety of scenarios; it can also be used for big data analytics. |
| It follows a hierarchical file system. | It follows an object store with a flat namespace. |
| In Data Lake Storage, data is stored as files inside folders. | In Blob Storage you create a storage account; the storage account has containers that store the data. |
| It can be used to store batch, interactive, streaming analytics, and machine learning data. | It can be used to store text files, binary data, media for streaming, and general-purpose data. |
8.What are the steps to create an ETL process in Azure Data Factory?
- The steps to create an ETL process are straightforward (a minimal SDK sketch follows this list).
- Create a linked service for the source data store, which is a SQL Server database.
- Assuming we have a cars dataset, create a linked service for the destination data store, which is Azure Data Lake.
- Create a dataset for the data to be saved.
- Create a pipeline with a Copy activity.
- Finally, schedule the pipeline by adding a trigger.
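For illustration, here is a minimal programmatic sketch of the same flow using the azure-mgmt-datafactory and azure-identity Python SDKs, following Microsoft's quickstart pattern. For brevity it copies between two blob folders instead of SQL Server and Data Lake; the subscription, resource group, factory, and connection-string values are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureStorageLinkedService, SecureString,
    LinkedServiceReference, DatasetResource, AzureBlobDataset, DatasetReference,
    PipelineResource, CopyActivity, BlobSource, BlobSink,
)

sub_id, rg, df = "<subscription-id>", "my-rg", "my-data-factory"   # placeholders
adf = DataFactoryManagementClient(DefaultAzureCredential(), sub_id)

# 1. Linked service to the storage account (used as source and sink here).
storage_ls = LinkedServiceResource(properties=AzureStorageLinkedService(
    connection_string=SecureString(
        value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>")))
adf.linked_services.create_or_update(rg, df, "StorageLS", storage_ls)

# 2. Datasets describing the input and output locations.
ls_ref = LinkedServiceReference(type="LinkedServiceReference", reference_name="StorageLS")
ds_in = DatasetResource(properties=AzureBlobDataset(
    linked_service_name=ls_ref, folder_path="cars/input", file_name="cars.csv"))
ds_out = DatasetResource(properties=AzureBlobDataset(
    linked_service_name=ls_ref, folder_path="cars/output"))
adf.datasets.create_or_update(rg, df, "CarsIn", ds_in)
adf.datasets.create_or_update(rg, df, "CarsOut", ds_out)

# 3. Pipeline with a single Copy activity.
copy = CopyActivity(
    name="CopyCars",
    inputs=[DatasetReference(type="DatasetReference", reference_name="CarsIn")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="CarsOut")],
    source=BlobSource(), sink=BlobSink())
adf.pipelines.create_or_update(rg, df, "CopyCarsPipeline",
                               PipelineResource(activities=[copy]))

# 4. Kick off a run now (a schedule trigger can be added later -- see question 11).
run = adf.pipelines.create_run(rg, df, "CopyCarsPipeline", parameters={})
print(run.run_id)
```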
9.What is the difference between Azure HDInsight and Azure Data Lake Analytics?
| Azure HDInsight | Azure Data Lake Analytics |
| --- | --- |
| It is a Platform as a Service. | It is a Software as a Service. |
| Processing data requires configuring a cluster with predefined nodes; the data is then processed using languages like Pig or Hive. | You simply pass the queries written for data processing, and Data Lake Analytics creates the compute nodes needed to process the data set. |
| Users can easily configure HDInsight clusters at their convenience and can use Spark, Kafka, etc. without restrictions. | It does not give as much flexibility in configuration and customization, but Azure manages it automatically for its users. |
10.What are the top-level concepts of Azure Data Factory?
There are four basic top-level concepts of Azure Data Factory:
- Pipeline – It acts as a carrier where lots of processes take place.
- Activities – It represents the steps of processes in the pipeline.
- Data Sets – It is a data structure that holds our data.
- Linked Services – These store the connection information that is essential for connecting to resources or other services. For example, if we have a SQL Server as a source or destination, we need a linked service holding its connection string.
11.How can we schedule a pipeline?
The trigger follows a world-clock calendar schedule and can schedule pipelines periodically or in calendar-based recurrent patterns. We can schedule a pipeline in two ways (a trigger sketch follows the list):
- Schedule Trigger
- Tumbling Window Trigger
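A sketch of attaching a schedule trigger programmatically, assuming the azure-mgmt-datafactory SDK and an existing pipeline named CopyCarsPipeline; the names and dates are placeholders and the model names follow Microsoft's Python quickstart.

```python
from datetime import datetime, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    TriggerResource, ScheduleTrigger, ScheduleTriggerRecurrence,
    TriggerPipelineReference, PipelineReference,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, df = "my-rg", "my-data-factory"                     # placeholders

recurrence = ScheduleTriggerRecurrence(
    frequency="Day", interval=1,
    start_time=datetime(2024, 1, 1, tzinfo=timezone.utc), time_zone="UTC")

trigger = TriggerResource(properties=ScheduleTrigger(
    description="Runs the pipeline once a day",
    recurrence=recurrence,
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(type="PipelineReference",
                                             reference_name="CopyCarsPipeline"),
        parameters={})]))

adf.triggers.create_or_update(rg, df, "DailyTrigger", trigger)
adf.triggers.begin_start(rg, df, "DailyTrigger").result()   # .start(...) on older SDK versions
```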
12.Can we pass parameters to a pipeline run?
Yes, definitely; we can very easily pass parameters to a pipeline run. Pipelines are first-class, top-level concepts in Azure Data Factory. We can define parameters at the pipeline level and then pass the arguments when we run the pipeline.
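A small sketch of both halves, assuming the azure-mgmt-datafactory SDK: the parameter is declared on the pipeline, activities reference it through the @pipeline().parameters expression, and the argument is supplied when the run is created. All names are illustrative.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import PipelineResource, ParameterSpecification

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, df = "my-rg", "my-data-factory"                      # placeholders

# Declare a pipeline-level parameter; inside activities it is referenced with
# the expression @pipeline().parameters.sourceFolder
pipeline = PipelineResource(
    parameters={"sourceFolder": ParameterSpecification(type="String")},
    activities=[])                                       # activities omitted in this sketch
adf.pipelines.create_or_update(rg, df, "ParamPipeline", pipeline)

# Pass the argument for this particular run.
run = adf.pipelines.create_run(rg, df, "ParamPipeline",
                               parameters={"sourceFolder": "input/2024-01"})
print(run.run_id)
```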
13.How do I access the data using the other 80 Dataset types in Data Factory?
The mapping data flow feature currently supports Azure SQL Database, Azure Synapse Analytics (SQL Data Warehouse), and delimited text files from Azure Blob Storage or Azure Data Lake Storage natively for source and sink. You can use the Copy activity to stage data from any of the other connectors, and then execute a Data Flow activity to transform the staged data.
14.Explain the two levels of security in ADLS Gen2?
- Role-Based Access Control – It includes built-in Azure roles such as reader, contributor, owner, or custom roles. It is assigned for two reasons: first, to specify who can manage the service itself, and second, to permit users to use the built-in data explorer tools.
- Access Control Lists – Azure Data Lake Storage specifies precisely which data objects a user may read, write, or execute.
15.What has changed from private preview to limited public preview regarding data flows?
A few things have changed, as mentioned below:
- You are no longer required to bring your own Azure Databricks clusters.
- Data Factory will manage cluster creation and tear-down.
- We can still use Data Lake Storage Gen2 and Blob Storage to store those files, with the appropriate linked services for those storage engines.
- Blob and Azure Data Lake Storage Gen2 datasets are separated into delimited text and Apache Parquet datasets.
16.What is the difference between the Dataset and Linked Service in Data Factory?
- Dataset: is a reference to the data store that is described by Linked Service.
- Linked Service: is nothing but a description of the connection string that is used to connect to the data stores.
17.What is the difference between the mapping data flow and wrangling data flow transformation?
- Mapping Data Flow: It is a visually designed data transformation activity that lets users design a graphical data transformation logic without needing an expert developer.
- Wrangling Data Flow: This is a code-free data preparation activity that integrates with Power Query Online.
18.Data Factory supports two types of compute environments to execute the transform activities. Mention them briefly.
Let’s go through the types:
- On-demand compute environment – It is a fully managed environment offered by ADF. In this compute type, a cluster is created to execute the transform activity and removed automatically when the activity is completed.
- Bring your own environment – In this environment, you manage the compute environment yourself and register it in ADF as a linked service; ADF then uses it to execute the transform activities.
19.What is Azure SSIS Integration Runtime?
Azure SSIS Integration Runtime is a fully managed cluster of virtual machines hosted in Azure and dedicated to running SSIS packages in the data factory. We can easily scale up the SSIS nodes by configuring the node size, or scale out by configuring the number of nodes in the virtual machine cluster.
20.What is required to execute an SSIS package in Data Factory?
We need to create an SSIS Integration Runtime and an SSISDB catalog hosted in Azure SQL Database or Azure SQL Managed Instance.
21.An Azure Data Factory Pipeline can be executed using three methods. Mention these methods.
Methods to execute Azure Data Factory Pipeline:
- Debug Mode
- Manual execution using Trigger Now
- Adding a schedule, tumbling window, or event trigger
22.If we need to copy data from an on-premises SQL Server instance using a data factory, which integration runtime should be used?
The self-hosted integration runtime should be installed on the on-premises machine where the SQL Server instance is hosted.
24.What is Azure Table Storage?
Azure Table Storage is a service that helps users store structured data in the cloud and provides a key store with a schemaless design. It is swift and effective for modern-day applications.
25.Can we monitor and manage Azure Data Factory Pipelines?
Yes, we can monitor and manage ADF Pipelines using the following steps:
- Click on Monitor & Manage on the Data Factory tab.
- Click on Resource Explorer.
- Here, you will find pipelines, datasets, and linked services in a tree format.
26.What are the steps involved in the ETL process?
The ETL (Extract, Transform, Load) process follows four main steps:
- Connect and Collect – helps in moving the data from on-premises and cloud source data stores
- Transform – lets users process the collected data by using compute services such as HDInsight Hadoop, Spark, etc.
- Publish – helps in loading the data into Azure Data Warehouse, Azure SQL Database, Azure Cosmos DB, etc.
- Monitor – supports pipeline monitoring via Azure Monitor, API and PowerShell, Log Analytics, and health panels on the Azure Portal.
27.What do we understand by Integration Runtime?
Integration runtime is referred to as a compute infrastructure used by Azure Data Factory. It provides integration capabilities across various network environments.
A quick look at the types of integration runtimes:
- Azure Integration Runtime – Can copy data between cloud data stores and send activity to various computing services such as SQL Server, Azure HDInsight, etc.
- Self Hosted Integration Runtime – It’s basically software with the same code as the Azure Integration Runtime, but it’s installed on your local system or virtual machine over a virtual network.
- Azure SSIS Integration Runtime – It allows you to run SSIS packages in a managed environment. So when we lift and shift SSIS packages to the data factory, we use Azure SSIS Integration Runtime.
28.What is the difference between Azure Data Lake and Azure Data Warehouse?
| Azure Data Lake | Data Warehouse |
| --- | --- |
| Data Lakes are capable of storing data of any form, size, or shape. | A Data Warehouse is a store for data that has previously been filtered from a specific source. |
| Data scientists are the ones who use it the most. | Business professionals are the ones who use it the most. |
| It is easily accessible and receives frequent changes. | Changing the Data Warehouse becomes a very strict and costly task. |
| The schema is defined after the data is stored. | The schema is defined before the data is stored. |
| It employs the ELT (Extract, Load, and Transform) method. | It employs the ETL (Extract, Transform, and Load) method. |
| It is an excellent tool for conducting in-depth analysis. | It is the finest platform for operational users. |
29.Describe the process to create an ETL process in Azure Data Factory?
You can create an ETL process with a few steps.
- Create a service for linked data store i.e. SQL Server Database.
- Let’s consider you have a dataset for vehicles.
- Now for this dataset, you can create a linked service for the destination store i.e. Azure Data Lake.
- Then create a Data Set for Data Saving.
- The next step is to create a pipeline and copy activity. When you are done with creating a pipeline, schedule a pipeline with the use of an added trigger.
30.What are the top-level concepts of Azure Data Factory?
There are four basic top-level Azure Data Factory concepts:
- Pipeline – It acts as a transport service where many processes take place.
- Activities – It represents the stages of processes in the pipeline.
- Datasets – This is the data structure that holds our data.
- Linked Services – These services store information needed when connecting to other resources or services. Let's say we have a SQL Server; we then need a connection string to connect to this external source, and we will mention the source and the destination for it.
31.How can we schedule a pipeline?
We can schedule pipelines using a trigger. It follows a world-clock calendar schedule, so we can schedule pipelines periodically or in calendar-based recurrent patterns. Here are the two ways:
- Schedule Trigger
- Tumbling Window Trigger
32.Is there any way to pass parameters to a pipeline run?
Yes, absolutely; passing parameters to a pipeline run is a very easy task. Pipelines are known as first-class, top-level concepts in Azure Data Factory. We can set parameters at the pipeline level and then pass the arguments when running the pipeline.
33. What is the difference between the mapping data flow and wrangling data flow transformation?
- Mapping Data Flow: This is a visually designed data conversion activity that allows users to design graphical data conversion logic without the need for experienced developers.
- Wrangling Data Flow: This is a code-free data preparation activity built into Power Query Online.
35.Explain the two levels of security in ADLS Gen2?
- Role-Based Access Control – It includes built-in Azure roles such as reader, contributor, owner, or custom roles. It is assigned for two reasons: first, to specify who can manage the service itself, and second, to provide users with the built-in data explorer tools.
- Access Control Lists – Azure Data Lake Storage specifies exactly which data objects users can read, write, or execute.
36.What has changed from private preview to limited public preview regarding data flows?
Some of the things that have changed are mentioned below:
- There is no need to bring your own Azure Databricks clusters now.
- Data Factory will handle cluster creation and deletion.
- We can still use Data Lake Storage Gen2 and Blob Storage to store these files, using the appropriate linked services for those storage engines.
- Blob and Azure Data Lake Storage Gen2 datasets are split into delimited text and Apache Parquet datasets.
37.Data Factory supports two types of compute environments to execute the transform activities. What are those?
Let’s take a look at the types.
- On-Demand Compute Environment – This is a fully managed environment provided by ADF. In this compute type, a cluster is created to perform the transformation activity and is automatically deleted when the activity is complete.
- Bring Your Own Environment – In this option, you manage the compute environment yourself and register it with ADF, which then uses it to run the transform activities.
38.Can an activity output property be consumed in another activity?
Yes. An activity output can be consumed in a subsequent activity with the @activity construct, for example @activity('CopyData').output.
39.What is the way to access data by using the other 90 dataset types in Data Factory?
For source and sink, the mapping data flow feature supports Azure SQL Database, Azure Synapse Analytics, delimited text files from Azure Blob storage or Azure Data Lake Storage Gen2, and Parquet files from Blob storage or Data Lake Storage Gen2.
Use the Copy activity to stage data from any of the other connectors, then use the Data Flow activity to transform the data once it is staged. For example, your pipeline might copy data into Blob storage first, then transform it with a Data Flow activity whose source dataset points at the staged data.
40.Is it possible to calculate a value for a new column from the existing column from mapping in ADF?
In the mapping data flow, you can use the Derived Column transformation to generate a new column based on the logic you want. You can either create a new derived column or update an existing one. Enter the name of the column you are creating in the Column textbox.
The column dropdown can be used to override an existing column in your schema. Click the Enter expression textbox to start creating the derived column’s expression. You have the option of either inputting your expression or using the expression builder to create your logic.
41.What is the way to parameterize column name in dataflow?
We can pass parameters to columns similarly to other properties. For example, in a Derived Column the customer can use $ColumnNameParam = toString(byName($myColumnNameParamInData)). These parameters can be passed from the pipeline execution down to data flows.
42.In what way we can write attributes in cosmos DB in the same order as specified in the sink in ADF data flow?
Because each document in Cosmos DB is stored as a JSON object, which is an unordered set of name/value pairs, the order cannot be guaranteed.
43.What are the Pipelines and Activities in Azure Data Factory?
A data factory consists of one or more pipelines. A pipeline is a logical set of activities that together perform some action. For example, a pipeline can have activities that ingest data into a file share and publish an event once the task is completed. A pipeline allows you to manage the activities as a set instead of individually, so you can deploy and schedule the pipeline rather than each activity on its own.
An activity refers to an action that you need to perform on your data. For example, you can use a Copy activity to transfer data from Azure Blob Storage to Azure Files. Data Factory groups activities into three categories:
- Data movement activities
- Data transformation activities
- Control activities
44.What are Datasets in ADF?
A dataset refers to the data that you are going to use in your pipeline activities as inputs and outputs. A dataset represents the structure of data within linked data stores such as files, folders, documents, etc. For example, an Azure Blob dataset defines the container and folder in blob storage from which a pipeline activity should read data as input to process. For more, visit Datasets.
45.Explain Integration Runtime in the Data Factory?
Azure data factory uses a compute infrastructure known as Integration Runtime (IR), Which provides data integration capabilities over different network environments. These integration capabilities include:
- Data Flow
- Data Movement
- Activity Dispatch
- SSIS package execution
46.How does ADF Pipeline execution work?
A pipeline run can be defined as an instance of a pipeline execution. For example, you may have a data factory pipeline that copies data from Blob storage to a file share and runs on an Event Grid trigger. Each pipeline run has a unique ID known as the pipeline run ID, a GUID that uniquely identifies that particular run. You can run a pipeline either by using a trigger or manually.
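A short sketch of a manual run plus programmatic monitoring with the azure-mgmt-datafactory SDK; the pipeline, resource group, and factory names are placeholders.

```python
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, df = "my-rg", "my-data-factory"                             # placeholders

run = adf.pipelines.create_run(rg, df, "CopyBlobToFileShare")   # manual (on-demand) run
print("pipeline run id:", run.run_id)                           # the GUID described above

# Poll until the run reaches a terminal state.
while True:
    status = adf.pipeline_runs.get(rg, df, run.run_id)
    if status.status not in ("Queued", "InProgress"):
        break
    time.sleep(15)
print("final status:", status.status)
```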
47.What are Triggers in a Data Factory?
Triggers represent an action or unit of processing which determines when a pipeline execution can be kicked off. Triggers are another way to execute a data factory pipeline. Triggers have many-to-many relationships with pipelines meaning – a single trigger can kick off multiple pipelines, or multiple triggers can kick off a single pipeline execution. Azure Data factory supports three types of triggers as below.
- Schedule trigger
- Tumbling window trigger
- Event-based trigger
48.What is Azure Data Factory used for?
Azure Data Factory is the data orchestration service provided by the Microsoft Azure cloud. ADF is mainly used for the following use cases:
- Data migration from one data source to another
- On-premises to cloud data migration
- ETL purposes
- Automating data flows
There is a huge amount of data out there, and when you want to move data from one location to another in an automated way, within the cloud or from on-premises to the Azure cloud, Azure Data Factory is the best service available.
49.What is a pipeline in ADF?
A pipeline is a set of activities specified to run in a defined sequence. To achieve any task in Azure Data Factory, we create a pipeline that contains the various types of activities required to fulfil the business purpose. Every pipeline must have a valid name and an optional list of parameters.
50.What is a data source in Azure Data Factory?
It is the source or destination system that contains the data to be used or operated upon. Data could be of any type: text, binary, JSON, or CSV files; audio, video, or image files; or a proper database. Examples of data sources are Azure Blob Storage, Azure Data Lake Storage, and databases such as Azure SQL Database, MySQL, PostgreSQL, etc.
Azure Data Factory provides more than 80 different data source connectors to move data in and out of these data sources.
51.What is the integration runtime in Azure Data Factory?
It is the powerhouse of an Azure data pipeline. The integration runtime, also known as IR, provides the compute resources for data transfer activities and for dispatching activities in Azure Data Factory; it is the heart of the service.
In Azure Data Factory, a pipeline is made up of activities. An activity represents some action that needs to be performed. This action could be a data transfer that requires compute for execution, or a dispatch action. The integration runtime provides the environment where these activities execute.
52.What is the Azure Integration Runtime?
As the name suggests, the Azure integration runtime is the runtime managed by Azure itself. Azure IR represents infrastructure that is installed, configured, managed, and maintained by Azure. Because the infrastructure is managed by Azure, it cannot be used to connect to your on-premises data sources. When you create a data factory and create any linked services, you get one IR by default, called the AutoResolveIntegrationRuntime.
When you create a data factory, you specify a region along with it. This region determines where the metadata of the data factory is saved, irrespective of which data source, in which region, you are accessing.
For example, if you have created the ADF account in US East and you have a data source in the US West region, data transfer is still completely possible.
53.What are the different types of integration runtime?
There are three types of integration runtime available in Azure Data Factory, and we can choose the one best fitted to our specific scenario. The three types are:
- Azure IR
- Self-hosted
- Azure-SSIS
54.What is the use of the Lookup activity in Azure Data Factory?
The Lookup activity in an ADF pipeline is generally used for configuration lookups. It has a source dataset and is used to pull data from that dataset and make it available as the output of the activity. The output of the Lookup activity is generally used further in the pipeline for making decisions or driving configuration.
You can think of the Lookup activity as just fetching data; how you use that data depends entirely on your pipeline logic. You can fetch only the first row, or fetch all rows, based on your query or dataset.
An example of the Lookup activity: suppose we want to run a pipeline for incremental data load, with a Copy activity that pulls data from the source system based on the last fetched date, which we store in a HighWaterMark.txt file. The Lookup activity reads the HighWaterMark.txt data, and the Copy activity then fetches data based on that date (a sketch follows).
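A sketch of that watermark pattern with azure-mgmt-datafactory models. It assumes a delimited-text dataset named HighWaterMarkFile over HighWaterMark.txt (with a header column WatermarkValue) and illustrative SQL and sink dataset names; the Copy activity consumes the Lookup output through an ADF expression.

```python
from azure.mgmt.datafactory.models import (
    PipelineResource, LookupActivity, CopyActivity, ActivityDependency,
    DatasetReference, DelimitedTextSource, AzureSqlSource, BlobSink,
)

def ds(name):                                     # small helper for dataset references
    return DatasetReference(type="DatasetReference", reference_name=name)

# 1. Lookup reads the single row stored in HighWaterMark.txt.
lookup = LookupActivity(
    name="LookupOldWatermark",
    dataset=ds("HighWaterMarkFile"),              # delimited-text dataset over HighWaterMark.txt
    source=DelimitedTextSource(),
    first_row_only=True)

# 2. Copy filters the source query with the looked-up value via an ADF expression.
incremental_query = ("SELECT * FROM dbo.Orders WHERE LastModifiedDate > "
                     "'@{activity('LookupOldWatermark').output.firstRow.WatermarkValue}'")
copy = CopyActivity(
    name="CopyChangedRows",
    inputs=[ds("OrdersSqlDataset")],
    outputs=[ds("OrdersSinkDataset")],
    source=AzureSqlSource(sql_reader_query=incremental_query),
    sink=BlobSink(),
    depends_on=[ActivityDependency(activity="LookupOldWatermark",
                                   dependency_conditions=["Succeeded"])])

pipeline = PipelineResource(activities=[lookup, copy])
# adf.pipelines.create_or_update(rg, df, "IncrementalLoad", pipeline)  # client as in earlier sketches
```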
55.What is the main advantage of the AutoResolveIntegrationRuntime?
The advantage of the AutoResolveIntegrationRuntime is that it automatically tries to run the activities in the same region as, or as close as possible to, the region of the sink data source. This can improve performance significantly.
56.What is the Copy activity in Azure Data Factory?
The Copy activity is one of the most popular and most used activities in Azure Data Factory.
It is used for ETL or lift-and-shift purposes, where you want to move data from one data source to another. While copying the data, you can also apply transformations; for example, you may read data from a CSV file that contains 10 columns but keep only 5 of them while writing to your target data source, sending only the required columns to the destination.
To create a Copy activity, you need your source and destination ready, where the destination is called the sink. A Copy activity requires:
- Linked service
- Datasets
It is assumed that you already have a linked service and datasets created; if not, refer to the linked service and dataset questions below to create them.
57.What do you mean by variables in Azure Data Factory?
Variables in an ADF pipeline provide the functionality to temporarily hold values. They are used for much the same reasons we use variables in a programming language. They are available inside the pipeline, and their values are set inside the pipeline using the Set Variable and Append Variable activities. There are two types of variables:
- System variables
- User variables
System variables: These are fixed variables provided by the Azure pipeline itself, for example the pipeline name, pipeline ID, trigger name, etc. You mostly need these to get system information that might be required by your use case.
User variables: These are variables you declare manually based on the logic of your pipeline (a short sketch follows).
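A minimal sketch of declaring a user variable and setting it with a Set Variable activity, using azure-mgmt-datafactory models; the names and the expression are illustrative, and the expression also shows a system variable, pipeline().RunId, being used.

```python
from azure.mgmt.datafactory.models import (
    PipelineResource, VariableSpecification, SetVariableActivity,
)

pipeline = PipelineResource(
    # User variable declared at pipeline level.
    variables={"processedFile": VariableSpecification(type="String")},
    activities=[
        SetVariableActivity(
            name="RememberFileName",
            variable_name="processedFile",
            # System variables such as pipeline().RunId or pipeline().Pipeline
            # can be used inside the expression.
            value="@concat('cars_', pipeline().RunId, '.csv')",
        )
    ])
# adf.pipelines.create_or_update(rg, df, "VariableDemo", pipeline)  # client as in earlier sketches
```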
58.What is a linked service in Azure Data Factory?
Linked services in Azure Data Factory are basically the connection mechanism used to connect to external sources. A linked service works like a connection string and holds the user authentication information.
For example, suppose you want to copy data from Azure Blob Storage to Azure SQL Server. In this case you need to build two linked services: one that connects to Blob Storage and another that connects to the Azure SQL database.
Linked services can be created in two ways (a programmatic sketch follows as well):
- Using the Azure Portal
- The ARM template way
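And a third, programmatic option: a sketch of creating the two linked services described above with the azure-mgmt-datafactory SDK; connection strings and names are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureBlobStorageLinkedService,
    AzureSqlDatabaseLinkedService, SecureString,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, df = "my-rg", "my-data-factory"                     # placeholders

# Linked service 1: the Azure Blob Storage source.
blob_ls = LinkedServiceResource(properties=AzureBlobStorageLinkedService(
    connection_string=SecureString(
        value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>")))
adf.linked_services.create_or_update(rg, df, "BlobStorageLS", blob_ls)

# Linked service 2: the Azure SQL Database destination.
sql_ls = LinkedServiceResource(properties=AzureSqlDatabaseLinkedService(
    connection_string=SecureString(
        value="Server=tcp:<server>.database.windows.net;Database=<db>;"
              "User ID=<user>;Password=<password>")))
adf.linked_services.create_or_update(rg, df, "AzureSqlLS", sql_ls)
```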
59.What is a dataset in ADF?
In Azure Data Factory, as we create data pipelines for ETL, lift-and-shift, or analytics purposes, we need to create datasets. A dataset connects to the data source via a linked service and is created based on the type of data and the data source you want to connect to; it represents the shape of the data held by the data source.
For example, if we want to pull a CSV file from Azure Blob Storage in a Copy activity, we create a dataset for it: the linked service makes the connection to Azure Blob Storage, and the dataset describes the CSV data it holds.
60.Can we debug the pipeline?
Yes. Debugging is a key feature for any developer: to test and troubleshoot code, developers generally rely on a debug capability, and Azure Data Factory provides the same for pipelines.
When you go to the pipeline tab, you can see the Debug option at the top of the canvas; clicking it runs the pipeline without publishing it or attaching a trigger.
61.What are the steps involved in the ETL process?
ETL process generally involves four steps:
- Connect & Collect – Helps in moving the data from on-premises and cloud source data stores.
- Transform – Helps in processing the collected data by using compute services such as HDInsight Hadoop, Spark, etc.
- Publish – Helps in loading the data into Azure Data Warehouse, Azure SQL Database, and Azure Cosmos DB, etc.
- Monitor – Helps in supporting the pipeline monitoring via Azure Monitor, API, PowerShell, Log Analytics, and health panels on the Azure portal.
62.What is Cloud Computing?
Cloud computing allows businesses and individuals to consume computing resources such as virtual machines, databases, processing, memory, services, storage, or even a number of calls or events, on a pay-as-you-go basis. It is the culmination of numerous attempts at large-scale computing with seamless access.
Cloud computing is scalable and reliable, as there is practically no limit on the number of users or resources; it can increase processing power and resources on demand.
63.What are the advantages of Cloud Computing?
- Scalability
- Agility
- High Availability
- Low Latency
- Moving from CapEx to OpEx
- Fault Tolerance
64.Can we Monitor and manage Azure Data Factory pipelines?
Yes. Here are the steps to follow:
- Click on Monitor & Manage on the Data Factory tab.
- Click on Resource Explorer.
- You will find pipelines, datasets, and linked services in a tree format.
65.What is Azure Table Storage?
Azure Table Storage is a service used across many projects that helps us store structured data in the cloud and provides a key store with a schemaless design. It is fast and cost-effective for many applications.
Table storage can hold flexible datasets such as user data for a web application, device information, or any other type of metadata your service requires, with no limit on the number of entities in a table (a short sketch follows the list below).
- It helps to store TBs of structured data.
- For storing datasets that don’t require complex joins, foreign keys, or stored procedures.
- Quickly querying data using a clustered index.
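A short sketch with the azure-data-tables Python SDK; the table name, entity values, and connection string are examples.

```python
from azure.data.tables import TableServiceClient

conn_str = ("DefaultEndpointsProtocol=https;AccountName=<account>;"
            "AccountKey=<key>;EndpointSuffix=core.windows.net")
service = TableServiceClient.from_connection_string(conn_str)
table = service.create_table_if_not_exists("DeviceTelemetry")

# Entities are schemaless: only PartitionKey and RowKey are required.
table.create_entity({"PartitionKey": "device-001", "RowKey": "2024-01-01T10:00:00Z",
                     "temperature": 21.5, "status": "ok"})

# A point query on PartitionKey + RowKey (the clustered index) is the fastest lookup.
entity = table.get_entity(partition_key="device-001", row_key="2024-01-01T10:00:00Z")
print(entity["temperature"])
```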
66.What are Azure Storage Types?
- Blobs
- Tables
- Files
- Queues
67.Difference between mapping and wrangling data flows?
| Mapping Data Flow | Wrangling Data Flow |
| --- | --- |
| It provides ways to transform data at scale without coding. | It allows us to do agile data preparation using the Power Query Online mashup editor at scale via Spark execution. |
| The resulting data flow is more formal. | The resulting data flow is less formal. |
| Helps in transforming data with both known and unknown schemas in the sinks and sources. | Helps in model-based analytics scenarios. |
68.Limit on the number of integration runtimes?
There is no hard limit on the number of integration runtime instances in a data factory, but there is a limit on the number of VM cores the Azure-SSIS integration runtime can use per subscription. An Azure subscription can also have one or more Azure Data Factory instances.
69.What is Azure Active Directory and what is its purpose?
Azure Active directory is a comprehensive identity and access management Cloud solution; it combines directory services, advanced identity governance, application access management and a rich standards-based platform for you.
As you know, Windows Azure Active Directory is a multi-tenant Service, that provides an enterprise level identity and access management for the Cloud, built to support global scale, reliability and availability.
Some key points about Windows Azure Active Directory are as follows:
- For Azure Active Directory, you must have a Microsoft account.
- Afterwards, you create a new Windows Azure Active Directory (tenant).
- Subsequently, you add users to the directory, as either a user or a global admin.
- The next step is optionally enabling multi-factor authentication for the user.
- Afterwards, you can optionally add the user as a co-administrator for the subscription.
70.What is Azure Redis Cache and how to implement it?
Azure Redis Cache is a managed version of the popular open source version of Redis Cache which makes it easy for you to add Redis into your applications that are running in Azure. Redis is an in-memory database where data is stored as a key-value pair so the keys can contain data structures like strings, hashes, and lists. You can cache information in Redis and can easily read it out because it is easier to work with memory than it is to go from the disk and talk to a SQL Server.
- Suppose, we have a web server where your web application is running. The back-end has SQL Server implementation where the SQL Server is running on a VM or maybe it is an Azure SQL database.
- A user comes to your application and they go to a page that has tons of products on it.
- Now, that page has to go to the database to retrieve the information and then that gets sent back to the web server and gets delivered to the user. But if you have thousands of users hitting that web page and you are constantly hitting the database server, it gets very inefficient.
- The solution to this is to add Azure Redis Cache and we can cache all of those read operations that are taking place. So, that goes to an in-memory database on the Azure Redis Cache.
- When other users come back and look for the same information on the web app, it gets retrieved right out of Azure Redis Cache very quickly, and hence we take the pressure off the back-end database server.
While deploying Azure Redis Cache, we can deploy it with a single node, we can deploy it in a different pricing tier with a two node implementation and we can also build an entire cluster with multiple nodes.
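A minimal cache-aside sketch with the redis-py client against Azure Cache for Redis; the host name, access key, and the query_sql_for_products helper are placeholders invented for the example.

```python
import json
import redis

cache = redis.StrictRedis(host="<cache-name>.redis.cache.windows.net",
                          port=6380, password="<access-key>", ssl=True)

def query_sql_for_products():
    # Hypothetical expensive database call, stubbed for the sketch.
    return [{"id": 1, "name": "sample product"}]

def get_products():
    cached = cache.get("products:all")
    if cached is not None:                      # cache hit: skip the database entirely
        return json.loads(cached)
    products = query_sql_for_products()         # cache miss: read from SQL once
    cache.set("products:all", json.dumps(products), ex=300)  # keep for 5 minutes
    return products
```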
Learn more about Azure Redis Cache here: Introduction to Azure Redis Cache.
71.How to create and connect to Azure SQL Database?
First, we need to log into the Azure Portal with our Azure credentials. Then we need to create an Azure SQL database in the Azure portal.
Click on “Create a resource” on the left side menu and it will open an “Azure Marketplace”. There, we can see the list of services. Click “Databases” then click on the “SQL Database”.
- Create a SQL database
After clicking the “SQL Database”, it will open another section. There, we need to provide the basic information about our database like Database name, Storage Space, Server name, etc.
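Once the database exists, a typical way to connect from Python is pyodbc; the server, database, and credentials below are placeholders, and the installed ODBC driver version may differ.

```python
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<server-name>.database.windows.net,1433;"
    "Database=<database-name>;"
    "Uid=<user>;Pwd=<password>;"
    "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;")

cursor = conn.cursor()
cursor.execute("SELECT @@VERSION")
print(cursor.fetchone()[0])
conn.close()
```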
72.What is Azure SQL Data Warehouse?
The definition given by the dictionary is "a large store of data accumulated from a wide range of sources within a company and used to guide management decisions". As per the definition, these warehouses collect data from various databases located on remote or distributed systems. A warehouse is built by integrating data from multiple sources so it can be used for analytical reporting, decision-making, etc. SQL Data Warehouse is a cloud-based enterprise offering that uses parallel processing to quickly analyze complex queries over huge volumes of data, and it also serves as a solution for big data scenarios.
73.Can you create a Virtual Machine in Azure?
Yes. Let us explore what a virtual machine is, along with the step-by-step implementation and the ways of connecting VMs to our local system.
Virtual Machines Service in Azure provides a highly flexible “compute on demand” option for running our application workloads. The Azure portal provides a large collection of templates from which we can get started with our desired server and operating system. We can create multiple virtual machines and group them together in Azure cloud services. Cloud services serve as a network and security boundary for virtual machines. By placing virtual machines in cloud service, we can create multiple instances of any tier of our application. For example, we host our web application on three virtual machines having the same server operating system and place those virtual machines in an availability set so that at least one virtual machine will be available at all the times. Virtual machines use the Hyper-V virtual hard disk format (.vhd) for their hard drives. We can simply upload the fixed-size virtual hard disk files from our infrastructure to Azure and also download the hard disk files from Azure to our data center.
74.What is CosmosDB?
Azure Cosmos DB is a globally replicated, multi-model database service that offers rich querying over schema-free data.
The definition of Cosmos DB says 'globally replicated', which means you can replicate your database across different geographical areas. It stores data in JSON format, and there is no need to define the schema in advance, hence it is schema-free. You can execute SQL queries over the stored JSON documents. Cosmos DB was formerly known as DocumentDB, and it supports multiple models and APIs such as SQL, Table API, Graph API (Gremlin), etc.
Azure Cosmos DB is the right solution for web, mobile, gaming applications when predictable throughput, high availability and low latency are key requirements. We will cover throughput, availability, latency in detail in upcoming articles.
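A short sketch with the azure-cosmos Python SDK, creating a container and inserting and querying a schema-free JSON document; the account URI, key, and names are placeholders.

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<primary-key>")
db = client.create_database_if_not_exists(id="gamedb")
container = db.create_container_if_not_exists(id="players",
                                              partition_key=PartitionKey(path="/playerId"))

# No schema is declared up front; the JSON document is stored as-is.
container.create_item({"id": "1", "playerId": "p-42", "score": 1200, "region": "eu"})

# SQL-style query over the stored JSON documents.
for item in container.query_items("SELECT c.playerId, c.score FROM c WHERE c.score > 1000",
                                  enable_cross_partition_query=True):
    print(item)
```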
Here is a complete series on Cosmos DB: Introduction to Azure Cosmos DB
75.What is Blob?
Blob is a service for storing large amounts of unstructured data that can be accessed from anywhere in the world via HTTP or HTTPS. Blob stands for "Binary Large Object". It is designed to store large amounts of unstructured text or binary data such as virtual hard disks, videos, images, or even log files.
The data can be exposed to the public or stored privately. It scales up or down as your needs change. We no longer manage it, we only pay for what we use.
76.Why do we use Blob?
- Store any type of unstructured data that includes images, videos, audio, documents and backups at exabyte scale. It handles trillions of stored objects, with millions of average requests per second, for customers around the world.
- Blob has Strong Consistency. When an object is changed, it is verified everywhere for superior data integrity, ensuring you always have access to the latest version.
- We can have the flexibility to perform the edits in storage, which can improve your application performance and reduce bandwidth consumption.
- We have many different types of blobs for our flexibility. Automatically we configure geo-replication options in a single menu, to easily empower global and local access.
- One infrastructure, accessible worldwide. With regions around the world, it is ideal for streaming and storing media, whether it is live broadcast events or long-term archives of petabytes of movies and television shows. We can also perform secure backup and disaster recovery.
77.What Is Azure Databricks?
Azure Databricks is a fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure. Designed in collaboration with the founders of Apache Spark, Azure Databricks combines the best of Databricks and Azure to help customers accelerate innovation with one-click setup; streamlined workflows and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.
As an Azure service, customers automatically benefit from native integration with other Azure services such as Power BI, SQL Data Warehouse, Cosmos DB as well as from enterprise-grade Azure security, including Active Directory integration, compliance, and enterprise-grade SLAs.
Learn more here: What is Azure Databricks.
78.What is Azure Advisor?
The Azure Advisor service provides information about your entire Azure landscape. It gives you a complete overview of your system needs including possible ways to save money.
- High Availability shows you ways to improve the continuity of your business-critical applications.
- Security detects threats and vulnerabilities that might lead to security breaches.
- Performance shows you ways to speed up your applications.
- Cost gives you ways to reduce your overall Azure spending.
79.What are Azure App Services?
We can develop an application in any language or framework, such as .NET, .NET Core, Java, Ruby, Node.js, PHP, or Python. Applications run and scale as per our needs, with complex architectures, on both Windows and Linux-based environments. App Service adds the power of Azure to our application, such as security, load balancing, scaling, and automated management. We also get capabilities such as continuous deployment from GitHub, Docker Hub, Azure DevOps, and other sources, staging environments, SSL certificates, etc.
Azure App Service enables us to create web, mobile, logic, and API apps very easily. We can run any number of these apps within a single Azure App Service deployment. Our apps are automatically managed by Azure App Service and run in managed VMs isolated from other customers. We can also use the built-in auto-scaling feature of Azure App Service, which automatically increases and decreases the number of VMs based on resource consumption.
80.What Is Azure Kubernetes Service?
Kubernetes is an open source system started by Google to help orchestrate (deploy, scale and manage) containerized applications. Azure Kubernetes Service makes working with Kubernetes easier.
Before we learn how to orchestrate containers, let’s discuss a bit about containers.
You can run your applications as containers. Think about containers as isolated processes that have their own directory, user groups, ports, etc. which run in the host machine. They are like virtual machines, but not the same. Virtual machines have their own operating systems, whereas containers use the operating system of the host machine. Also, the containers run from images. Think of images as software installers. But images bundle up the software code with its dependencies, because of which containers run the same way on different environments since they are environment independent to a much larger extent.
81.I have some private servers on my premises, also I have distributed some of my workload on the public cloud, what is this architecture called?
- Virtual Private Network
- Private Cloud
- Virtual Private Cloud
- Hybrid Cloud
Answer: Hybrid Cloud. Explanation: This type of architecture is a hybrid cloud, because we are using both the public cloud and on-premises servers, i.e. the private cloud. To make this hybrid architecture easy to use, it helps if your private and public clouds are on the same (virtual) network. This is established by including your public cloud servers in a virtual private cloud and connecting that virtual cloud to your on-premises servers using a VPN (Virtual Private Network).
82.What is Microsoft Azure and why is it used?
As discussed above, the companies that provide cloud services are called cloud providers. There are many cloud providers out there, and Microsoft Azure is one of them. It is used for accessing Microsoft's cloud infrastructure and services.
83.Which service in Azure is used to manage resources in Azure?
- Application Insights
- Azure Resource Manager
- Azure Portal
- Log Analytics
Answer: Azure Resource Manager.
Explanation: Azure Resource Manager is used to manage infrastructures that involve a number of Azure services. It can be used to deploy, manage, and delete all the resources together using a simple JSON script.
84. What are Roles and why do we use them?
Roles are, in layman's terms, nothing but servers. These servers are managed, load-balanced, Platform as a Service virtual machines that work together to achieve a common goal.
There are 3 types of roles in Microsoft Azure:
- Web Role
- Worker Role
- VM Role
Let’s discuss each of these roles in detail:
Web Role – A web role is basically used to deploy a website, using languages supported by the IIS platform such as PHP or .NET. It is configured and customized to run web applications.
Worker Role – A worker role is more of a helper to the web role; it is used to execute background processes, unlike the web role, which is used to deploy the website.
VM Role – The VM role is used to schedule tasks and other Windows services. This role can be used to customize the machines on which the web and worker roles are running.
85. Is it possible to create a Virtual Machine using Azure Resource Manager in a Virtual Network that was created using classic deployment?
This is not supported. You cannot use Azure Resource Manager to deploy a virtual machine into a virtual network that was created using classic deployment.
86. Are data disks supported within scale sets?
Yes. A scale set can define an attached data disk configuration that applies to all VMs in the set. Other options for storing data include:
- Azure files (SMB shared drives)
- OS drive
- Temp drive (local, not backed by Azure Storage)
- Azure data service (for example, Azure tables, Azure blobs)
- External data service (for example, remote database)
87. What is an Availability Set?
An availability set is a logical grouping of VMs that allows Azure to understand how your application is built to provide redundancy and availability. It is recommended that two or more VMs are created within an availability set to provide for a highly available application and to meet the 99.95% Azure SLA. When a single VM is used with Azure Premium Storage, the Azure SLA applies for unplanned maintenance events.
88.What are Fault Domains?
A fault domain is a logical group of underlying hardware that shares a common power source and network switch, similar to a rack within an on-premises data center. As you create VMs within an availability set, the Azure platform automatically distributes your VMs across these fault domains. This approach limits the impact of potential physical hardware failures, network outages, or power interruptions.
89.What are Update Domains?
An update domain is a logical group of underlying hardware that can undergo maintenance or can be rebooted at the same time. As you create VMs within an availability set, the Azure platform automatically distributes your VMs across these update domains. This approach ensures that at least one instance of your application always remains running as the Azure platform undergoes periodic maintenance. The order of update domains being rebooted may not proceed sequentially during planned maintenance, but only one update domain is rebooted at a time.
90.What is a cloud environment?
A cloud environment is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or provider interaction. This cloud model promotes availability.
91.Why do we need Cloud?
On-demand – Resources should always be available when you need them, and you have control over turning them on or off to ensure there is no lack of resources or wastage.
Scalable – You should be able to scale (increase or decrease) the resources when necessary. The cloud provider should have enough capacity to meet customers' needs.
Multi-tenant – Generally you will be sharing the same resources (e.g. hardware) with other tenants, but this is transparent to the customer. Cloud providers are responsible for the security side, guaranteeing that one tenant will not be able to access another's data.
Self-service computation and storage resources – Related processes, including billing, resource provisioning, and deployment, should be self-service and automated, involving much less manual processing. If a machine where our service is hosted fails, the cloud provider should be able to fail over our service right away.
Reliability – The cloud provider should be able to give customers a reliable service, committing to the uptime of their service.
Utility-based subscription – You pay the cloud provider on a utility-based subscription, just like paying your electricity bill, without any upfront investment.
92.What are Resource Groups?
A resource group contains the resources needed to successfully deploy a VM in Azure.
It is a container that holds related resources for an Azure solution.
In Azure, you logically group related resources such as storage accounts, virtual networks, and virtual machines (VMs) to deploy, manage, and maintain them as a single entity.
93.What are the categories of services provided in the Cloud?
IAAS – Infrastructure as a Service
IaaS offers you a server in the cloud (a virtual machine) that you have complete control over.
With an Azure VM, you are responsible for managing everything from the operating system up to the application you are running.
PAAS – Platform as a Service
An Azure Cloud Service consists of two components: your application files (source code, DLLs, etc.) and a configuration file.
Together, these two parts spin up a combination of Web Roles and Worker Roles to execute your application.
With Cloud Services, Azure handles all of the tedious operating system details for you, so you can focus on what matters – building a quality application for your users.
SAAS – Software as a Service
Software as a Service applications are built and hosted through 3rd party vendors who typically charge for a certain level of service – $30/month for X projects and Y users.
94.Can you name some scenarios where VMs are preferred?
- Development and test – Azure VMs offer a quick and easy way to create a computer with the specific configuration required to code and test an application.
- Applications in the cloud – Because demand for your application can fluctuate, it can make economic sense to run it on VMs in Azure. You pay for extra VMs when you need them and shut them down when you don't.
- Extended datacenter – Virtual machines in an Azure virtual network can easily be connected to your organization's network.
95.What are the types of Disk used by VMs?
- Operating system disk
- Temporary disk
- Data disk
96.How are Virtual Hard Disks stored in Azure and what type of storage is recommended for VHDs?
The VHDs used in Azure are .vhd files stored as page blobs in a standard or premium storage account in Azure. With unmanaged disks, you create your own storage account and specify that storage account when you create the disk; with managed disks, Azure handles the storage account for you. The recommended disk types are:
- Standard HDD disks
- Standard SSD disks
- Premium SSD disks
- Ultra Premium disks
97.Name some roles & features not supported in Azure VM?
- Dynamic Host Configuration Protocol Server
- Hyper-V (Hyper-V role is supported in Azure Ev3, and Dv3 series VMs only)
- BitLocker Drive Encryption (on the operating system hard disk, may be used on data disks)
- Network Load Balancing
- Wireless LAN Service
98.What Is The Difference Between Block Blob Vs Page Blob?
- Block blobs are blocks, each of which is identified by a block ID.
- You create or modify a block blob by uploading a set of blocks and committing them by their block IDs.
- If you are uploading a block blob that is no more than 64 MB in size, you can also upload it in its entirety with a single Put Blob operation.
- Each block can be a maximum of 4 MB in size. The maximum size for a block blob in version 2009-09-19 is 200 GB, or up to 50,000 blocks (a short upload sketch follows this list).
- Page blobs are a collection of pages. A page is a range of data that is identified by its offset from the start of the blob. To create a page blob, you initialize the page blob by calling Put Blob and specifying its maximum size.
- The maximum size for a page blob is 1 TB. A page written to a page blob may be up to 1 TB in size.
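A sketch of uploading a block blob explicitly by staging blocks and committing the block list with the azure-storage-blob SDK; the connection string, container, and blob names are placeholders.

```python
import uuid
from azure.storage.blob import BlobClient, BlobBlock

blob = BlobClient.from_connection_string(
    "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>",
    container_name="videos", blob_name="movie-part.bin")

block_list = []
for chunk in (b"first chunk of bytes", b"second chunk of bytes"):
    block_id = uuid.uuid4().hex                    # each block is identified by its block ID
    blob.stage_block(block_id=block_id, data=chunk)
    block_list.append(BlobBlock(block_id=block_id))

blob.commit_block_list(block_list)                 # committing the list makes the blob visible
```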
99.What are the top-level concepts of Azure Data Factory?
- Pipeline: It acts as a carrier in which various processes take place; each individual process is an activity.
- Activities: Activities represent the processing steps in a pipeline. A pipeline can have one or more activities, and an activity can be any process, e.g. querying a dataset or moving a dataset from one source to another.
- Datasets: Sources of data. In simple words, it is a data structure that holds our data.
- Linked services: These store information that is very important when it comes to connecting an external source.
100.Explain the two levels of security in ADLS Gen2?
The two levels of security applicable to ADLS Gen2 were also in effect for ADLS Gen1. Even though this is not new, it is worth calling out the two levels of security because it’s a very fundamental piece to getting started with the data lake and it is confusing for many people just getting started.
- Role-Based Access Control (RBAC). RBAC includes built-in Azure roles such as reader, contributor, owner or custom roles. Typically, RBAC is assigned for two reasons. One is to specify who can manage the service itself (i.e., update settings and properties for the storage account). Another reason is to permit the use of built-in data explorer tools, which require reader permissions.
- Access Control Lists (ACLs). Access control lists specify exactly which data objects a user may read, write, or execute (execute is required to browse the directory structure). ACLs are POSIX-compliant, thus familiar to those with a Unix or Linux background.
101.How many databases can we create in a single server?
150 databases (including master database) can be created in a single SQL Azure server.
102.How many servers can we create in a single subscription?
We can create six(6) servers under a single subscription.
103.What are the benefits of the traffic manager in Windows Azure?
Traffic Manager is used to control the distribution of user traffic to deployed cloud services. Its benefits include:
- It makes the application available worldwide through automated traffic control machinery.
- The traffic managing service contributes to high performance by loading the page faster and convenient usage.
- There is no downtime for maintaining or upgrading the existing system; the system keeps running in the background while it is being upgraded.
- The configuration is made easy through the Azure portal.
104.Discuss the different database types in SQL Azure?
This is one of the commonly asked SQL Azure interview questions; it should be answered by stating that there are two major types of databases in SQL Azure:
- Web Edition – It supports a relational database of up to 5 GB. The basic advantage is that it is self-managed, fault-tolerant, and highly available.
- Business Edition – It supports databases of up to 50 GB of T-SQL, is self-managed, fault-tolerant, and highly available, and is suited for custom web applications or ISV applications.
105.How is Azure Resource Manager beneficial over the classic services?
The benefits of Azure Resource Manager that overshadow those of the classic services are:
- Resources no longer need to be managed, deployed, or monitored one at a time; deployment activities can be chained throughout the lifecycle without handling each resource individually.
- Deployments are consistent and repeatable with ARM, which lets the user work from a declarative template that describes the deployment.
- Role-based access control is built into the management platform, giving you control over who can access which resources.
- You can declare dependencies between resources so that they are deployed in the correct order.
- Resources can be tagged and organized logically, making it convenient to follow up on your company's billing.
106.List the monitoring features that are present in SharePoint 2010.
This is one of the SQL Azure interview questions that should be answered by stating that SharePoint 2010 provides diagnostic logging that captures data directly indicating the state of the system, and it also runs timer jobs that monitor the collected information. The features include:
- It collects data on the event log, timer service, and performance counters.
- It captures search usage data.
- It provides metrics for site collections.
107.Compare STS and SPS and state their important features?
SPS is SharePoint Portal Server, which manages documents and has a search engine that is more efficient at penetrating multiple sources of content.
STS stands for SharePoint Team Services. As the name suggests, it is better for document management for a large organization and has a moderate search engine.
108.Explain MOSS?
The answer to this Microsoft Azure interview question for experienced candidates is that MOSS stands for Microsoft Office SharePoint Server, a complete version of the portal platform that lets the user manage, share, and create documents.
109.What do you mean by SAS?
This is one of the common SQL Azure interview questions. SAS is an abbreviation for Statistical Analysis System, a software suite for performing analysis of multiple variables. It is linked to predictive analytics, data management, advanced analytics, and business intelligence. It provides a smooth interface offering graphical and click-based solutions, and it is user-friendly for both technical and non-technical users, with advanced features.
110.State some features of SAS?
Some interlinked features of SAS include:
- It lets the user access and manage data much like a DBMS.
- It provides leading analytics capabilities for carrying out different business services and products.
- It enables easy visualization and interpretation through graphs and breaks complex panels into simple plots.
- It is very efficient at delivering business analysis, which helps in manufacturing products that can be distributed worldwide.
111.Describe the common architecture of SharePoint 2010?
There are three main architectural designs of SharePoint 2010:
- The dedicated enterprise farm, which is uncommon, as it is completely dedicated to a single service and relies on automated management with feasible isolation of data.
- A single farm associated with multiple services, whose main advantage is that each service application can be managed individually, enabling more complex targeting of sites to a particular service application.
- Lastly, a single farm running a single service, which is very common and easy to deploy. The service application is simple to allocate, with full resource utilization and management.
112.Describe the log analytics?
This question can be asked among the SQL Azure interview questions. Log Analytics is an operational management service that provides everything required to run the particular service. It surfaces automation, security, log analytics, and availability on a single dashboard. It produces a Power BI data source so that users can visualize the raw data. It is offered in three pricing tiers: Free, Standard, and Premium. You enjoy the convenience of searching the data from a single dashboard and exporting the results.
113.If the client gets disconnected from the cache service, what are the probable causes?
If the client gets disconnected, the causal factors can be divided into two categories:
Causes on the service (operator) side:
- There might be a failure in transferring the standard cache from one node to another.
- The cache was redeployed while the service was processing and dispatching requests.
- There was a server update or automated VM maintenance.
Causes on the client side:
- The client application was accidentally redeployed.
- The client application was auto-scaled.
- The network layer on the client side changed.
- There was a transient error on the network node.
- The bound operation took too long.
- The upper limit of the bandwidth was reached.
114.Explain how the Text Analytics API functions?
The Text Analytics API does not work by simply classifying individual words as good or bad. It uses advanced natural language processing to evaluate the sentiment of the text as a whole.
115. What is difference between IaaS, PaaS, and SaaS?
IaaS, PaaS, and SaaS are the three major service models of Azure and cloud computing.
- Infrastructure as a Service (IaaS)
- With IaaS, you rent IT infrastructure – servers and virtual machines (VMs), storage, networks, operating systems – from a cloud provider on a pay-as-you-go basis.
- Platform as a Service (PaaS)
- Platform as a service (PaaS) refers to cloud computing services that supply an on-demand environment for developing, testing, delivering and managing software applications.
- Software as a Service (SaaS)
- Software as a service (SaaS) is a method for delivering software applications over the Internet, on demand and typically on a subscription basis. With SaaS, cloud providers host and manage the software application and underlying infrastructure and handle any maintenance, such as software upgrades and security patching.
116.Why do we need Azure Data Factory?
As the world moves to the cloud and big data, data integration and migration remain an integral part of enterprises in all industries. ADF helps solve both of these problems efficiently by focusing on the data and planning, monitoring, and managing the ETL / ELT pipeline in a single view.
The reasons for the growing adoption of Azure Data Factory are:
- Increased value
- Improved results of business processes
- Reduced overhead costs
- Improved decision making
- Increased business process agility
117.What is Azure?
Windows Azure is a cloud platform developed by Microsoft that enables businesses to run completely in the cloud. The platform offers a pay-as-you-go payment model, so you pay only for the cloud services you use. For example, if you have a database and you don't use it, you don't pay.
Azure offers every service in the cloud a company needs to run its business. It provides infrastructure, hardware, operating systems, document services, storage, databases, data services, software, third-party products, and anything you can imagine.
You can also host your own virtual machines, web servers, database servers, and content storage in Azure. Azure is not only Windows but also supports Linux servers.
118.Is Azure data Factory an ETL tool?
Azure Data Factory is a cloud-based ETL and data integration service for creating data movement and transformation workflows. Data Factory allows you to design scheduled workflows (pipelines) without writing any code.
119.What is the use of Azure Data Lake in Azure data Factory (ADF)?
Azure Data Lake Storage Gen2 is a set of features integrated into Azure Blob storage that are focused on big data analytics. It enables interaction with data through both the file system and the object storage models. Azure Data Factory (ADF) is a cloud-based data integration service that is fully managed.
120.What is parameterization in Azure?
It allows us to provide the server name, database name, credentials, and so on dynamically while executing the pipeline, allowing us to reuse rather than building one for each request. Parameterization in Azure Data Factory is important to successful design and reusability, as well as reduced solution maintenance costs.
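As a concrete illustration, the sketch below shows a pipeline parameter being passed down to a parameterized dataset, written as Python dicts that mirror the JSON you would author in ADF. The object names (GenericSqlTable_DS, AzureSql_LS, CopyAnyTable_PL, TableName) are hypothetical, and the exact source/sink type names can vary by connector.

```python
import json

# Hypothetical parameterized dataset: the table name is supplied at run time.
dataset = {
    "name": "GenericSqlTable_DS",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {"referenceName": "AzureSql_LS", "type": "LinkedServiceReference"},
        "parameters": {"TableName": {"type": "String"}},
        "typeProperties": {
            # The dataset reads its own parameter via @dataset()
            "tableName": {"value": "@dataset().TableName", "type": "Expression"}
        },
    },
}

# Hypothetical pipeline: a pipeline parameter flows into the dataset parameter,
# so one pipeline can copy any table instead of building one pipeline per table.
pipeline = {
    "name": "CopyAnyTable_PL",
    "properties": {
        "parameters": {"TableName": {"type": "String", "defaultValue": "dbo.Customers"}},
        "activities": [
            {
                "name": "CopyTable",
                "type": "Copy",
                "inputs": [
                    {
                        "referenceName": "GenericSqlTable_DS",
                        "type": "DatasetReference",
                        "parameters": {"TableName": "@pipeline().parameters.TableName"},
                    }
                ],
                "outputs": [{"referenceName": "SinkBlob_DS", "type": "DatasetReference"}],
                "typeProperties": {"source": {"type": "AzureSqlSource"}, "sink": {"type": "DelimitedTextSink"}},
            }
        ],
    },
}

print(json.dumps(pipeline, indent=2))
```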
121.What is Global parameters in Azure Data Factory?
Global parameters are constants which can be used by a pipeline in any expression throughout a data factory. It comes in handy when we have several pipelines with the same parameter names and values. We can override these parameters in each environment when promoting a data factory using the continuous integration and deployment process (CI/CD).
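A minimal sketch of how a global parameter might be referenced from a pipeline expression; the parameter name EnvironmentName, the activity name, and the URL are hypothetical.

```python
# Hypothetical Web activity reading a global parameter in its URL expression.
web_activity = {
    "name": "CallEnvironmentApi",
    "type": "WebActivity",
    "typeProperties": {
        "method": "GET",
        # Global parameters are read with pipeline().globalParameters.<name>
        "url": {
            "value": "@concat('https://api.contoso.com/', pipeline().globalParameters.EnvironmentName)",
            "type": "Expression",
        },
    },
}
```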
122.What is the use of dataset in Azure data Factory?
A dataset is a named view of data that clearly points to or references the data we could use as inputs and outputs in activities. Datasets are used to identify data in various data storage, such as tables, files, folders, and documents.
123.What is Copy activity in Azure Data Factory?
The Copy activity in Azure Data Factory can be used to copy data between on-premises and cloud data stores (supported data stores and formats) and to use the copied data in additional transformation or analysis tasks.
124.What are the types of Integration Runtimes in Azure Data Factory?
Integration Runtimes in Azure Data Factory are classified into three types
1.The Azure Integration Runtime is utilised when copying data between data stores that are publicly accessible via the internet.
2.Self-Hosted Integration Runtime for copying data from or to an on-premises data store or from a network with access control.
3.SSIS packages in the Data Factory are run using the Azure SSIS Integration Runtime.
125.How to copy data from an Azure Blob Storage (text file) to an Azure SQL Database table?
Set up the Data Factory pipeline which will be used to copy data from the blob storage to the Azure SQL Database.
- Navigate to the Azure Portal and select the Author & Monitor option.
- Click on the Create Pipeline / Copy Data option.
- Enter a unique name for the copy activity and select whether to schedule or execute it once for the Copy Data option -> Click Next
- Select the type of source data store to link to in order to construct a Linked Service (Azure Blob Storage) and click Continue
- A new Linked Service window will open -> enter a unique name and other details -> test the connection and then click Create.
- If you need to copy many files recursively, specify the input data file or folder (DataSet) and click Next.
- Select Azure SQL Database from the list of New Linked Services, then click Continue to configure the new Linked service.
- Fill out all of the information in the New Linked Service window -> Test Connection -> Create
- Create the sink dataset with the destination database table specified.
- Copy Data Tool -> Settings -> Next -> Review all the copy configuration from the Summary Window -> Next -> Finish
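The same Copy activity that the Copy Data tool generates can also be expressed directly as pipeline JSON. The sketch below shows its rough shape as a Python dict; the dataset names and the source/sink type names are illustrative and may differ from what the wizard actually produces for your file format.

```python
# Rough shape of the pipeline the Copy Data tool produces (names are illustrative).
copy_blob_to_sql_pipeline = {
    "name": "CopyBlobToSql_PL",
    "properties": {
        "activities": [
            {
                "name": "CopyTextFileToSqlTable",
                "type": "Copy",
                "inputs": [{"referenceName": "SourceBlobText_DS", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SinkSqlTable_DS", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "AzureSqlSink"},
                    # Optional: map source columns to destination columns explicitly.
                    "translator": {"type": "TabularTranslator", "mappings": []},
                },
            }
        ]
    },
}
```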
126. Briefly explain different components of Azure Data Factory:
Pipeline: It represents activities logical container.
Dataset: It is a pointer to the data utilized in the pipeline activities
Mapping Data Flow: Represents a data transformation UI logic
Activity: In the Data Factory pipeline, Activity is the execution step that can be utilized for data consumption and transformation.
Trigger: Mentions the time of pipeline execution.
- Linked Service: It represents a descriptive connection string for the data sources used in the pipeline activities
Control flow: Regulates the execution flow of the pipeline activities
127.What is the need for Azure Data Factory?
While going through an Azure tutorial, you will come across this terminology. Since data comes from different sources, it can be in any form. These varied sources transfer or channel the data in different ways, and it can arrive in different formats. Whenever we bring this data to the cloud or to a particular storage location, it is essential to ensure that it is managed efficiently, so you have to transform the data and remove unnecessary parts.
128.Is there any limit on how many integration runtimes can be performed?
No, there is no limit on the number of integration runtime occurrences you can have in an Azure data factory. However, there is a limit on the number of VM cores that the integration runtime can utilize for every subscription for SSIS package implementation. When you pursue Microsoft Azure Certification, you should be aware of these terms.
129.Explain Data Factory Integration Runtime?
Integration Runtime is a secure compute infrastructure used by Data Factory to provide data integration capabilities across different network environments. Moreover, it ensures that these activities are executed in the region closest possible to the data store. If you want to learn Azure step by step, you must be aware of this and other such fundamental Azure terminologies.
130.Mention about three types of triggers that Azure Data Factory supports?
The Schedule trigger is useful for the execution of the ADF pipeline on a wall-clock timetable.
The Tumbling window trigger is useful for the execution of the ADF pipeline over a cyclic interval. It holds on to the pipeline state.
The Event-based trigger responds to an event that is related to blob. Examples of such events include adding or deleting a blob from your Azure storage account.
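For example, a Schedule trigger definition has roughly the following shape, expressed here as a Python dict mirroring the trigger JSON; the trigger name, pipeline name, and start time are hypothetical.

```python
# Hypothetical Schedule trigger: runs DailyLoad_PL once a day at 02:00 UTC.
schedule_trigger = {
    "name": "Daily0200_TRG",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2024-01-01T02:00:00Z",
                "timeZone": "UTC",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {"referenceName": "DailyLoad_PL", "type": "PipelineReference"},
                "parameters": {},
            }
        ],
    },
}
```

Tumbling window and event-based triggers follow the same outer structure, with the type and typeProperties changed accordingly.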
131. What it means by blob storage in Azure?
Blob storage in Azure is one of the key aspects to learn if you want to get Azure fundamentals certification. Azure Blob Storage is a service very useful for the storage of massive amounts of unstructured object data like binary data or text. Moreover, you can use Blob Storage to render data to the world or for saving application data confidentially. Typical usages of Blob Storage include:
- Directly serving images or documents to a browser
- Storage of files for distributed access
- Streaming audio and video
- Storing data for backup and restore, disaster recovery, and archiving
- Storing data for investigation by an on-premises or any Azure-hosted service
132.Mention the steps for creating an ETL process in Azure Data Factory?
Suppose we want to retrieve some data from an Azure SQL Server database; if anything needs to be processed, it will be processed and then saved in the Data Lake Store. Here are the steps for creating the ETL process:
- Firstly, create a Linked Service for source data store i.e. SQL Server Database
- Suppose that we are using a cars dataset
- Now create a Linked Service for a destination data store that is Azure Data Lake Store
- After that, create a dataset for Data Saving
- Setup the pipeline and add copy activity
- Finally, schedule the pipeline by inserting a trigger
133.Mention about three types of triggers that Azure Data Factory supports?
The Schedule trigger is useful for the execution of the ADF pipeline on a wall-clock timetable.
The Tumbling window trigger is useful for the execution of the ADF pipeline over a cyclic interval. It holds on to the pipeline state.
The Event-based trigger responds to an event that is related to blob. Examples of such events include adding or deleting a blob from your Azure storage account.
134.How to create Azure Functions?
Azure Functions are solutions for implementing small lines of functions or code in the cloud. With these functions, we can choose preferred programming languages. You need to pay only for the time the code runs which means that you need to pay per usage. It supports a wide range of programming languages including F#, C#, Node.js, Java, Python, or PHP. Also, it supports continuous deployment as well as integration. It is possible to develop serverless applications through Azure Functions applications. When you enroll for Azure Training In Hyderabad, you can thoroughly know how to create Azure Functions.
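As a small illustration, this is a minimal HTTP-triggered Azure Function written in Python (v1 programming model, which also needs a function.json binding file in the same folder); the folder name and greeting logic are just examples.

```python
# HttpTriggerExample/__init__.py -- minimal HTTP-triggered function (v1 model).
# A function.json file describing the HTTP trigger binding is assumed to sit
# next to this file in the same folder.
import logging

import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    """Return a simple greeting; you are billed only for the time this code runs."""
    logging.info("Python HTTP trigger function processed a request.")

    name = req.params.get("name", "Azure")
    return func.HttpResponse(f"Hello, {name}!", status_code=200)
```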
135.What are the steps to access data through the use of the other 80 dataset types in Data Factory?
Currently, the Mapping Data Flow functionality natively supports Azure SQL Data Warehouse, Azure SQL Database, delimited text files from Azure Blob storage or Azure Data Lake Storage Gen2, and Parquet files from Blob storage or Data Lake Storage Gen2 for source and sink.
You need to use the Copy activity to point data from any of the supplementary connectors. Subsequently, you need to run a Data Flow activity to efficiently transform data after it is already staged.
136.What do you need for executing an SSIS package in Data Factory?
You have to create an SSIS IR and an SSISDB catalog which is hosted in Azure SQL Managed Instance or Azure SQL Database.
137.What are Datasets in ADF?
The dataset is the data that you would use in your pipeline activities in form of inputs and outputs. Generally, datasets signify the structure of data inside linked data stores like documents, files, folders, etc. For instance, an Azure blob dataset describes the folder and container in blob storage from which a specific pipeline activity must read data as input for processing.
138.What is the use of the ADF Service?
ADF is primarily used to organize the data copying between various relational and non-relational data sources that are being hosted locally in your datacenters or in the cloud. Moreover, ADF Service can be used for the transformation of the ingested data to fulfill your business requirements. In most Big Data solutions, ADF Service is used as an ETL or ELT tool for data ingestion. When you enroll for Azure Training In Hyderabad, you can thoroughly know the usefulness of ADF Service.
139.How do the Mapping data flow and Wrangling data flow transformation activities differ in Data Factory?
Mapping data flow activity is a data transformation activity that is visually designed. It enables you to effectively design graphical data transformation logic in absence of an expert developer. Moreover, it is operated as an activity inside the ADF pipeline on a fully-managed ADF scaled-out Spark cluster.
On the other hand, wrangling data flow activity denotes a data preparation activity that does not use code. It integrates with Power Query Online for making the Power Query M functions accessible for data wrangling through spark implementation.
140.What is Azure Databricks?
Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform optimized for Azure. It was designed in partnership with the founders of Apache Spark. Moreover, Azure Databricks blends the best of Databricks and Azure to let customers speed up innovation through a quick setup. The smooth workflows and an interactive workspace facilitate collaboration between data engineers, data scientists, and business analysts.
141.What is Azure SQL Data Warehouse?
It is a huge storage of data collected from a broad range of sources in a company and useful to make management decisions. These warehouses enable you to accumulate data from diverse databases existing as either remote or distributed systems.
An Azure SQL Data Warehouse can be created by integrating data from multiple sources which can be utilized for decision making, analytical reporting, etc. In other words, it is a cloud-based enterprise application allowing you to function under parallel processing to rapidly examine a complex query from the massive data volume. Also, it works as a solution for Big-Data concepts.
142.What is Azure Data Lake?
Azure Data Lake streamlines processing tasks and data storage for analysts, developers, and data scientists. It is an advanced mechanism that supports the mentioned tasks across multiple platforms and languages.
It removes the barriers linked with data storage. Also, it makes it simpler to carry out stream, batch, and interactive analytics. Features in Azure Data Lake resolve the challenges linked with productivity and scalability and fulfill growing business requirements.
143.Explain data source in the azure data factory:
The data source is the source or destination system that contains the data intended to be used or operated upon. The data can be binary, text, CSV files, JSON files, image files, video, audio, or a proper database.
Examples of data sources include Azure Data Lake Storage, Azure Blob Storage, or any other database such as MySQL, Azure SQL Database, PostgreSQL, etc.
144.Why is it beneficial to use the Auto Resolve Integration Runtime?
AutoResolveIntegrationRuntime automatically tries to execute the activities in the same region or in close proximity to the region of the particular sink data source. The same can boost performance.
145.How is lookup activity useful in the azure data factory?
In the ADF pipeline, the Lookup activity is commonly used for configuration lookup purposes. It reads from a source dataset, retrieves the data, and sends it as the output of the activity. Generally, the output of the Lookup activity is used further in the pipeline for making decisions or passing a configuration value along.
In simple terms, lookup activity is used for data fetching in ADF pipeline. The way you would use it entirely relies on your pipeline logic. It is possible to obtain only the first row or you can retrieve the complete rows depending on your dataset or query.
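A minimal sketch of that pattern, with hypothetical activity, dataset, and variable names: a Lookup reads one configuration row, and a later activity consumes a field from its output.

```python
# Hypothetical Lookup that returns a single configuration row...
lookup_activity = {
    "name": "LookupConfig",
    "type": "Lookup",
    "typeProperties": {
        "source": {"type": "AzureSqlSource", "sqlReaderQuery": "SELECT TOP 1 TableName FROM dbo.Config"},
        "dataset": {"referenceName": "ConfigTable_DS", "type": "DatasetReference"},
        "firstRowOnly": True,  # set to False to get all rows under output.value
    },
}

# ...and a later activity consuming a field from the Lookup output.
set_variable_activity = {
    "name": "SetTargetTable",
    "type": "SetVariable",
    "dependsOn": [{"activity": "LookupConfig", "dependencyConditions": ["Succeeded"]}],
    "typeProperties": {
        "variableName": "TargetTable",
        "value": {"value": "@activity('LookupConfig').output.firstRow.TableName", "type": "Expression"},
    },
}
```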
146.What are the types of variables in the azure data factory?
Variables in the ADF pipeline provide temporary holding places for values, and their usage is similar to variables in a programming language. Two types of activities are used for assigning and manipulating variable values: Set Variable and Append Variable.
Two types of variables in Azure data factory are:
- System variable: These are the fixed variable from the Azure pipeline. Their examples include pipeline id, pipeline name, trigger name, etc.
- User variable: User variables are manually declared depending on the logic of the pipeline.
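A short sketch of how user variables are declared and manipulated, written as a Python dict mirroring the pipeline JSON; the pipeline, variable, and activity names are hypothetical.

```python
# Hypothetical pipeline fragment: declaring variables and using
# Set Variable / Append Variable activities to manipulate them.
pipeline_fragment = {
    "name": "Variables_PL",
    "properties": {
        "variables": {
            "FileName": {"type": "String", "defaultValue": ""},
            "ProcessedFiles": {"type": "Array", "defaultValue": []},
        },
        "activities": [
            {
                "name": "SetFileName",
                "type": "SetVariable",
                "typeProperties": {
                    "variableName": "FileName",
                    # System variables such as pipeline().RunId can be used in the value.
                    "value": {"value": "@concat('run-', pipeline().RunId, '.csv')", "type": "Expression"},
                },
            },
            {
                "name": "AppendToProcessedFiles",
                "type": "AppendVariable",
                "dependsOn": [{"activity": "SetFileName", "dependencyConditions": ["Succeeded"]}],
                "typeProperties": {
                    "variableName": "ProcessedFiles",
                    "value": {"value": "@variables('FileName')", "type": "Expression"},
                },
            },
        ],
    },
}
```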
147.Explain the linked service in the azure data factory?
In Azure Data Factory, linked service represents the connection system used to connect the external source. It functions as the connection string for holding the user validation information.
Two ways to create the linked service are:
- ARM template way
- Using the Azure Portal
148.What does it mean by the breakpoint in the ADF pipeline?
Breakpoint signifies the debug portion of the pipeline. If you wish to check the pipeline with any specific activity, you can accomplish it through the breakpoints.
To understand better, for example, you are using 3 activities in the pipeline and now you want to debug up to the second activity only. This can be done by placing the breakpoint at the second activity. To add a breakpoint, you can click the circle present at the top of the activity.
149.What is the difference between the Dataset and Linked Service in Data Factory?
Linked Service is a description of the connection string that is used to connect to the data stores. For example, when ingesting data from a SQL Server instance, the linked service contains the name for the SQL Server instance and the credentials used to connect to that instance.
- Dataset is a reference to the data store that is described by the linked service. When ingesting data from a SQL Server instance, the dataset points to the name of the table that contains the target data or the query that returns data from different tables.
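To make the distinction concrete, here is a minimal sketch of both objects for the SQL Server example, written as Python dicts mirroring the JSON definitions. The server, database, table, and integration runtime names are placeholders, and the exact property names (for example schema/table vs. tableName) can vary with the dataset type and ADF version.

```python
# Linked Service: how to connect (instance name + credentials).
sql_linked_service = {
    "name": "OnPremSqlServer_LS",
    "properties": {
        "type": "SqlServer",
        "connectVia": {"referenceName": "SelfHostedIR", "type": "IntegrationRuntimeReference"},
        "typeProperties": {
            "connectionString": "Server=myserver;Database=SalesDb;Trusted_Connection=False;",
            "userName": "etl_user",
            "password": {"type": "SecureString", "value": "<stored securely, e.g. in Key Vault>"},
        },
    },
}

# Dataset: which data the connection should read (a specific table).
sql_table_dataset = {
    "name": "SalesOrders_DS",
    "properties": {
        "type": "SqlServerTable",
        "linkedServiceName": {"referenceName": "OnPremSqlServer_LS", "type": "LinkedServiceReference"},
        "typeProperties": {"schema": "dbo", "table": "SalesOrders"},
    },
}
```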
150.Data Factory consists of a number of components. Mention these components briefly
- Pipeline: The activities logical container
- Activity: An execution step in the Data Factory pipeline that can be used for data ingestion and transformation
- Mapping Data Flow: A data transformation UI logic
- Dataset: A pointer to the data used in the pipeline activities
- Linked Service: A descriptive connection string for the data sources used in the pipeline activities
- Trigger: Specify when the pipeline will be executed
- Control flow: Controls the execution flow of the pipeline activities
151.After installing the Self-Hosted Integration Runtime to the machine where the SQL Server instance is hosted, how could we associate the SH-IR created from the Data Factory portal?
We need to register it using the authentication key provided by the ADF portal.
152.What is the difference between the Mapping data flow and Wrangling data flow transformation activities in Data Factory?
Mapping data flow activity is a visually designed data transformation activity that allows us to design a graphical data transformation logic without the need to be an expert developer, and executed as an activity within the ADF pipeline on an ADF fully managed scaled-out Spark cluster.
Wrangling data flow activity is a code-free data preparation activity that integrates with Power Query Online in order to make the Power Query M functions available for data wrangling using spark execution.
153.Data Factory supports two types of compute environments to execute the transform activities. Mention these two types briefly
On-demand compute environment, using a computing environment fully managed by the ADF. In this compute type, the cluster will be created to execute the transform activity and removed automatically when the activity is completed.
Bring Your Own environment, in which the compute environment is managed by you and registered in ADF as a linked service.
154.What is Azure SSIS Integration Runtime?
A fully managed cluster of virtual machines hosted in Azure and dedicated to run SSIS packages in the Data Factory. The SSIS IR nodes can be scaled up, by configuring the node size, or scaled out by configuring the number of nodes in the VMs cluster.
155.What is required to execute an SSIS package in Data Factory?
We need to create an SSIS IR and an SSISDB catalog hosted in Azure SQL Database or Azure SQL Managed Instance.
156.Which Data Factory activity is used to run an SSIS package in Azure?
Execute SSIS Package activity.
157.Data Factory supports three types of triggers. Mention these types briefly
The Schedule trigger that is used to execute the ADF pipeline on a wall-clock schedule
The Tumbling window trigger that is used to execute the ADF pipeline on a periodic interval, and retains the pipeline state
The Event-based trigger that responds to a blob related event, such as adding or deleting a blob from an Azure storage account
158.Any Data Factory pipeline can be executed using three methods. Mention these methods
Under Debug mode
Manual execution using Trigger now
Using an added scheduled, tumbling window or event trigger
159.From where we can monitor the execution of a pipeline that is executed under the Debug mode?
The Output tab of the pipeline, without the ability to use the Pipeline runs or Trigger runs under ADF Monitor window to monitor it.
160.What are the advantages of Azure Resource Manager?
The advantages of Azure Resource Manager are:
- The resource manager helps us to manage the usage of the application resources. Azure Resource Manager is commonly referred to as ARM.
- The ARM helps deploy, manage, and monitor all the resources for an application, a solution, or a group.
- Users can be granted access to resources that they require within a resource manager.
- It helps retrieve billing details at the resource group level: which group is using more, which is using less, and which group has contributed more to this month's bill. Those details can be obtained using Azure Resource Manager
- Provisioning resources is made much easier with the help of this resource manager.
161.How has integrated hybrid cloud been useful for Azure?
The integration of hybrid cloud has been useful for Azure in the following ways:
- We get the best of both worlds since applications and data can be shared between the public and private clouds.
- Seamless on-premise infrastructure scalability.
- It boosts the productivity of the on-premises application.
- We get greater efficiency by combining Azure services with DevOps processes and tools for applications running on-premises.
- Users can take advantage of constantly updated Azure services and other Azure Marketplace applications for their on-premises environment.
- We are not worried about the deployment locations
162.What kind of storage is best suited to handle unstructured data?
Blob storage is well-suited because it is designed to support unstructured data. It places the data into different access tiers based on how often the data is accessed.
163.Azure Data Factory Execute Pipeline Activity Example
The Execute Pipeline activity can be used to invoke another pipeline. This activity’s functionality is similar to SSIS’s Execute Package Task and you can use it to create complex data flows, by nesting multi-level pipelines inside each other. This activity also allows passing parameter values from parent to child pipeline.
- Add a string parameter PL_TableName to the ExploreSQLSP_PL pipeline
- Assign the activity parameter TableName to the pipeline parameter PL_TableName
- Select activity SP_AC, switch to the Stored Procedure tab, select the value textbox for the TableName parameter and click the 'Add dynamic content' link under that text box, as sketched below.
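Putting those steps together, the parent pipeline's Execute Pipeline activity ends up looking roughly like this. It is a sketch using the names from the example above (ExploreSQLSP_PL, PL_TableName, SP_AC, TableName); the exact JSON generated by the portal may differ slightly.

```python
# Parent pipeline activity: invoke ExploreSQLSP_PL and pass the table name down.
execute_pipeline_activity = {
    "name": "RunExploreSQLSP",
    "type": "ExecutePipeline",
    "typeProperties": {
        "pipeline": {"referenceName": "ExploreSQLSP_PL", "type": "PipelineReference"},
        "parameters": {
            # Value handed from the parent to the child pipeline parameter PL_TableName.
            "PL_TableName": {"value": "@pipeline().parameters.TableName", "type": "Expression"}
        },
        "waitOnCompletion": True,
    },
}

# Inside ExploreSQLSP_PL, the stored procedure activity SP_AC reads that parameter
# through dynamic content on its TableName parameter:
sp_parameter_value = {"value": "@pipeline().parameters.PL_TableName", "type": "Expression"}
```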
164.Azure Data Factory Lookup Activity?
The Lookup activity can read data stored in a database or file system and pass it to subsequent copy or transformation activities. Unlike SSIS’s Lookup transformation, which allows performing a lookup search at the row level, data obtained from ADF’s Lookup activity can only be used on an object level. In other words, you can use ADF’s Lookup activity’s data to determine object names (table, file names, etc.) within the same pipeline dynamically.
- Lookup activity can read from a variety of database and file-based sources; the full list of supported data sources is available in the ADF documentation.
- Lookup activity can work in two modes:
- Singleton mode – Produces first row of the related dataset
- Array mode – Produces the entire dataset
165.Azure Data Factory Triggers?
ADF v2 introduced the concept of triggers as a way to automate pipeline executions. Triggers represent a unit of processing that determines when a pipeline execution needs to be initiated. The same pipeline can be kicked off more than once, and each execution has its own run ID. Pipelines and triggers have a many-to-many relationship: multiple triggers can kick off a single pipeline, or a single trigger can kick off multiple pipelines. The trigger types available in ADF are the Schedule trigger, the Tumbling window trigger, and the Event-based trigger.
166.What are Data Flows in Azure Data Factory?
The Azure Data Factory (ADF) service is an orchestrator of data operations, just like Integration Services (SSIS). A pipeline by itself, however, only orchestrates activities; data flows are the feature that carries out the actual data transformation.
A data flow in ADF allows you to pull data into the ADF runtime, manipulating it on-the-fly and then writing it back to a destination. Data flows in ADF are similar to the concept of data flows in SSIS, but more scalable and flexible. There are two types of data flows:
- Data flow – This is the regular data flow, previously called the mapping data flow.
- Power Query – This data flow uses the Power Query technology, which can also be found in Power BI Desktop, Analysis Services Tabular and the “Get Data” feature in Excel. At the time of writing, it is still in preview. Previously, this data flow was called the “wrangling data flow”.
167.Azure Data Factory Pipeline Scheduling, Error Handling and Monitoring ?
Pipelines are scheduled using triggers (schedule, tumbling window, or event-based). Errors can be handled by configuring activity retry policies and by adding failure dependency paths between activities. Pipeline executions are monitored from the ADF Monitor window (Pipeline runs and Trigger runs) or, for debug runs, from the pipeline's Output tab. A typical example is a pipeline that copies data from Azure Blob Storage to an Azure SQL database as a sample ETL \ ELT process.
168.Azure Data Factory for Web Scraping?
Web scraping is a term used to extract data from a website in an automated way. There are multiple ways to fetch data from a webpage, and you can use scripts such as Python, R, .NET, Java or tools such as Azure Data Factory.
Azure Data Factory is Azure's cloud ETL (Extract-Transform-Load) service. It provides a code-free user interface with scale-out, serverless data integration and data transformation. You can configure data-driven workflows for orchestrating and automating data movement and data transformation.
169.Transfer Data to the Cloud Using Azure Data Factory?
For the purpose of this exercise, I have downloaded and installed the AdventureWorks 2016 database on my local SQL Server. This database contains the DimDate table, with 3652 rows and DateKey as the PK constraint. I have generated a script for this table, which I will use to create a target table in the Azure SQL database DstDb.
170.Data Factory Concepts Mapping with SSIS?
For those who are coming from a conventional SQL Server Integration Services background, the following table provides a high-level mapping between the different components of SSIS and Azure Data Factory.
Data Factory | SSIS |
Pipeline | Package (*.dtsx) or Control Flow |
Activity | SSIS Control Flow Components |
Linked Services | Connections |
Dataset | Connection Mapping |
Workflow | Dataflow |
171.Why Azure Data Factory?
- The cloud needs an orchestration tool
- Extract data from cloud sources
- Transform data using Data Factory workflows
- Load/sink data into a data lake or data warehouse
172.ETL process in Azure Data Factory
The following are the high-level steps to create a simple pipeline in ADF. The example below copies data from a SQL Server database to Azure Data Lake.
Steps for Creating ETL
- Create a Linked Service for source data store which is SQL Server Database
- Create Source Dataset using Source Link Service
- Create a Linked Service for destination data store which is Azure Data Lake Store
- Create the Sink/Target Dataset using the Linked Service created in step 3
- Create the pipeline and add copy activity
- Schedule the pipeline by adding a trigger
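A compact sketch of what steps 1 through 6 produce, written as Python dicts that mirror the underlying JSON. All object names are illustrative, and the source/sink type names may vary depending on the chosen file format and connectors.

```python
# Step 5: pipeline with a Copy activity that moves data from the SQL Server
# source dataset (step 2) to the Data Lake sink dataset (step 4).
sql_to_datalake_pipeline = {
    "name": "SqlToDataLake_PL",
    "properties": {
        "activities": [
            {
                "name": "CopySqlToDataLake",
                "type": "Copy",
                "inputs": [{"referenceName": "SqlServerSource_DS", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "DataLakeSink_DS", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "SqlServerSource"},
                    "sink": {"type": "DelimitedTextSink"},
                },
            }
        ]
    },
}

# Step 6: a schedule trigger wired to the pipeline (see the trigger sketch earlier
# in this document for the full recurrence shape).
daily_trigger_reference = {
    "pipelineReference": {"referenceName": "SqlToDataLake_PL", "type": "PipelineReference"}
}
```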
173.Define default values for the Pipeline Parameters?
You can define default values for the parameters in the pipelines.
174.Can an activity output property be consumed in another activity?
An activity output can be consumed in a subsequent activity with the @activity construct.
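For example, the row count reported by a Copy activity can be read in a later activity like this; the activity and variable names here are hypothetical.

```python
# A later Set Variable activity consuming the output of a previous Copy activity.
capture_row_count = {
    "name": "CaptureRowCount",
    "type": "SetVariable",
    "dependsOn": [{"activity": "CopySalesData", "dependencyConditions": ["Succeeded"]}],
    "typeProperties": {
        "variableName": "RowsCopied",
        # rowsCopied is one of the metrics exposed in the Copy activity output.
        "value": {"value": "@string(activity('CopySalesData').output.rowsCopied)", "type": "Expression"},
    },
}
```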
175.Data Factory consists of a number of components. Mention these components briefly
- Pipeline: The activities logical container
- Activity: An execution step in the Data Factory pipeline that can be used for data ingestion and transformation
- Mapping Data Flow: A data transformation UI logic
- Dataset: A pointer to the data used in the pipeline activities
- Linked Service: A descriptive connection string for the data sources used in the pipeline activities
- Trigger: Specify when the pipeline will be executed
- Control flow: Controls the execution flow of the pipeline activities
176.What is the difference between the Dataset and Linked Service in Data Factory?
Linked Service is a description of the connection string that is used to connect to the data stores. For example, when ingesting data from a SQL Server instance, the linked service contains the name for the SQL Server instance and the credentials used to connect to that instance.
Dataset is a reference to the data store that is described by the linked service. When ingesting data from a SQL Server instance, the dataset points to the name of the table that contains the target data or the query that returns data from different tables.
177.What is Data Factory Integration Runtime?
Integration Runtime is a secure compute infrastructure that is used by Data Factory to provide the data integration capabilities across the different network environments and make sure that these activities will be executed in the closest possible region to the data store.
178.Data Factory supports three types of Integration Runtimes. Mention these supported types with a brief description for each?
Azure Integration Runtime: used for copying data from or to data stores accessed publicly via the internet
Self-Hosted Integration Runtime: used for copying data from or to an on-premises data store or networks with access control
Azure SSIS Integration Runtime: used to run SSIS packages in the Data Factory
179.What is the difference between the Mapping data flow and Wrangling data flow transformation activities in Data Factory?
Mapping data flow activity is a visually designed data transformation activity that allows us to design a graphical data transformation logic without the need to be an expert developer, and executed as an activity within the ADF pipeline on an ADF fully managed scaled-out Spark cluster.
Wrangling data flow activity is a code-free data preparation activity that integrates with Power Query Online in order to make the Power Query M functions available for data wrangling using spark execution.
180.Data Factory supports two types of compute environments to execute the transform activities. Mention these two types briefly?
On-demand compute environment, using a computing environment fully managed by the ADF. In this compute type, the cluster will be created to execute the transform activity and removed automatically when the activity is completed.
Bring Your Own environment, in which the compute environment is managed by you and registered in ADF as a linked service.
181.What is Azure SSIS Integration Runtime?
A fully managed cluster of virtual machines hosted in Azure and dedicated to run SSIS packages in the Data Factory. The SSIS IR nodes can be scaled up, by configuring the node size, or scaled out by configuring the number of nodes in the VMs cluster.
182.What is required to execute an SSIS package in Data Factory?
We need to create an SSIS IR and an SSISDB catalog hosted in Azure SQL Database or Azure SQL Managed Instance
183.Which Data Factory activities can be used to iterate through all files stored in a specific storage account, making sure that the files smaller than 1KB will be deleted from the source storage account?
- For Each activity for iteration
- Get Metadata to get the size of all files in the source storage
- If Condition to check the size of the files
- Delete activity to delete all files smaller than 1KB
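A rough sketch of how those activities fit together, as a Python dict mirroring the pipeline JSON. The dataset names and the FileName dataset parameter are illustrative, while childItems and size are the actual Get Metadata field names.

```python
# List files, then for each file fetch its size and delete it if smaller than 1 KB.
cleanup_small_files_pipeline = {
    "name": "DeleteSmallFiles_PL",
    "properties": {
        "activities": [
            {
                "name": "ListFiles",
                "type": "GetMetadata",
                "typeProperties": {
                    "dataset": {"referenceName": "SourceFolder_DS", "type": "DatasetReference"},
                    "fieldList": ["childItems"],
                },
            },
            {
                "name": "ForEachFile",
                "type": "ForEach",
                "dependsOn": [{"activity": "ListFiles", "dependencyConditions": ["Succeeded"]}],
                "typeProperties": {
                    "items": {"value": "@activity('ListFiles').output.childItems", "type": "Expression"},
                    "activities": [
                        {
                            "name": "GetFileSize",
                            "type": "GetMetadata",
                            "typeProperties": {
                                "dataset": {
                                    "referenceName": "SourceFile_DS",  # dataset parameterized by file name
                                    "type": "DatasetReference",
                                    "parameters": {"FileName": "@item().name"},
                                },
                                "fieldList": ["size"],
                            },
                        },
                        {
                            "name": "IfSmallerThan1KB",
                            "type": "IfCondition",
                            "dependsOn": [{"activity": "GetFileSize", "dependencyConditions": ["Succeeded"]}],
                            "typeProperties": {
                                "expression": {
                                    "value": "@less(activity('GetFileSize').output.size, 1024)",
                                    "type": "Expression",
                                },
                                "ifTrueActivities": [
                                    {
                                        "name": "DeleteFile",
                                        "type": "Delete",
                                        "typeProperties": {
                                            "dataset": {
                                                "referenceName": "SourceFile_DS",
                                                "type": "DatasetReference",
                                                "parameters": {"FileName": "@item().name"},
                                            }
                                        },
                                    }
                                ],
                            },
                        },
                    ],
                },
            },
        ]
    },
}
```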
184.Data Factory supports three types of triggers. Mention these types briefly
The Schedule trigger that is used to execute the ADF pipeline on a wall-clock schedule
The Tumbling window trigger that is used to execute the ADF pipeline on a periodic interval, and retains the pipeline state
The Event-based trigger that responds to a blob related event, such as adding or deleting a blob from an Azure storage account
185.Any Data Factory pipeline can be executed using three methods. Mention these methods
- Under Debug mode
- Manual execution using Trigger now
- Using an added scheduled, tumbling window or event trigger
186.From where we can monitor the execution of a pipeline that is executed under the Debug mode?
The Output tab of the pipeline, without the ability to use the Pipeline runs or Trigger runs under ADF Monitor window to monitor it.
187.Briefly describe the purpose of the ADF Service?
ADF is used mainly to orchestrate the data copying between different relational and non-relational data sources, hosted in the cloud or locally in your datacenters. Also, ADF can be used for transforming the ingested data to meet your business requirements. It is ETL, or ELT tool for data ingestion in most Big Data solutions.
188.Data Factory consists of a number of components. Mention these components briefly?
- Pipeline: The activities logical container
- Activity: An execution step in the Data Factory pipeline that can be used for data ingestion and transformation
- Mapping Data Flow: A data transformation UI logic
- Dataset: A pointer to the data used in the pipeline activities
- Linked Service: A descriptive connection string for the data sources used in the pipeline activities
- Trigger: Specify when the pipeline will be executed
- Control flow: Controls the execution flow of the pipeline activities
189.Briefly describe the purpose of the ADF Service?
ADF is used mainly to orchestrate the data copying between different relational and non-relational data sources, hosted in the cloud or locally in your datacenters. Also, ADF can be used for transforming the ingested data to meet your business requirements. It is ETL, or ELT tool for data ingestion in most Big Data solutions.
190.Is Azure Data Factory Certification worth doing?
Absolutely. There is a massive demand for Azure Data Engineers proficient in Data Factory. Since many companies are adopting Microsoft Azure as their cloud computing platform, they need skilled professionals to handle their operations.
191.Is Azure Data Factory an ETL tool?
Yes, ADF is the best tool available in the market for ETL processes. Without writing any complex algorithms, it simplifies the entire data migration process.
192.Data transformed at scale with code-free pipelines?
The new browser-based tooling experience provides code-free pipeline authoring and deployment with a modern, interactive web-based experience.
For visual data developers and data engineers, the Data Factory web UI is the code-free design environment that you will use to build pipelines. It is fully integrated with Git repositories (Azure Repos and GitHub) and provides integration for CI/CD and iterative development with debugging options.
193.Briefly explain different components of Azure Data Factory?
- Pipeline: It represents activities logical container.
- Dataset: It is a pointer to the data utilized in the pipeline activities
- Mapping Data Flow: Represents a data transformation UI logic
- Activity: In the Data Factory pipeline, Activity is the execution step that can be utilized for data consumption and transformation.
- Trigger: Mentions the time of pipeline execution.
- Linked Service: It represents a descriptive connection string for the data sources used in the pipeline activities
- Control flow: Regulates the execution flow of the pipeline activities
194.What is the need for Azure Data Factory?
While going through an Azure tutorial, you will come across this terminology. Since data comes from different sources, it can be in any form. These varied sources transfer or channel the data in different ways, and it can arrive in different formats. Whenever we bring this data to the cloud or to a particular storage location, it is essential to ensure that it is managed efficiently, so you have to transform the data and remove unnecessary parts.
As far as data transfer is concerned, it is important to ensure that data collected from various sources is brought to a common place, stored, and transformed if needed. A conventional data warehouse can do this too, but it comes with some limitations. Occasionally, we are forced to use custom applications that handle each of these processes separately, which is time-consuming, and integrating all of these processes is troublesome. So it is necessary to find a way to automate the process or design proper workflows. Azure Data Factory helps you coordinate this entire process more conveniently.
195.Is there any limit on how many integration runtimes can be performed?
No, there is no limit on the number of integration runtime occurrences you can have in an Azure data factory. However, there is a limit on the number of VM cores that the integration runtime can utilize for every subscription for SSIS package implementation. When you pursue Microsoft Azure Certification, you should be aware of these terms.
196.Explain Data Factory Integration Runtime?
Integration Runtime is a safe computing infrastructure being used by Data Factory for offering data integration abilities over various network environments. Moreover, it ascertains that such activities will get implemented in the nearest possible area to the data store. If you want to Learn Azure Step by step, you must be aware of this and other such fundamental Azure terminologies.
197.What it means by blob storage in Azure?
Blob storage in Azure is one of the key aspects to learn if you want to get Azure fundamentals certification. Azure Blob Storage is a service very useful for the storage of massive amounts of unstructured object data like binary data or text. Moreover, you can use Blob Storage to render data to the world or for saving application data confidentially. Typical usages of Blob Storage include:
- Directly serving images or documents to a browser
- Storage of files for distributed access
- Streaming audio and video
- Storing data for backup and reinstate disaster recovery, and archiving
- Storing data for investigation by an on-premises or any Azure-hosted service
198.Mention the steps for creating an ETL process in Azure Data Factory?
When attempting to retrieve some data from Azure SQL server database, if anything needs to be processed, it would be processed and saved in the Data Lake Store. Here are the steps for creating ETL:
- Firstly, create a Linked Service for source data store i.e. SQL Server Database
- Suppose that we are using a cars dataset
- Now create a Linked Service for a destination data store that is Azure Data Lake Store
- After that, create a dataset for Data Saving
- Setup the pipeline and add copy activity
- Finally, schedule the pipeline by inserting a trigger
199.How to create Azure Functions?
Azure Functions are solutions for implementing small lines of functions or code in the cloud. With these functions, we can choose preferred programming languages. You need to pay only for the time the code runs which means that you need to pay per usage. It supports a wide range of programming languages including F#, C#, Node.js, Java, Python, or PHP. Also, it supports continuous deployment as well as integration. It is possible to develop serverless applications through Azure Functions applications. When you enroll for Azure Training In Hyderabad, you can thoroughly know how to create Azure Functions.
200.What is Azure SQL Data Warehouse?
It is a huge storage of data collected from a broad range of sources in a company and useful to make management decisions. These warehouses enable you to accumulate data from diverse databases existing as either remote or distributed systems. An Azure SQL Data Warehouse can be created by integrating data from multiple sources which can be utilized for decision making, analytical reporting, etc. In other words, it is a cloud-based enterprise application allowing you to function under parallel processing to rapidly examine a complex query from the massive data volume. Also, it works as a solution for Big-Data concepts.