Do you aspire to become a Data Scientist? If so, you have probably been searching the internet for the best path. Data Science and Artificial Intelligence (AI) combine computer science, coding and statistics. To prove your skills in this field, there are a few certification exams you can take. This blog article is dedicated to one of the Machine Learning (ML) exams from Microsoft, which explicitly names Data Scientists as its target audience.
In this article I will help you understand the vital aspects of this exam. If you are keen to take it and enjoy the recognition of being among the few people to clear it early, as I did, then this article is for you.
What is the exam all about?
70-774 is a dedicated exam that focuses on the skills needed to develop and consume Machine Learning models using Azure Machine Learning Service and Azure Cognitive Services on the Microsoft Azure public Cloud. It has four major objectives, and this article explains what each one expects. The exam contains case-study-based and stand-alone questions. Expect around 40 questions (I got 37!). Like a typical Microsoft exam, it gives candidates 150 minutes to finish. Every topic of this exam can be learned, but three things are highly recommended before you start:
First: Statistics 101 knowledge which is required to fully understand the AI methods and algorithms.
Second: Basic Python or R knowledge.
Third: An understanding of a few Azure Cloud services and their functions. Not all of them, but at least Azure Storage Account, Azure Data Factory, Azure HDInsight, Azure Cognitive Services, Azure VM and Azure SQL.
So, if you have never used Azure, please get your free trial account and start exploring. Cloud Computing with Microsoft Azure – Level 2 is one of the best ways to jump-start learning Azure. For Data Scientists, this exam is all about making optimal use of the Azure ML platform for operations, but for newbies it is much more than that. I feel it is an opportunity to learn many important Data Science concepts and build a stronger base in Machine Learning technologies.
Objective 1: Prepare Data for Analysis in Azure Machine Learning and Export from Azure Machine Learning
This objective tests how well a candidate understands data pre-processing, a vital step that every Data Scientist performs before developing a working Machine Learning model. If you are new to Data Science, make sure you understand the importance of the following topics:
Univariate vs Multivariate Datasets in Machine Learning
Impacts of missing data / duplicate data and outliers in datasets
Methods to deal with missing data / duplicate data and outliers in datasets
Feature Selection and Feature Engineering
All of this takes major effort and a good amount of time without the right guidance. If you want to cut down your time and effort, you can gain this knowledge from experienced professionals in Machine Learning Essentials – Level 2.
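To make the pre-processing topics listed above concrete, here is a minimal sketch in plain Python. It is not how Azure ML modules work internally; the sample rows, the median fill and the 1.5 x IQR clipping rule are my own assumptions, chosen only to illustrate the ideas of handling missing data, duplicate data and outliers.

```python
from statistics import median, quantiles

# Hypothetical (record_id, reading) rows; None marks a missing value.
rows = [
    (1, 12.0), (2, 11.5), (3, None), (4, 12.3),
    (4, 12.3),   # exact duplicate of the previous row
    (5, 95.0),   # suspiciously large reading
    (6, 11.8), (7, None), (8, 12.1),
]

def drop_duplicates(items):
    """Remove repeated rows while keeping first-seen order."""
    return list(dict.fromkeys(items))

def fill_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def clip_outliers(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]

unique_rows = drop_duplicates(rows)
readings = fill_missing([value for _, value in unique_rows])
cleaned = clip_outliers(readings)
print(cleaned)
```

Whether you fill, drop or clip depends on the dataset, so treat each choice here as one option among several.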
If you are familiar with the Azure Machine Learning environment, you will know that an ML model is built from many modules. These modules are pre-configured, packaged pieces of code, each of which performs a specific function. As far as this objective is concerned, there will be questions around the following modules:
I highly recommend that you follow the Microsoft Azure documentation and perform labs for every module mentioned in this article, not only those in this objective. It is important to understand when to use a module, which of its properties need to be configured, what outcome to expect, and where the module fits in the overall flow of a Machine Learning model.
Objective 2: Develop Machine Learning Models
In this objective, candidates need to prove their skill in selecting and configuring the right algorithm for various real-life scenarios. If you are new to Data Science, go through these topics before you kick-start your learning process:
Labeled and Unlabeled Data
Types of Machine Learning Algorithms: Supervised, Unsupervised and Reinforcement
Use cases of each type of Machine Learning Algorithms
Model Parameters and Model Hyperparameters, and the difference between them
Advantages and Process of Model Ensembling
Stacking Method for Model Ensembling
Training and Evaluation of Machine Learning Models
Difference between R and Python (Mostly Pros and Cons)
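As a small illustration of parameters versus hyperparameters, and of training versus evaluation, here is a sketch in plain Python. The data, the one-variable ridge regression and the candidate penalty values are all assumptions made for the example. The slope w is a model parameter because it is learned from the training data; the penalty lam is a hyperparameter because we choose it ourselves and compare the results on held-out data.

```python
def fit_ridge(xs, ys, lam):
    """Learn the slope w for y ~ w * x with an L2 penalty of strength lam."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(w, xs, ys):
    """Mean squared error of the predictions w * x against ys."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical training and validation data.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
valid_x, valid_y = [5.0, 6.0], [10.1, 11.9]

results = {}
for lam in (0.0, 1.0, 10.0):                 # hyperparameter: set by us
    w = fit_ridge(train_x, train_y, lam)     # parameter: learned from data
    results[lam] = (w, mse(w, valid_x, valid_y))

# Keep the hyperparameter value with the lowest validation error.
best_lam = min(results, key=lambda lam: results[lam][1])
print(best_lam, results[best_lam])
```

In Azure ML the same idea appears when you configure an algorithm module (hyperparameters) and then train it on a dataset (parameters).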
The questions in this section would be around the following modules:
Azure ML natively supports 25 algorithms that you need to understand. A couple of questions in the exam require you to know which algorithm suits a given scenario. For this, you can download this cheat sheet for your reference.
Also, don’t forget to learn the evaluation metrics for all types of algorithms, including classification, regression, clustering and recommendation. These metrics are purely mathematical values, so you will need some basic statistics to interpret them. They tell you how well the model performs.
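To see what these metrics actually compute, here is a short plain-Python sketch of a few common ones for classification and regression. The labels and predictions are made-up sample values, and clustering and recommendation metrics are left out to keep the example short.

```python
from math import sqrt

def classification_metrics(actual, predicted, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}

def regression_metrics(actual, predicted):
    """Mean absolute error and root mean squared error."""
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}

# Made-up labels and predictions.
cls = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0],
                             [1, 0, 0, 1, 0, 1, 1, 0])
reg = regression_metrics([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(cls)
print(reg)
```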
Objective 3: Operationalize and Manage Azure Machine Learning Services
This objective focuses on your operational capability with the Microsoft Azure ML platform. A sequence of tasks has to be carried out to create, publish and use Machine Learning models. Hands-on practice with Azure Machine Learning is the best way to prepare for this objective.
Having an Azure subscription is vital for this. Refer to the Azure documentation to create an end-to-end sample Machine Learning model. If you have never worked with notebook software, read up on Jupyter notebooks, which Microsoft Azure supports natively. Notebooks are quite popular in Data Science because they allow developers to create and share models easily.
Learn how to connect to the web service of a Machine Learning model in order to consume it. Understand the differences between consuming a model through batch execution and through request-response, and the advantages of each.
Apart from the Azure Machine Learning Service, Microsoft also provides Cognitive Services, which help you add AI functionality to smart solutions on the go. These services are hosted on the Cloud and can be accessed via APIs. Three groups of Cognitive Services are in scope for this certification exam: Computer Vision, Language and Knowledge.
Computer Vision APIs are useful for face detection, face recognition, OCR and object detection. Language APIs are useful for processing text. They can understand more than 120 languages and are widely used for tasks like sentiment analysis. Knowledge APIs draw knowledge from a defined source and help create recommendations. Learn what each of them does and try a few API calls against these Cognitive Services for practice.
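A request-response call is a single HTTPS POST that returns a prediction immediately, whereas batch execution submits a job over a stored dataset and collects the results later. The sketch below only builds a request-response call and does not send it. The URL, the API key, the input name input1 and the column names are placeholders I have assumed; the real values come from your own web service's dashboard and sample code.

```python
import json
import urllib.request

# Placeholders (assumptions): replace with the values shown for your service.
SCORING_URL = "https://example.invalid/your-service/execute?api-version=2.0"
API_KEY = "YOUR_API_KEY"

def build_scoring_request(column_names, rows):
    """Build (but do not send) a request-response scoring call."""
    payload = {
        "Inputs": {
            "input1": {"ColumnNames": column_names, "Values": rows},
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        SCORING_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
        method="POST",
    )

# Hypothetical feature columns and one row to score.
request = build_scoring_request(["age", "income"], [["42", "55000"]])
print(request.get_method(), request.full_url)
# With a real URL and key: urllib.request.urlopen(request).read()
```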
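As a starting point for practising API calls, here is a sketch that builds a sentiment analysis request in plain Python without sending it. The endpoint URL and its version segment are assumptions that depend on your region and the service version you provision, and the key is a placeholder. The subscription key travels in the Ocp-Apim-Subscription-Key header.

```python
import json
import urllib.request

# Placeholders (assumptions): use the endpoint and key from your own resource.
ENDPOINT = "https://example.invalid/text/analytics/v2.0/sentiment"
SUBSCRIPTION_KEY = "YOUR_SUBSCRIPTION_KEY"

def build_sentiment_request(texts, language="en"):
    """Build (but do not send) a sentiment request for a list of texts."""
    documents = [
        {"id": str(index), "language": language, "text": text}
        for index, text in enumerate(texts, start=1)
    ]
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps({"documents": documents}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
        },
        method="POST",
    )

sentiment_request = build_sentiment_request(
    ["I cleared the exam!", "The wait for the score report was painful."]
)
print(json.loads(sentiment_request.data))
# With a real endpoint and key: urllib.request.urlopen(sentiment_request).read()
```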
Objective 4: Use Other Services for Machine Learning
Microsoft has launched an open-source deep learning toolkit called CNTK, or Cognitive Toolkit. It can be installed on any machine, whether local or on the Cloud. Unlike a PaaS offering such as Azure Machine Learning, CNTK leans towards IaaS because you manage the machine it runs on. With GPU-based processing becoming common in Data Science, Azure offers N-series virtual machines powered by NVIDIA GPUs, and other Clouds have similar offerings. For the examination you don’t have to be proficient in CNTK, but a little hands-on practice could give you an advantage. Just go through the documentation and a few tutorials as shown here. The process of creating a Feed Forward Neural Network is described in the documentation.
Cortana Intelligence Gallery has many ready-to-use templates that provision the required Azure services in the right configuration for various projects. Starting an experiment from one of these templates is very simple: log in to the gallery, choose your template and get started. Go through the process once to check out all the details, as it will be helpful for the exam.
Azure HDInsight is a deep subject because it is such a wide service. From the exam perspective, learn the types of clusters, their default configurations and their use cases. Though the exam objectives state that working with Spark on HDInsight and Spark SQL is vital, I don’t see any reason to worry if you don’t know Spark, as the exam does not seem to depend heavily on it.
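If you have never seen a feed-forward neural network, the sketch below shows what one computes, in plain Python. To be clear, this is not the CNTK API: the layer sizes, random weights and input values are assumptions for illustration, and there is no training step. CNTK builds and trains the same kind of structure for you, with GPU support.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W . x + b) for each output unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def random_layer(n_in, n_out):
    """Untrained weights in [-1, 1] and zero biases."""
    weights = [[random.uniform(-1, 1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

random.seed(0)
hidden_w, hidden_b = random_layer(3, 4)   # 3 inputs -> 4 hidden units
out_w, out_b = random_layer(4, 1)         # 4 hidden units -> 1 output

def forward(features):
    """Data flows one way: input -> hidden layer -> output."""
    hidden = dense(features, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, sigmoid)[0]

score = forward([0.5, -1.2, 3.0])
print(score)
```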
For data analysis with SQL Server, there are three things that you should know for the exam:
Deploying an Azure VM from a SQL Server image, just as you would deploy any other Azure VM
Configuring SQL Server to allow execution of R scripts, which takes just two lines of code
The T-SQL command that executes R scripts, and its parameters
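For reference, the last two items look roughly like the T-SQL below. I have kept the statements as Python string constants so that you can send them with whichever SQL client library you prefer; the query and the column name hello are hypothetical, and the SQL Server service may need a restart after the configuration change.

```python
# T-SQL that enables external (R) script execution on the instance.
ENABLE_R_SCRIPTS = """
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;
"""

# T-SQL that runs an R script which simply echoes its input query.
# The query and the column name 'hello' are hypothetical examples.
RUN_R_SCRIPT = """
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- InputDataSet;',
    @input_data_1 = N'SELECT 1 AS hello'
WITH RESULT SETS ((hello INT NOT NULL));
"""

# Print both statements so they can be pasted into a query window.
for statement in (ENABLE_R_SCRIPTS, RUN_R_SCRIPT):
    print(statement.strip())
    print("GO")
```

The parameters worth remembering are @language, @script and @input_data_1, along with the WITH RESULT SETS clause that describes the output columns.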