01344 203999 - Available 24/7

Perform Data Engineering on Microsoft HDInsight (M20775)

Key points about this course


Duration: 5 Days*

Exam: Perform Data Engineering on Microsoft Azure HDInsight

Accredited: Yes

  • Microsoft accredited training provided by the largest training company globally
  • Receive experienced tuition from our expert Microsoft training instructors
  • Our Microsoft training courses are fully accredited by Microsoft

Available delivery methods for this course

  • Classroom
  • Onsite
  • Online
  • Live Virtual

Course Information

Perform Data Engineering on Microsoft HDInsight Course Overview | Azure Training | M20775

This 5-day course is intended to teach delegates how to plan and implement big data workflows on HDInsight.

This Perform Data Engineering on Microsoft HDInsight course is fully accredited by Microsoft through the Microsoft Silver Partnership held by The Knowledge Academy.

After completing this course, delegates will be able to:

  • Deploy HDInsight Clusters
  • Authorise Users to Access Resources
  • Load Data into HDInsight
  • Troubleshoot HDInsight
  • Implement Batch Solutions
  • Design Batch ETL Solutions for Big Data with Spark
  • Analyse Data with Spark SQL
  • Analyse Data with Hive and Phoenix
  • Describe Stream Analytics
  • Implement Spark Streaming Using the DStream API
  • Develop Big Data Real-Time Processing Solutions with Apache Storm
  • Build Solutions that use Kafka and HBase

Perform Data Engineering on Microsoft HDInsight Course Outline | Azure Training | M20775

This course includes the following modules:

Module 1: Getting Started with HDInsight

This module introduces Hadoop, the MapReduce paradigm, and HDInsight.

Lessons:

  • What is Big Data?
  • Introduction to Hadoop
  • Working with MapReduce Functions
  • Introducing HDInsight

Lab: Working with HDInsight

  • Provision an HDInsight cluster and run MapReduce jobs
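The MapReduce paradigm introduced in this module can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle, and reduce phases running in a single process; it does not use Hadoop or HDInsight, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical names chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key before reducing."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a total."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

On a real cluster the map and reduce functions run in parallel across many nodes and the framework performs the shuffle; the sketch only shows how the data flows between the phases.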

 

Module 2: Deploying HDInsight Clusters

This module provides an overview of the Microsoft Azure HDInsight cluster types, in addition to the creation and maintenance of the HDInsight clusters. The module also demonstrates how to customise clusters by using script actions through the Azure Portal, Azure PowerShell, and the Azure command-line interface (CLI). This module includes labs that provide the steps to deploy and manage the clusters.

Lessons:

  • Identifying HDInsight cluster types
  • Managing HDInsight clusters by using the Azure portal
  • Managing HDInsight clusters by using Azure PowerShell

Lab: Managing HDInsight clusters with the Azure Portal

  • Create an HDInsight cluster that uses Data Lake Store storage
  • Customise HDInsight by using script actions
  • Delete an HDInsight cluster

 

Module 3: Authorising Users to Access Resources

This module provides an overview of non-domain and domain-joined Microsoft HDInsight clusters, in addition to the creation and configuration of domain-joined HDInsight clusters. The module also demonstrates how to manage domain-joined clusters using the Ambari management UI and the Ranger Admin UI. This module includes the labs that will provide the steps to create and manage domain-joined clusters.

Lessons:

  • Non-domain joined clusters
  • Configuring domain-joined HDInsight clusters
  • Manage domain-joined HDInsight clusters

Lab: Authorising Users to Access Resources

  • Prepare the Lab Environment
  • Manage a non-domain joined cluster

 

Module 4: Loading data into HDInsight

This module provides an introduction to loading data into Microsoft Azure Blob storage and Microsoft Azure Data Lake storage. By the end of this module, delegates will understand how to use multiple tools to transfer data to an HDInsight cluster. They will also learn how to load and transform data to decrease query run time.

Lessons:

  • Storing data for HDInsight processing
  • Using data loading tools
  • Maximising value from stored data

Lab: Loading Data into an Azure account

  • Load data for use with HDInsight

 

Module 5: Troubleshooting HDInsight

In this module, delegates will learn how to interpret the logs associated with the various services of a Microsoft Azure HDInsight cluster in order to troubleshoot issues with these services. Delegates will also learn about Operations Management Suite (OMS) and its capabilities.

Lessons:

  • Analyse HDInsight logs
  • YARN logs
  • Heap dumps
  • Operations Management Suite

Lab: Troubleshooting HDInsight

  • Analyse HDInsight logs
  • Analyse YARN logs
  • Monitor resources with Operations Management Suite

 

Module 6: Implementing Batch Solutions

In this module, delegates will look at implementing batch solutions in Microsoft Azure HDInsight by using Hive and Pig. They will also discuss the approaches for data pipeline operationalisation that are available for big data workloads on an HDInsight stack.

Lessons:

  • Apache Hive storage
  • HDInsight data queries using Hive and Pig
  • Operationalise HDInsight

Lab: Implement Batch Solutions

  • Deploy HDInsight cluster and data storage
  • Use data transfers with HDInsight clusters
  • Query HDInsight cluster data

 

Module 7: Design Batch ETL solutions for Big Data with Spark

This module provides an overview of Apache Spark, describing its main characteristics and key features. The module also explains how to design batch Extract, Transform, Load (ETL) solutions for big data with Spark on HDInsight. The final lesson includes some guidelines to improve Spark performance.

Lessons:

  • What is Spark?
  • ETL with Spark
  • Spark performance

Lab: Design Batch ETL solutions for big data with Spark

  • Create an HDInsight cluster with access to Data Lake Store
  • Use an HDInsight Spark cluster to analyse data in Data Lake Store
  • Analyse website logs using a custom library with an Apache Spark cluster on HDInsight
  • Manage resources for an Apache Spark cluster on Azure HDInsight

 

Module 8: Analyse Data with Spark SQL

This module describes how to analyse data by using Spark SQL. In it, delegates will learn the differences between RDDs, Datasets, and DataFrames, identify the use cases for iterative and interactive queries, and describe best practices for caching, partitioning, and persistence. Delegates will also look at how to use Apache Zeppelin and Jupyter notebooks, carry out exploratory data analysis, and then submit Spark jobs remotely to a Spark cluster.

Lessons:

  • Implementing iterative and interactive queries
  • Perform exploratory data analysis

Lab: Performing exploratory data analysis by using iterative and interactive queries

  • Build a machine learning application
  • Use Zeppelin for interactive data analysis
  • View and manage Spark sessions by using Livy

 

Module 9: Analyse Data with Hive and Phoenix

In this module, delegates will learn about running interactive queries using Interactive Hive (also known as Hive LLAP or Live Long and Process) and Apache Phoenix. They will also learn about the various aspects of running interactive queries using Apache Phoenix with HBase as the underlying query engine.

Lessons:

  • Implement interactive queries for big data with Interactive Hive
  • Perform exploratory data analysis by using Hive
  • Perform interactive processing by using Apache Phoenix

Lab: Analyse data with Hive and Phoenix

  • Implement interactive queries for big data with interactive Hive
  • Perform exploratory data analysis by using Hive
  • Perform interactive processing by using Apache Phoenix

 

Module 10: Stream Analytics

The Microsoft Azure Stream Analytics service has built-in features and capabilities that make it easy to use as a flexible stream processing service in the cloud. Delegates will see that there are a number of advantages to using Stream Analytics for streaming solutions. They will also compare the features of Stream Analytics with other services available within the Microsoft Azure HDInsight stack, such as Apache Storm. Delegates will learn how to deploy a Stream Analytics job, connect it to the Microsoft Azure Event Hub to ingest real-time data, and execute a Stream Analytics query to gain low-latency insights. After that, delegates will learn how Stream Analytics jobs can be monitored when deployed and used in production settings.

Lessons:

  • Stream Analytics
  • Process streaming data from Stream Analytics
  • Managing Stream Analytics jobs

Lab: Implement Stream Analytics

  • Process streaming data with Stream Analytics
  • Managing Stream Analytics jobs

 

Module 11: Implementing Streaming Solutions with Kafka and HBase

In this module, delegates will learn how to use Kafka to build streaming solutions. Delegates will subsequently be taught how to use Kafka to persist data to HDFS by using Apache HBase, and then query this data.

Lessons:

  • Building and Deploying a Kafka Cluster
  • Publishing, Consuming, and Processing data using the Kafka Cluster
  • Using HBase to Store and Query Data

Lab: Implementing Streaming Solutions with Kafka and HBase

  • Create a virtual network and gateway
  • Create a Storm cluster for Kafka
  • Create a Kafka producer
  • Create a streaming processor client topology
  • Create a Power BI dashboard and streaming dataset
  • Create an HBase cluster
  • Create a streaming processor to write to HBase

 

Module 12: Develop big data real-time processing solutions with Apache Storm

This module explains how to develop big data real-time processing solutions with Apache Storm.

Lessons:

  • Persist long-term data
  • Stream data with Storm
  • Create Storm topologies
  • Configure Apache Storm

Lab: Developing big data real-time processing solutions with Apache Storm

  • Stream data with Storm
  • Create Storm Topologies

 

Module 13: Create Spark Streaming Applications

This module describes Spark Streaming, explains how to use discretised streams (DStreams), and explains how to apply the concepts to develop Spark Streaming applications.

Lessons:

  • Working with Spark Streaming
  • Creating Spark Structured Streaming Applications
  • Persistence and Visualisation

Lab: Building a Spark Streaming Application

  • Installing Required Software
  • Building the Azure Infrastructure
  • Building a Spark Streaming Pipeline
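The idea behind discretised streams, where a continuous stream is cut into small batches that are each processed like an ordinary dataset, can be sketched without Spark. The following is a plain Python illustration only, not the Spark Streaming API; the function names (discretise, count_per_batch) and the sample events are hypothetical.

```python
from collections import Counter


def discretise(events, batch_interval):
    """Split (timestamp, value) events into consecutive micro-batches.

    Batch i holds the values whose timestamp t satisfies
    i * batch_interval <= t < (i + 1) * batch_interval.
    Intervals with no events produce an empty batch.
    """
    batches = {}
    for timestamp, value in events:
        index = int(timestamp // batch_interval)
        batches.setdefault(index, []).append(value)
    if not batches:
        return []
    return [batches.get(i, []) for i in range(max(batches) + 1)]


def count_per_batch(batches):
    """Apply the same batch computation (a value count) to every micro-batch."""
    return [dict(Counter(batch)) for batch in batches]


if __name__ == "__main__":
    # Timestamps are in seconds; values stand in for incoming messages.
    events = [(0.2, "a"), (0.9, "b"), (1.1, "a"), (3.5, "a")]
    print(count_per_batch(discretise(events, batch_interval=1.0)))
```

In Spark Streaming the batching is driven by the clock as data arrives rather than by a pre-collected list, but the per-batch processing model is the same.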

 

Who Should Attend this Course?

The primary audience for this course is data engineers (IT professionals, developers, and information workers), data architects, data scientists, and data developers who plan to implement big data engineering workflows on HDInsight.

Prerequisites 

Before attending this course, delegates should possess or be able to demonstrate:

  • Programming experience using R, and familiarity with common R packages
  • Knowledge of common statistical methods and data analysis best practices
  • Basic knowledge of the Microsoft Windows operating system and its core functionality
  • Working knowledge of relational databases
Please arrive at the venue at 8:45am.
We are able to provide support via phone & email prior to attending, during and after the course.
The course includes a delegate pack consisting of course notes and exercises, a manual, an experienced instructor, and refreshments.
Once your booking has been placed and confirmed, you will receive an email which contains your course location, course overview, pre-course reading material (if required), course agenda and payment receipts

The Knowledge Academy does not provide an examination for this course. Delegates will be given access to:

  • Tuition from one of our expert trainers
  • Certificate of completion
  • Refreshments

Dates & Prices

  • Attend your course from the office or home
  • Interactive support from experienced trainers
  • Simple to set up and easy to use on any device

Location | Date | Availability | Price
Manchester | Mon 1st Jul 2019 | Places available | £2295
Southampton | Mon 1st Jul 2019 | Places available | £4995
Cambridge | Mon 1st Jul 2019 | Places available | £4995
Cardiff | Mon 8th Jul 2019 | Places available | £4995
Reading | Mon 8th Jul 2019 | Places available | £4995
Belfast | Mon 8th Jul 2019 | Places available | £4995
Live Virtual | Mon 15th Jul 2019 | Places available | £2295
Bristol | Mon 15th Jul 2019 | Places available | £2295
Dublin | Mon 15th Jul 2019 | Places available | €3995
Liverpool | Mon 15th Jul 2019 | Places available | £4995
Newcastle | Mon 22nd Jul 2019 | Places available | £4995
Glasgow | Mon 22nd Jul 2019 | Places available | £2995
Norwich | Mon 22nd Jul 2019 | Places available | £4995
Aberdeen | Mon 29th Jul 2019 | Places available | £4995
Milton Keynes | Mon 29th Jul 2019 | Places available | £2995
Brighton | Mon 29th Jul 2019 | Places available | £4995
Birmingham | Mon 5th Aug 2019 | Places available | £2295
Maidstone | Mon 5th Aug 2019 | Places available | £4995
Sheffield | Mon 5th Aug 2019 | Places available | £4995
London | Mon 12th Aug 2019 | Places available | £2295
Leeds | Mon 12th Aug 2019 | Places available | £2295
Edinburgh | Mon 12th Aug 2019 | Places available | £2995
Live Virtual | Mon 19th Aug 2019 | Places available | £2295
Southampton | Mon 19th Aug 2019 | Places available | £4995
Cambridge | Mon 19th Aug 2019 | Places available | £4995
Nottingham | Mon 19th Aug 2019 | Places available | £4995
Bristol | Mon 2nd Sep 2019 | Places available | £2295
Sheffield | Mon 2nd Sep 2019 | Places available | £4995
Belfast | Mon 2nd Sep 2019 | Places available | £4995
Cardiff | Mon 9th Sep 2019 | Places available | £4995
Reading | Mon 9th Sep 2019 | Places available | £4995
Aberdeen | Mon 9th Sep 2019 | Places available | £4995
Live Virtual | Mon 16th Sep 2019 | Places available | £2295
Manchester | Mon 16th Sep 2019 | Places available | £2295
Brighton | Mon 16th Sep 2019 | Places available | £4995
Cambridge | Mon 16th Sep 2019 | Places available | £4995
Milton Keynes | Mon 23rd Sep 2019 | Places available | £2995
Newcastle | Mon 23rd Sep 2019 | Places available | £4995
Southampton | Mon 23rd Sep 2019 | Places available | £4995
London | Mon 30th Sep 2019 | Places available | £2295
Liverpool | Mon 30th Sep 2019 | Places available | £4995
Edinburgh | Mon 30th Sep 2019 | Places available | £2995
Dublin | Mon 7th Oct 2019 | Places available | €3995
Glasgow | Mon 7th Oct 2019 | Places available | £2995
Norwich | Mon 7th Oct 2019 | Places available | £4995
Birmingham | Mon 14th Oct 2019 | Places available | £2295
Maidstone | Mon 14th Oct 2019 | Places available | £4995
Leeds | Mon 14th Oct 2019 | Places available | £2295
Live Virtual | Mon 21st Oct 2019 | Places available | £2295
Nottingham | Mon 21st Oct 2019 | Places available | £4995
Cardiff | Mon 21st Oct 2019 | Places available | £4995
Sheffield | Mon 21st Oct 2019 | Places available | £4995
Manchester | Mon 28th Oct 2019 | Places available | £2295
Cambridge | Mon 28th Oct 2019 | Places available | £4995
Bristol | Mon 28th Oct 2019 | Places available | £2295
Southampton | Mon 4th Nov 2019 | Places available | £4995
Milton Keynes | Mon 4th Nov 2019 | Places available | £2995
Newcastle | Mon 4th Nov 2019 | Places available | £4995
Dublin | Mon 11th Nov 2019 | Places available | €3995
Glasgow | Mon 11th Nov 2019 | Places available | £2995
Reading | Mon 11th Nov 2019 | Places available | £4995
Live Virtual | Mon 18th Nov 2019 | Places available | £2295
Liverpool | Mon 18th Nov 2019 | Places available | £4995
Brighton | Mon 18th Nov 2019 | Places available | £4995
Nottingham | Mon 18th Nov 2019 | Places available | £4995
Edinburgh | Mon 18th Nov 2019 | Places available | £2995
London | Mon 25th Nov 2019 | Places available | £2295
Norwich | Mon 25th Nov 2019 | Places available | £4995
Belfast | Mon 25th Nov 2019 | Places available | £4995
Live Virtual | Mon 2nd Dec 2019 | Places available | £2295
Birmingham | Mon 9th Dec 2019 | Places available | £2295
Maidstone | Mon 9th Dec 2019 | Places available | £4995
Aberdeen | Mon 9th Dec 2019 | Places available | £4995
Manchester | Mon 16th Dec 2019 | Places available | £2295
Bristol | Mon 16th Dec 2019 | Places available | £2295
Southampton | Mon 16th Dec 2019 | Places available | £4995
Leeds | Mon 16th Dec 2019 | Places available | £2295


Why choose TKA

Best price in the industry: You won't find better value in the marketplace. If you do find a lower price, we will beat it.

Trusted & Approved: Microsoft Azure Training

Various delivery methods: Flexible delivery methods are available depending on your learning style.

Resources: Resources are included for a comprehensive learning experience.

"Really good course and well organised. Trainer was great with a sense of humour - his experience allowed a free flowing course, structured to help you gain as much information & relevant experience whilst helping prepare you for the exam"

Joshua Davies, Thames Water

"...the trainer for this course was excellent. I would definitely recommend (and already have) this course to others."

Diane Gray, Shell
