Big Data and Analytics Training

Online Instructor-led (2 days)

Online Self-paced (16 hours)

Hadoop Big Data Certification Training Course Outline:

The Hadoop Big Data Certification is a two-day course. The course content is divided into two sections: Hadoop and Big Data. The following outlines the topics that will be covered in each section.

Hadoop

  • Understanding Big Data and Hadoop
  • Processing Distributed Data
  • Hadoop Project
  • Introduction to Data Storage and Processing
  • Defining Hadoop Cluster Requirements
  • Configuring a Cluster
  • Maximising HDFS Robustness
  • Managing Resources and Cluster Health
  • Maintaining a Cluster
  • Extending Hadoop
  • Implementing Data Ingress and Egress
  • Planning for Backup, Recovery, and Security

Big Data

  • Introduction to Big Data
  • Storing Big Data
  • Processing Big Data
  • Tools and Techniques to Analyse Big Data
  • Developing a Big Data Strategy
  • Implementing a Big Data Solution

Who should attend this Big Data Hadoop Training Course?

This course is recommended for those who need to implement or enhance their big data environment. It is also suited to anyone looking to advance their analytics career by building strong foundational knowledge.

Typically, those attending are Project Managers and IT Managers, Database Administrators and Data Architects, Developers and SQL Developers, and Data Scientists and Business Intelligence professionals. This is not an exhaustive list.

Hadoop Big Data Certification Prerequisites

There are no prerequisites for this training course.

Hadoop Big Data Certification Training Course Overview

Hadoop is an open-source software platform for computing. It facilitates the processing of big data sets across computer clusters. Hadoop imposes no format requirements on data, which makes it an economical solution for any organisation. Training as a Certified Specialist in Hadoop will therefore be an asset to any organisation.

Training as a Certified Specialist in Hadoop and Big Data, you will gain the knowledge and experience required to devise a Hadoop solution that satisfies your business requirements. On successful completion of this course, delegates will be able to allocate, distribute, and manage resources, and to monitor the Hadoop file system, job progress, and overall cluster performance.

This comprehensive two-day course will equip delegates with the skills required to install, configure, and navigate the Apache Hadoop platform. In addition, delegates will be able to build a Hadoop solution that is tailored to their specific business requirements. The emergence of large data sets brings fresh challenges, and it can be difficult to find one's way in uncharted territory. This course covers big data extensively, including the storage and processing of big data, the tools and techniques used to analyse it, how to develop a big data strategy, and how to implement a big data solution. As a Certified Specialist in Big Data Analytics, you will have the expertise and skills to build competitive strategies around data-driven insights.

What's included in this Hadoop Training Course?

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor
  • Refreshments

Online Instructor-led (1 day)

Online Self-paced (8 hours)

Hadoop Administration Training Course Outline: 

The following is a brief synopsis of the topics that will be covered over the intensive one-day course.

  • The Fundamentals of Hadoop
  • The Hadoop Ecosystem
  • Startup and Admin Commands
  • Commissioning and Decommissioning Nodes
  • Configuring a Hadoop Cluster
  • Maintaining a Cluster
  • Monitoring and Troubleshooting Clusters
  • Handling Corrupt and Missing Blocks

Who should attend this Hadoop Training Course?

The Hadoop Administration course is intended for IT professionals, cloud administrators, system administrators, and data engineers. However, this is not an exhaustive list.

Hadoop Administration Prerequisites

There are no formal prerequisites for this course. However, it is recommended that delegates understand the basics of Hadoop and have some knowledge of large data sets before beginning this course.

Hadoop Administration Training Course Overview

This 1-day course delivers a detailed understanding of the Hadoop open-source framework. Hadoop Administration Training will assist delegates in working with Big Data. It will further aid delegates in using the information collected to improve business objectives, quality of products, and customer satisfaction. 

This course focuses on managing, maintaining, and troubleshooting a Hadoop cluster, including startup and admin commands, communication command tools, and the commissioning and decommissioning of nodes. Delegates will also become familiar with the Hadoop ecosystem, all within an intensive one-day training course.

What's included in this Hadoop Training Course?

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor
  • Refreshments

Online Instructor-led (1 day)

Online Self-paced (8 hours)

Big Data Architecture Training Course Outline

The following is a brief synopsis of the topics that will be covered over the intensive one-day course.

  • Brief overview of the Hadoop development framework
  • Real-time processing, Batch Processing
  • Data formats, Data Lifecycle
  • Data model creation
  • Database interfaces
  • Scaling
  • Security and Privacy
  • Hadoop clusters
  • Selecting the right technology
  • Big data and Hadoop administration

Who should attend this Big Data Architecture Training Course?

This course is aimed at those who wish to become data architects, data analysts, or database engineers.

Big Data Architecture Training Prerequisites

Prior to attending this course, proficient knowledge of database management systems and technologies (MapReduce, Hive, HDFS, Spark etc.) is expected of delegates. 

Big Data Architecture Training Course Overview

Big Data Hadoop Architects are responsible for the development and deployment of applications on a large scale. In addition to this, they are tasked with preparing and creating Big Data systems. Delegates shall gain a thorough understanding of how to create a Hadoop solution that meets their business requirements. This comprehensive one-day course will equip delegates with the skills required to install, configure, and manage the Apache Hadoop platform. In addition, delegates will be able to build a Hadoop solution that is tailored to their specific business requirements.

The sudden growth of large data sets brings fresh challenges, and it can be difficult to find one's way in uncharted territory. This course covers a wide range of big data architecture material, from real-time and batch processing, data formats and the data lifecycle, to the various database interfaces. Scalable applications are a critical topic within this module: whether scaling up or down, a proficient big data architect needs to determine an application's scalability accurately and efficiently. Another important issue discussed during the course is security and privacy; the principles of security within the platform, alongside threats to privacy, will be examined in detail. Selecting the technology that best suits current demands is also a key feature of the course syllabus.

What's included in this Big Data Training Course

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor
  • Refreshments

Online Instructor-led (2 days)

Online Self-paced (16 hours)

Big Data and Hadoop Solutions Architect Training Course Outline

The following is a brief synopsis of the topics that will be covered over the two-day course.

  • A brief overview of the Hadoop Framework
  • Understanding the role of a Big Data and Hadoop Solutions Architect
  • Learn how to process and analyse data
  • Learn how to identify different behaviours of data
  • Create Data visualisation and migrate large amounts of data
  • Hadoop Clusters: Creating, deploying, maintaining, and securing
  • NoSQL database technologies and Hadoop infrastructure

Who should attend this Big Data and Hadoop Solutions Architect Training Course?

Typically, those attending are Project Managers and IT Managers, Database Administrators & Data Architects, Data Engineers, IT Systems Engineers, and Cloud Systems Administrators. This is not an exhaustive list.

Big Data and Hadoop Solutions Architect Prerequisites

It is highly recommended that delegates have a comprehensive understanding of Hadoop prior to attending.

Big Data and Hadoop Solutions Architect Training Course Overview

This Big Data and Hadoop Solutions Architect training course is a two-day intensive course aimed at those who have a comprehensive understanding of Hadoop and wish to consolidate their knowledge of solutions architecture. This course has been formulated to aid delegates in becoming Solutions Architects that are essential for businesses when they look to integrate data from various sources in a limited time frame. A Big Data Hadoop Solutions Architect is responsible for identifying specific issues whilst handling large amounts of data. They are also expected to describe the structure and behaviours of the information whilst utilising the Hadoop technology.

The Big Data and Hadoop Solutions Architect also organises how the Big Data environment ought to be developed, which includes requirement analysis, platform selection, and the design.

The course shall cover the processing and analysing of data, identifying the various behaviours of data, data visualisation and migration of data, Hadoop Clusters in detail, and the NoSQL database technologies.

A Big Data Hadoop Solutions Architect possesses a sought-after skill set that is invaluable to many organisations. The demand for Big Data Hadoop Solutions Architects has rocketed and continues to grow within the IT industry.

What's Included in this Big Data and Hadoop Solutions Architect Training Course

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor
  • Refreshments

Online Instructor-led (1 day)

Online Self-paced (8 hours)

Data Science Analytics Course Outline

This 1-day course will help you develop the skills to become a successful Data Analyst. By taking this course, you will be able to study different types of data and turn them into a valuable source of information. You will also learn various theories, which include digital, technological and analytical techniques.

The following topics will be taught during this certification:

  • Introduction to Data Science
  • Understanding Data Wrangling
  • Data Analysis
  • Data Mining
  • Understanding Data Visualisation
  • Data Manipulation
  • Working with Large Amounts of Data

Who should attend this Data Analysis Course?

The Data Science Analytics certification has been designed for anyone who is interested in analysing data and identifying any improvements or issues.

Prerequisites

There are no prerequisites for the Data Science Analytics course.

Data Science Analytics Course Overview

Data Science is a versatile area which combines scientific techniques, systems and processes to extract information from various forms of data. A Data Scientist uses the information collected to discover data sources such as revenues, testimonials and product information.

What's included in this Data Analysis Training Course?

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor
  • Refreshments

Online Instructor-led (1 day)

Online Self-paced (8 hours)

Data Analytics with R Course Outline

Handling data is becoming increasingly essential within a business. The Data Analytics with R certification will help delegates learn the fundamentals of this programming language and use it to perform various forms of data analysis. This 1-day course will also give delegates the skills to create data analysis tasks for themselves and enhance their skills when using R.

The following topics are taught during this course:

  • Overview of Data Analysis
  • Business Intelligence and Analytics
  • R programming language
  • Importing Data
  • Machine Learning

Who should attend this Data Analysis Training Course?

This course has been developed for those who are starting to use data analysis tools. The Data Analytics with R training is ideal for those who are interested in storing and managing data.

Prerequisites

There are no prerequisites for taking this course, but it is recommended that delegates have a basic understanding of this programming language.

Data Analytics with R Course Overview

This 1-day course has been specifically designed for those who have no knowledge of the programming language and would like to expand their skills in this area. The course will help delegates gain the skills to become successful analytics professionals.

What is R?

R is a programming language used for statistical computing and graphics. This open-source tool has been used by many statisticians, data miners and data analysts to collect data to improve their products.

 

What's included in this Data Analysis Training Course?

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor
  • Refreshments

Online Instructor-led (1 day)

Online Self-paced (8 hours)

Big Data Analysis Course Outline

This course covers the following topics:

Understanding the Fundamentals of Big Data

  • What is Big Data?
  • Sources of Big Data

 

The Big Data Analysis Lifecycle

  • Business Case Evaluation
  • Data Identification
  • Data Acquisition and Filtering
  • Data Extraction
  • Data Validation and Cleansing
  • Data Aggregation and Representation
  • Data Analysis
  • Data Visualisation
  • Analysis Results

 

Planning a Big Data Approach

  • Bottom-Up and Top-Down Planning
  • Technologies
  • Considering Use Cases
  • Thinking Long-Term
  • Steps for Planning

 

Implementing a Big Data Approach

  • Recognising Business Challenges
  • Finding Appropriate Data Sources
  • Involving the Business
  • Choosing What to Use

 

Storing Unstructured Information

  • Apache Hadoop
  • Microsoft HDInsight
  • Hive
  • PolyBase
  • Sqoop
  • Presto
  • Microsoft Excel
  • NoSQL

 

Managing and Analysing Unstructured Information

  • Challenges of Unstructured Data
  • Deciding on a Data Source
  • Preparing for Storage
  • Choosing Storage Solutions

Who should attend this Big Data Training Course?

This course has been designed for those who are interested in managing large quantities of data and creating long-term strategies for their business.

Prerequisites

There are no prerequisites for the Big Data Analysis course.

Big Data Analysis Course Overview

As more and more businesses rely on data to make their decisions, the ability to critically analyse large datasets is more important than ever. Successful Big Data Analysis can provide an insight into activities and highlight opportunities to improve and expand, as well as identify issues which may prevent growth and affect profit.

Our 1-day Big Data Analysis training course provides a comprehensive introduction to this discipline, providing knowledge of the Big Data Analysis Lifecycle and how a Big Data approach can be planned and implemented.

What's included in this Big Data Training Course?

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor
  • Refreshments

Online Instructor-led (2 days)

Online Self-paced (16 hours)

Data Analysis Training using MS Excel Course Outline

Module 1: Overview of Data Analysis

  • What is Data Analysis?
  • Why Data Analysis?
  • Types of Data Analysis
  • Data Analysis Process

Module 2: Introduction to Data Analysis with MS Excel

  • Introduction to Excel Data Analysis
  • Data Cleaning
  • Data Analysis
  • Data Visualisation

Module 3: Excel Ribbon and Importing Data into Excel

  • Excel Ribbons
  • Importing Data into Excel

Module 4: Work with Range Names

  • Steps to Create Range Name
  • How to Rename Range Name?
  • How to Delete Range Name?
  • Use Name Range in Workbook

Module 5: Introduction to Tables

  • What is a Table?
  • What is the Purpose of Creating a Table?

Module 6: Cleaning Data with Text Functions

  • Removing Unwanted Characters from the Text
  • Steps for Data Cleaning

Module 7: Working with Date Formats and Time Formats

  • Steps to Change Date Format
  • Steps to Change Time Format

Module 8: Conditional Formatting in Excel

  • What is Conditional Formatting and How to Use It?
  • Apply Conditional Formatting on Text

Module 9: Sorting and Filtering Data Columns

  • What is Sorting and Filtering?
  • Sort a Particular Column
  • Applying Sorting on Two Columns
  • Steps to Sort Dates
  • Clear Filter
  • Apply Filter on Text
  • Apply Filter by Cell Icon

Module 10: Subtotals and Quick Analysis

  • Subtotals
  • Steps to Apply Subtotals
  • Quick Analysis
  • Steps to Use Quick Analysis

Module 11: Working with Multiple Sheets

  • Worksheet Tab
  • Viewing Multiple Worksheets at Once
  • Grouping Your Worksheets Together
  • Steps to Rename a Worksheet
  • Steps to Move/Copy a Worksheet
  • Steps to Delete a Worksheet

Module 12: Data Validation

  • What is Data Validation?
  • How to Use Data Validation?
  • Using Data Validation

Module 13: Data Visualisation

  • What is Data Visualisation?
  • Using Charts in Excel
  • All Charts in Excel

Module 14: Exploring Lookup Functions

  • Lookup Function
  • VLOOKUP and HLOOKUP
  • INDEX Function
  • MATCH Function
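
As a rough illustration of the lookup functions listed in this module, the plain-Python sketch below mimics what an exact-match VLOOKUP and an INDEX/MATCH pair compute. It is not Excel code: the helper names, the 1-based positions, and the sample product table are assumptions made for the example, and a missing key returns None here rather than Excel's #N/A error.

```python
def match(lookup_value, lookup_range):
    """Return the 1-based position of an exact match, similar to MATCH(..., 0)."""
    return lookup_range.index(lookup_value) + 1


def index(values, position):
    """Return the item at a 1-based position, similar to INDEX on one column."""
    return values[position - 1]


def vlookup(lookup_value, table, col_index):
    """Search the first column for an exact match and return the value
    from the 1-based column col_index of the same row."""
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None  # assumption: None stands in for Excel's #N/A


# Hypothetical sample data: product code, name, unit price.
products = [
    ("P001", "Keyboard", 25.0),
    ("P002", "Mouse", 12.5),
    ("P003", "Monitor", 140.0),
]
codes = [row[0] for row in products]
prices = [row[2] for row in products]

price_via_vlookup = vlookup("P002", products, 3)
price_via_index_match = index(prices, match("P002", codes))
```

Both calls return the same price. INDEX/MATCH looks up the position first and then reads from any column, which is why it is often preferred when the key is not in the first column.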

Module 15: Pivot Tables

  • PivotTable Overview
  • Creating a PivotTable in MS Excel
  • Recommended PivotTables
  • PivotTable Fields
  • PivotTable Areas
  • Filters and Slicers
  • Summarising Values by Other Calculation
  • Using ANALYSE and DESIGN on the Ribbon

Module 16: What If Analysis

  • Introduction to What If Analysis
  • What If Analysis with Data Tables
  • What If Analysis with Scenario Manager
  • What If Analysis with Goal Seek

Prerequisites

There are no formal requirements for attending this Data Analysis Training using MS Excel course. However, a basic knowledge of MS Excel can be beneficial for the delegates.

Audience

This course is intended for anyone considering learning how to use Microsoft Excel for data analysis purposes.

Data Analysis Training using MS Excel Course Overview

Data analysis is the process of organising, modelling, and reconfiguring data to extract important information that can aid in business decision-making. The primary objective of data analysis is to obtain valuable insights from available data and use that information to make informed decisions. Microsoft Excel provides a versatile and widely used platform for data analysis in fields such as business, finance, and healthcare. This training will teach individuals how to use Microsoft Excel to analyse data effectively and derive meaningful insights. Possessing data analysis skills using MS Excel can benefit individuals in terms of professional growth and earning potential.

This 2-day Data Analysis Training using MS Excel will provide delegates with a comprehensive understanding of carrying out data analysis tasks with Microsoft Excel. During this training, they will learn how to remove unwanted characters from the text. They will also learn about data validation, which is the process of verifying and validating collected data before it is used. This course will be led by our highly skilled and subject-matter expert trainers, who are well versed in teaching data analysis and have sufficient experience in conducting this course to provide delegates with their desired skills.

Course Objectives

  • To get an in-depth understanding of data analysis and its process
  • To attain knowledge of how to create, rename, and delete a range name
  • To learn how to remove unwanted characters from the text
  • To become familiar with the steps for changing date and time formats
  • To attain in-depth knowledge about data validation and how to use it
  • To learn about how to create a PivotTable in MS Excel

By the end of this training, delegates will be able to use MS Excel proficiently and effectively for data analysis. They will also be able to use Excel functions and tools for data analysis purposes. They will also become proficient in the process of data visualisation in Excel and running financial analysis.

If delegates want to learn Data Analysis using other tools and techniques, they can also opt for our Big Data Architecture Training, Big Data and Hadoop Solutions Architect, Data Science Analytics, Data Analytics with R, Big Data Analysis, and many more courses from our Big Data and Analytics Training section to fulfil their needs/requirements.

What’s Included

  • The Knowledge Academy’s Data Analysis Training using MS Excel Manual
  • Experienced Instructor
  • Completion Certificate

Online Instructor-led (2 days)

Online Self-paced (16 hours)

Apache Spark and Scala Training Course Outline

Introduction of Scala

  • Introduction to Scala and Deployment of Scala for Big Data applications
  • An Overview of Apache Spark analytics

Pattern Matching

  • Importance of Scala
  • The Concept of REPL (Read Evaluate Print Loop)
  • Deep Dive into Scala Pattern Matching
  • Type Inference and Higher-Order Function
  • Currying and Traits
  • Application Space
  • Scala for Data Analysis

Executing the Scala Code

  • Introduction to Scala Interpreter
  • Static Object Timer in Scala
  • Implicit Classes in Scala and Testing String Equality in Scala
  • Understand the Concept of Currying in Scala
  • Different Classes in Scala

Classes Concept in Scala

  • Introduction to Classes concept
  • Understanding the Constructor Overloading
  • Different Abstract Classes
  • The Hierarchy Types in Scala
  • The Concept of Object Equality and Val and Var Methods in Scala

Case Classes and Pattern Matching

  • Introduction to Sealed Traits
  • Wildcard and Constructor Patterns
  • Tuple
  • Variable and Constant pattern

Concepts of Traits with Example

  • Introduction to Traits in Scala
  • The Advantages of Traits
  • Linearisation of Traits and The Java Equivalent
  • Avoiding of Boilerplate Code

Scala Java Interoperability

  • Implementation of Traits in Scala and Java
  • Handling of Multiple Traits Extending

Scala Collections

  • Introduction to Scala Collections
  • Classification of Collections
  • The Difference Between Iterator and Iterable in Scala
  • Example of List Sequence in Scala

Mutable Collections vs Immutable Collections

  • The Types of Collections in Scala
  • Mutable and Immutable Collections
  • Lists and Arrays in Scala
  • The List Buffer and Array Buffer
  • Queue in Scala
  • Double-Ended Queue Deque
  • Stacks and Sets
  • Maps and Tuples in Scala

Use Case Bobsrockets Package

  • What are Scala Packages and Imports
  • The Selective Imports and Test Classes
  • Introduction to JUnit test Class
  • JUnit Interface via JUnit 3 suite for Scala Test
  • Packaging of Scala Applications in Directory Structure
  • Example of Spark Split and Spark Scala

Spark Course Content

Introduction to Spark

  • What are Spark and Spark Stack
  • How Spark Overcomes the Drawbacks of MapReduce
  • Introduction to in-memory Map Reduce
  • Interactive Operations on Map Reduce
  • Fine vs Coarse-Grained Update
  • Spark Hadoop YARN
  • HDFS and YARN Revision
  • How it is Better than Hadoop
  • Deploying Spark without Hadoop
  • Spark History Server
  • Cloudera Distribution

Spark Basics

  • Spark Installation Guide and Configuration
  • Memory Management
  • Executor Memory vs Driver Memory
  • Working with Spark Shell
  • Concept of Resilient Distributed Datasets (RDD)
  • Learning to do Functional Programming in Spark
  • The Architecture of Spark

Working with RDDs in Spark

  • Spark RDD and Creating RDDs
  • RDD Partitioning
  • Operations and Transformation in RDD
  • Deep Dive into Spark RDDs
  • The RDD General Operations
  • A Read-Only Partitioned Collection of Records
  • Using the Concept of RDD for Faster and Efficient Data Processing
  • RDD Action for Collect
  • Count and Collects Map
  • saveAsTextFile
  • Pair RDD Functions

Aggregating Data with Pair RDDs

  • Introduction to Key-Value Pair in RDDs
  • How Spark makes Map-Reduce Operations Faster
  • Different Operations of RDD
  • Map Reduce Interactive Operations
  • Fine and Coarse-Grained Update

Writing and Deploying Spark Applications

  • Comparing the Spark Applications with Spark Shell
  • Creating a Spark Application using Scala or Java
  • Deploying a Spark Application
  • Scala Built Application and Creation of Mutable List
  • Set and Set Operations
  • List and Tuple
  • Concatenating List
  • Creating an Application using SBT
  • Deploying Application using Maven
  • The Web User Interface of Spark Application
  • A Real-World Example of Spark
  • Configuring of Spark

Parallel Processing

  • Spark Parallel Processing
  • Deploying on a Cluster
  • Introduction to Spark partitions
  • File-Based Partitioning of RDDs
  • What is HDFS
  • Data Locality
  • Mastering the Technique of Parallel Operations
  • Comparing Repartition & Coalesce
  • RDD Actions

Spark RDD Persistence

  • The Execution Flow in Spark
  • RDD Persistence Overview
  • Spark Execution Flow
  • Spark Terminology
  • Distributed Shared Memory vs RDD
  • RDD Limitations and RDD Lineage
  • Spark Shell Arguments and Distributed Persistence
  • Key/Value Pair for Sorting Implicit Conversion like CountByKey
  • ReduceByKey and SortByKey and AggregateByKey
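
As a rough illustration of the key/value operations named in this module, the plain-Python sketch below mimics what CountByKey, ReduceByKey and SortByKey compute on a small in-memory list of pairs. It does not use Spark and runs on a single machine; the helper names and the sample records are hypothetical.

```python
from collections import Counter
from functools import reduce
from itertools import groupby
from operator import add, itemgetter


def count_by_key(pairs):
    """Count how many records share each key (the idea behind CountByKey)."""
    return dict(Counter(key for key, _ in pairs))


def reduce_by_key(pairs, func):
    """Merge the values of each key with a binary function
    (the idea behind ReduceByKey)."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        key: reduce(func, (value for _, value in group))
        for key, group in groupby(ordered, key=itemgetter(0))
    }


def sort_by_key(pairs):
    """Return the pairs ordered by key (the idea behind SortByKey)."""
    return sorted(pairs, key=itemgetter(0))


# Hypothetical sample data: (page, hits) records.
records = [("home", 3), ("about", 1), ("home", 2), ("cart", 5), ("about", 4)]

counts = count_by_key(records)
totals = reduce_by_key(records, add)
ordered = sort_by_key(records)
```

In Spark the same operations run across partitions on a cluster, so the reducing function should be associative and commutative; addition, used here, satisfies both.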

Spark Streaming & MLlib

  • Spark Streaming Architecture
  • Writing Streaming Program Coding
  • Processing of Spark Stream and Processing Spark Discretised Stream (DStream)
  • The Context of Spark Streaming
  • Streaming Transformation and Flume Spark Streaming
  • Request Count and DStream
  • Multi Batch Operation and Sliding Window Operations
  • Advanced Data Sources and Different Algorithms
  • The Concept of the Iterative Algorithm in Spark
  • Analysing with Spark Graph Processing
  • Introduction to K-Means and Machine Learning
  • Various Variables in Spark like Shared Variables
  • Broadcast Variables and Accumulators

Spark SQL and Data Frames

  • Describe Spark SQL
  • The Context of SQL in Spark
  • Working with XML Data
  • Parquet Files
  • JSON support in Spark SQL
  • Creating a Hive Context
  • Writing Data Frame to Hive
  • Reading JDBC files
  • Introduction to Data Frames in Spark
  • Creating Data Frames
  • Manual Inferring of Schema
  • Working with CSV Files
  • Reading JDBC Tables
  • Data Frame to JDBC
  • User-Defined Functions in Spark SQL
  • Shared Variable and Accumulators
  • Understanding to Query and Transform Data in Data Frames

Improving Spark Performance

  • Introduction to various variables in Spark like Shared Variables
  • Broadcast Variables
  • Learning About Accumulators
  • The Common Performance Issues
  • Troubleshooting the Performance Problems

Scheduling or Partitioning

  • Learning about the Scheduling and Partitioning in Spark
  • Hash Partition and Range Partition
  • Scheduling within and Around Applications
  • Static Partitioning and Dynamic Sharing
  • Fair Scheduling and High Order Functions
  • Map Partition with index
  • The Zip and GroupByKey
  • Spark Master High Availability
  • Standby Masters with Zookeeper
  • Single Node Recovery with Local File System

Prerequisites

Delegates should have basic knowledge of Java, databases, and query languages such as SQL.

Audience

This course is designed for those who want to build their career in Big Data. It is particularly suitable for:

  • Senior IT Professionals
  • DW Professionals
  • Data Scientists and Analytics Professionals
  • Developers and Architects
  • Testing Professionals
  • Software Architects
  • BI and ETL Professionals
  • Engineers and Developers
  • Mainframe Professionals

Apache Spark and Scala Training Course Overview

Apache Spark is an open-source, lightning-fast cluster computing system used for analysing large amounts of data. Spark is widely used, and many large companies around the world have adopted it.

This 2-day Apache Spark and Scala Certification provides delegates with in-depth knowledge and practical skills to enhance their competence in Big Data Spark. During this training, delegates will gain an understanding of Spark and its ecosystem, Spark Streaming, Spark SQL, RDD and Scala.

This course covers the following concepts:

  • Scala and its programming implementation
  • Spark Applications using Python, Java and Scala
  • Apache Spark and Hadoop
  • Spark on a cluster and Spark Streaming
  • Scala Java Interoperability and other Scala operations
  • Projects using Scala to run on Spark applications
  • Scala classes and executing pattern matching

This course will be delivered by an industry-experienced instructor, who will provide comprehensive knowledge of the Scala programming language, YARN, HDFS, Sqoop, Flume, Spark GraphX and messaging systems such as Kafka. After completing this training, delegates who pass the exam will receive a certificate.

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor

Online Instructor-led (1 day)

Online Self-paced (8 hours)

Apache Storm Training Course Outline

Introduction to Apache Storm

  • Apache Storm Vs. Hadoop
  • Use-Cases

Apache Storm Concepts

  • Topology
  • Tasks
  • Workers
  • Stream Grouping

Cluster Architecture

Apache Storm Workflow

Overview of Distributed Messaging System

Installing Apache Storm

  • Verifying Java Installation
  • ZooKeeper Framework Installation
  • Apache Storm Framework Installation

Apache Storm Trident

  • Topology and Tuples
  • Spout and Operations
  • State Maintenance
  • Distributed RPC

Apache Storm Applications

Audience

Anyone who wishes to pursue a career in Big Data Analytics or learn to use Apache Storm can attend this course. This course is well-suited for:

  • Software Professionals
  • Mainframe and Hadoop Professionals
  • Data Scientists and ETL Developers

Prerequisites

There are no prerequisites for this course; however, an understanding of Java would be advantageous.

Apache Storm Training Course Overview

Apache Storm is an open-source data streaming framework. It enables the processing of large amounts of data using a fault-tolerant and horizontally scalable method. It is simple and can be used with any programming language.

This Apache Storm Training is designed to provide knowledge of how to use Apache Storm. Delegates will learn how to install Storm and create topologies, as well as how to use its workflow, cluster architecture and distributed messaging system. The course also looks at the Apache Storm Trident, including Topology and Tuples, Spout and Operations, and State Maintenance.

What’s Included

  • The Knowledge Academy's Apache Storm Training Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (2 days)

Online Self-paced (16 hours)

Splunk Certification Training Course Outline

Module 1: Splunk Overview

  • Introduction to Splunk
  • Installing Splunk
  • Adding Data in Splunk

Module 2: Splunk Search Processing Language

  • Pipe Operator
  • Time Modifiers
  • Understanding Basic SPL
  • Sorting Results
  • Commands
    • Filtering
    • Reporting
  • Filtering, Modifying, and Adding Fields
  • Grouping Results

Module 3: Macros, Field Extraction, and Field Aliases

  • Field Extraction in Splunk
  • Macros
  • Field Aliases in Splunk
  • Splunk Search Query

Module 4: Tags, Lookups, and Correlating Events

  • Lookups
  • Tags
  • Reporting
  • Alerts

Module 5: Data Models, Pivot, and CIM

  • Understanding Data Models and Pivot
  • Event Actions in Splunk
  • Common Information Model in Splunk

Module 6: Knowledge Managers and Dashboards in Splunk

  • Role of a Knowledge Manager
  • Dashboards
  • Dynamic Form-Based Dashboards

Module 7: Splunk Licenses, Indexes, and Role Management

  • Buckets
  • Understanding journal.gz, .tsidx, and Bloom Filters
  • Splunk Licenses
  • Managing Splunk Licenses
  • License Pooling
  • Managing Indexes in Splunk
  • User Management

Module 8: Machine Data Using Splunk Forwarder and Clustering

  • Splunk Universal Forwarder
  • Splunk’s Light and Heavy Forwarders
  • Forwarder Management
  • Indexer Clusters
  • Lightweight Directory Access Protocol (LDAP)
  • Security Assertion Markup Language (SAML)

Module 9: Advanced Data Input in Splunk

  • Compress the Data Feed
  • Indexer Acknowledgment
  • Securing the Feed
  • Queue Size
  • Input
    • Monitor
    • Scripted
    • Network
  • Pulling Data Using Agentless Input

Module 10: Splunk’s Advanced .conf File and Diag

  • Understanding Splunk .conf Files
  • Setting Fine-Tuning Input
  • Anonymising the Data
  • Understanding Merging Logic in Splunk
  • Debugging Configuration Files
  • Creating a Diag

Module 11: Infrastructure Planning with Indexer and Search Head Clustering

  • Capacity Planning for Splunk Enterprise
  • Configuring 
    • Search Peer
    • Search Head
  • Search Head Clustering
  • Multisite Indexer Clustering
  • Splunk Architecture Practices

Module 12: Troubleshooting in Splunk

  • Monitoring Console
  • Log Files for Troubleshooting
  • Metrics.log File
  • Job Inspector
  • Troubleshooting
    • License Violations
    • Deployment Issues
    • Clustering Issues

Module 13: Advanced Deployment

  • Deploying Apps Through the Deployment Server
  • Creating a Server Group Using ServerClass.conf
  • Deploy Configuration File Through Cluster Master
  • Deploy App on Search Head Clustering
  • Load Balancing
  • Indexer Discovery
  • SOCKS Proxy

Module 14: Advanced Splunk

  • Managing Indexes
  • Manage Index Storage
  • Managing
    • Index Cluster
    • Multisite Index Cluster
  • REST API Endpoints
  • Splunk SDK

Prerequisites

There are no formal prerequisites for attending this Splunk Training course. However, a prior understanding of storing and retrieving data would be highly beneficial.

Audience

This course is designed by The Knowledge Academy for everyone who wants to grasp the essentials of Splunk. However, it will be especially beneficial for:

  • IT Professionals
  • IT Infrastructure Management Professionals

Splunk Certification Training Course Overview

Splunk is a popular software platform for monitoring, searching, analysing, and visualising machine-generated data in real time. It captures, indexes, and correlates real-time data in a searchable repository, from which it generates alerts, graphs, visualisations, and dashboards. Splunk helps users to gather, store, and analyse data, allowing enterprises to act on the insights it reveals. This training teaches learners how to use Splunk appropriately and how to discover events using the Search Processing Language. It helps with monitoring business metrics, making informed decisions, and creating a central repository for searching. Individuals with strong searching and analysis skills are well placed for well-paid roles in multinational corporations, where they can apply their Splunk expertise to day-to-day, real-time activities.

In this 2-day Splunk Training course, delegates will learn how Splunk can be used to analyse and respond to issues in their businesses using operational intelligence. Delegates will learn the basics of the Search Processing Language (SPL), including Boolean operators, syntax colouring, search language syntax, and search modes. They will also learn about macros, which store reusable search fragments so that a whole command need not be retyped, and how to create them using .conf files and Splunk Web. Our highly professional and skilled instructor, with years of teaching experience, will conduct this course and guide delegates from the fundamentals to the advanced concepts of Splunk.

Course Objectives

  • To install Splunk on different platforms like macOS and Windows
  • To learn about relative-search and real-time search time modifiers
  • To acquire an understanding of filtering and reporting commands
  • To execute a chain of search commands using the pipe operator
  • To understand the use of data models and pivot in Splunk
  • To get familiar with the privileges that a user has within Splunk

After attending this training course, delegates will be able to create data models and recognise the patterns of product sales requests. They will also be able to enhance the GUI and real-time visibility in a dashboard to deliver the most up-to-date data on a wide range of performance metrics.

What’s Included

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor

Online Instructor-led (4 days)

Online Self-paced (32 hours)

Advanced Data Analytics Certification Course Outline

Domain 1: Data Analytics

Module 1: Introduction to Data Analytics

  • Data Analytics Overview
  • Importance of Data Analytics
  • Types of Data Analytics
    • Descriptive Analytics
    • Diagnostic Analytics
    • Predictive Analytics
    • Prescriptive Analytics
  • Benefits of Data Analytics
  • Data Visualisation for Decision Making 
  • Data Types, Measures of Central Tendency, Measures of Dispersion
  • Graphical Techniques, Skewness and Kurtosis, Box Plot
  • Descriptive Stats
  • Sampling Variation, Central Limit Theorem, Confidence Interval
  • Optimisation Techniques for Data Analytics

Module 2: Introduction to Statistical Analysis

  • Counting, Probability, and Probability Distributions
  • Sampling Distributions
  • Estimation and Hypothesis Testing
  • Scatter Diagram
  • ANOVA and Chi-Square
  • Imputation Techniques
  • Data Cleaning
  • Correlation and Regression

Module 3: Data Wrangling with SQL

  • Introduction to SQL
  • Database Normalisation
  • Entity-Relationship Model
  • SQL Operators
  • Join, Tables, and Variables
  • SQL Functions
  • Subqueries
  • Views and Stored Procedures
  • User-Defined Functions
  • SQL Performance and Optimisation
  • Advanced Concepts
    • Correlated Subquery
    • Grouping Sets

Module 4: Presto

  • Introduction to Presto
  • Writing Queries in Presto on Large Data Sets
  • Data Transformation Using Presto

Module 5: Feature Engineering

  • Handling Unstructured Data
  • Machine Learning Algorithms
  • Bias-Variance Trade-Off
  • Handling Unbalanced Data
  • Boosting
  • Model Validation
  • Hyperparameter Optimisation
  • Advanced Machine Learning Libraries – XGBoost
  • Solving Problems on Kaggle

Capstone 1: Detect Credit Card Fraud Using Machine Learning

Domain 2: Business Analytics with Excel

Module 1: Introduction to Data Analysis with MS Excel

  • Steps to Analyse Data
  • Introduction to Tables

Module 2: Cleaning Data with Text Functions

  • Removing Unwanted Characters from the Text
    • Steps for Data Cleaning

Module 3: Sorting and Filtering

  • What is Sorting and Filtering?
  • Sort a Particular Column
  • Applying Sorting on Two Columns
  • Steps to Sort Dates and Columns by Colours
  • Apply Filtering
  • Clear Filter
  • Apply Filter on Text

Module 4: Exploring Lookup Functions

  • VLOOKUP Functions in Excel
  • HLOOKUP Functions in Excel

Module 5: Introduction to Power Pivot and Formula Auditing

  • Working with Pivot Tables
  • How to Use Power Pivot?
  • Measures
  • Dimension Tables
  • Calculated Columns
  • Relationships
  • Advanced Functions
  • Data Visualisation and Analysis
  • Show Formulas
  • Trace Precedents
  • Remove Arrows
  • Trace Dependents
  • Evaluate Formula

Module 6: DAX Variables and Formatting

  • What is DAX?
  • Data Types and Operators
  • DAX Variables
  • Formatting DAX Code
  • Debugging Errors in DAX Code
  • Progressive DAX Syntax and Functions

Module 7: Introduction to Power Map

  • Create a Power Map
  • Explore Sample Datasets in Power Map
  • Visualise Data in Power Map
  • Create a Custom Map in Power Map

Module 8: Design a Dashboard Using Data Model

  • How to Design the Dashboard?
  • Using PowerPoint and Excel
  • Make a Dashboard in Excel
  • Customise with Macros, Colour, etc.
  • Make a Dashboard in Smartsheet

Capstone 2: Ecommerce Sales Dashboard in Excel

Domain 3: Programming Basics and Data Analytics with Python

Module 1: Python for Data Analysis - NumPy

  • Introduction to NumPy
  • NumPy Arrays
  • Aggregations
  • Computation on Arrays: Broadcasting
  • Comparison, Boolean Logic and Masks
  • Fancy Indexing
  • Sorting Arrays
  • NumPy’s Structured Arrays

Module 2: Python for Data Analysis – Pandas

  • Installing Pandas
  • Pandas Objects
  • Data Indexing and Selection
  • Operating on Data in Pandas
  • Handling Missing Data
  • Hierarchical Indexing
  • Concat and Append
  • Merge and Join
  • Aggregations and Grouping
  • Pivot Tables
  • Vectorised String Operations
  • Working with Time Series

Module 3: Python for Data Visualisation – Matplotlib

  • Overview
  • Object-Oriented Interface
  • Two Interfaces
  • Simple Line Plots and Scatter Plots
  • Visualising Errors
  • Contour Plots
  • Histograms, Binnings, and Density
  • Customising Plot Legends
  • Customising Colour Bars
  • Multiple Subplots
  • Text Annotation
  • Three-Dimensional Plotting

Module 4: Python for Data Visualisation – Seaborn

  • Installing Seaborn and Loading a Dataset
  • Plot the Distribution
  • Regression Analysis
  • Basic Aesthetic Themes and Styles
  • Distinguish Between Scatter Plots, Hexbin Plots, and KDE Plots
  • Use Boxplots and Violin Plots

Capstone 3: Exploratory Data Analysis Using Python

Domain 4: Tableau Training

Module 1: Get Started

  • What is Tableau?
  • Steps in Creating Tableau Data Analysis Report
  • Navigation
  • Design Flow
  • File Types
  • Data Types
  • Show Me
  • Data Terminology

Module 2: Data Sources

  • Types of Data Sources
  • Custom Data View
  • Extracting Data
  • Fields Operations
  • Editing Metadata
  • Data Joining
  • Data Blending

Module 3: Worksheets

  • Add and Rename
  • Save and Delete
  • Reorder Worksheet
  • Paged Workbook

Module 4: Calculations

  • Operators
  • Functions
  • Calculations
    • Numeric
    • String
    • Date
    • Table
  • LOD Expressions

Module 5: Sort and Filters

  • Basic Sorting
  • Basic Filters
  • Filters
    • Quick
    • Context
    • Condition
  • Top Filters
  • Filter Operations

Module 6: Tableau Charts

  • Chart
    • Bar
    • Line
    • Pie
  • Crosstab
  • Scatter Plot
  • Bubble Chart
  • Bullet Graph
  • Box Plot
  • Tree Map
  • Bump Chart
  • Gantt Chart
  • Histogram
  • Motion Charts
  • Waterfall Charts

Capstone 4: Data Visualisation with Tableau

Prerequisites

There are no formal prerequisites for attending this Advanced Data Analytics Certification.

Audience

This certification is designed by The Knowledge Academy for anyone who wants to build their knowledge of Data Analytics, and for intermediate practitioners who want to take their skills to the next level. It will be particularly beneficial for:

  • IT Professionals
  • Marketing Managers
  • Sales Professionals
  • Supply Chain Network Managers
  • Beginners in Data Analytics Domains

Advanced Data Analytics Certification Course Overview

Data Analytics is the scientific discipline that analyses raw data and uses that information to draw conclusions. Many of its techniques have been automated into algorithms and mechanical processes that turn raw data into a form suitable for human consumption. It is critical for optimising business performance and for implementing business models that reduce costs by storing massive amounts of data and identifying efficient business practices. A Data Analyst is responsible for using automated tools, maintaining databases, and preparing final analysis reports. This training equips aspiring candidates with the essential skills of engineering and interpreting data in Python. Individuals with these skills and knowledge are well placed to progress to more senior roles at multinational corporations.

In this 4-day Advanced Data Analytics Certification, delegates will gain a comprehensive knowledge of data analytics using various programming languages and will learn techniques to master those languages. During this certification, delegates will learn about business analytics with Excel to gain business insights and brush up on their Microsoft Excel skills. They will also learn about programming basics and analytics with Python using the libraries NumPy, Pandas, Matplotlib, and Seaborn. Our highly skilled trainer, with years of teaching experience, will conduct this training certification and equip delegates with advanced data analysis and statistical methods.

Course Objectives

  • To learn about data visualisation that represents data graphically  
  • To understand data wrangling for unifying and cleaning messy data
  • To become familiar with the importance of sorting and filtering
  • To design a dashboard using a data model that stores information systematically
  • To analyse data with Python using the NumPy and Pandas libraries
  • To learn about Tableau charts that illustrate trends and outliers

After attending this training certification, delegates will be able to analyse and visualise the information by extracting it from raw data. They will also be able to create capstone projects such as credit card fraud detection using Machine Learning, an e-commerce sales dashboard in Excel, etc., to gain expertise in the Data Analytics field.

What’s Included

  • Delegate pack consisting of course notes and exercises
  • Courseware
  • Experienced Instructor

Online Instructor-led (1 day)

Online Self-paced (8 hours)

Couchbase Training Course Outline

Introduction to Couchbase Server

Installing Couchbase Server

  • Estimate Cluster Size Requirements
  • Network Ports
  • Setting Couchbase Server

Couchbase Administration Console Basics

  • Clusters, Buckets and Servers
  • Create and Edit Data Buckets
  • Couchbase Server States

Developing with Couchbase

  • Deployment Options
  • Basic Operations
  • Storing Data
  • Client Interaction with the Cluster

Cluster Monitoring

  • Monitoring Nodes and Buckets
  • Monitoring Data Buckets
  • Monitoring Server Nodes

Managing Cluster

  • Adding Node
  • Removing Node
  • Rebalancing
  • Failover with Couchbase
  • Backup and Restore

Audience

Anyone who wishes to gain knowledge on Couchbase can attend this course. This course is well-suited for:

  • Software Developers
  • System Administrators
  • Database and Analytics Professionals

Prerequisites

There are no prerequisites for this course. 

Couchbase Training Course Overview

Couchbase Server is a distributed and scalable NoSQL document database, designed to allow the execution of fast create, store, update, and retrieval operations.

This Couchbase training course is designed to provide knowledge on the working of Couchbase Server, including installation and how to use the Administrative Console. The course also looks at how to develop for Couchbase and monitor and manage clusters.

What’s Included

  • The Knowledge Academy's Couchbase Training Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (2 days)

Online Self-paced (16 hours)

Apache Kafka Training Course Outline

Introduction to Big Data

Overview of Kafka

  • Publish/Subscribe Messaging
  • Enter Kafka
  • The Data Ecosystem

Installing Kafka

  • Installing Java and Zookeeper
  • Installing Kafka Broker
  • Broker Configuration
  • Hardware Selection
  • Kafka Clusters

Kafka Producers

  • Creating a Kafka Producer
  • Sending Message to Kafka
  • Configuring Producers
  • Serializers
  • Partitions

Kafka Consumers

  • Create Kafka Consumer
  • Poll Loop
  • Configuring Consumers
  • Commits and Offsets
  • Rebalance Listeners
  • Deserializers

Kafka Internals

Reliable Data Delivery

  • Reliability Guarantees
  • Replication
  • Broker Configuration
  • Using Producers and Consumers in a Reliable System

Building Data Pipelines

Cross-Cluster Data Mirroring

  • Use Cases of Cross-Cluster Mirroring
  • Multicluster Architectures
  • Apache Kafka’s MirrorMaker

Administering and Monitoring Kafka

Stream Processing

  • Stream-Processing Concepts
  • Stream-Processing Design Patterns
  • Kafka Streams: Architecture Overview

Audience

Anyone who wishes to learn how to use Apache Kafka can attend this course. This course is ideal for:

  • Big Data Architects
  • Analytics and Research Professionals
  • Messaging and Queuing System Professionals
  • Developers who are looking to build a streaming data application

Prerequisites

There are no prerequisites for this course. However, knowledge of basic Java Programming would be beneficial.

Apache Kafka Training Course Overview

Apache Kafka is a high-performance real-time messaging system and open-source stream-processing platform that can process millions of messages per second. It is suitable for both online and offline message consumption. Apache Kafka integrates with Apache Storm and Spark for real-time streaming data analysis, minimising downtime and data loss.

This Apache Kafka Training Course is designed to help delegates acquire the skills to become a Kafka Big Data Developer. During this comprehensive two-day course, delegates will learn the skills required to administer and monitor Kafka, including how to take control of a Kafka cluster by configuring Kafka Producers, Consumers, and Streams. Delegates will also learn how to build data pipelines and applications with Kafka, as well as how to install Java, Zookeeper, and Kafka Broker.

What’s Included

  • The Knowledge Academy's Apache Kafka Training Course Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (2 days)

Online Self-paced (16 hours)

Apache Spark Training Course Outline

Introduction to Apache Spark

  • Cluster Design
  • Cluster Management
  • Performance

Apache Spark MLlib

  • Environment Configuration
  • Classification with Naive Bayes
  • Clustering with K-Means
  • Artificial Neural Networks (ANN)

Apache Spark Streaming

  • Errors and Recovery
  • TCP Stream
  • Apache Flume
  • Apache Kafka

Apache Spark SQL

  • SQL Context
  • Importing and Saving Data
  • DataFrames
  • Using SQL
  • User-defined Functions
  • Using Hive

Apache Spark GraphX

  • Environment
  • Creating a Graph
  • Installing Docker
  • Neo4j Browser
  • Mazerunner Algorithms

Graph-Based Storage

  • Overview of Titan and TinkerPop
  • Installing Titan
  • Titan with HBase
  • Titan with Cassandra
  • Accessing Titan with Spark

Spark Databricks

  • Installing Databricks
  • Databricks Menus
  • Account and Cluster Management
  • Notebooks and Folders
  • Jobs and Libraries
  • Databricks Tables
  • DbUtils Package

Databricks Visualisation

  • Data Visualisation
  • REST Interface
  • Moving Data

Audience

Anyone who wishes to enhance their knowledge of Apache Spark can attend this course. This course is ideal for:

  • Architects and Developers
  • Analytics and Research Professionals
  • BI and IT Professionals
  • Mainframe and Testing Professionals

Prerequisites

There are no prerequisites for this course. However, basic knowledge of SQL, databases, and query languages will be beneficial.

Apache Spark Training Course Overview

Apache Spark is a framework for large-scale SQL, stream processing, batch processing, and machine learning. Its main feature is in-memory cluster computing, which enhances its processing speed. It can also handle both batch and real-time analytics and data processing workloads, as well as process data from different data repositories including NoSQL databases, the Hadoop Distributed File System (HDFS) and more.

This Apache Spark training course is designed to provide delegates with the skills and knowledge to become a successful Big Data and Spark Developer. The two-day course explores concepts including cluster design, cluster management and artificial neural networks, as well as how to install Docker, Titan and Databricks. It also looks at how to process graphs using GraphX.

What’s Included

  • The Knowledge Academy's Apache Spark Training Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (2 days)

Online Self-paced (16 hours)

Big Data Analytics & Data Science Integration Course Outline

Module 1: Big Data Analytics - Introduction

  • Big Data Overview
  • State of Practice in Analytics
  • Main Roles for New Big Data Ecosystem

Module 2: Data Analytics Lifecycle

  • Overview of Data Analytics Lifecycle
  • Phase 1 – Discovery
  • Phase 2 – Data Preparation
  • Phase 3 – Model Planning
  • Phase 4 – Model Building
  • Phase 5 – Communicate Results
  • Phase 6 – Operationalise

Module 3: Basic Data Analytic Methods Using R

  • Introduction to R
  • Exploratory Data Analysis
  • Statistical Methods for Evaluation

Module 4: Introduction to Clustering

Module 5: Association Rules

  • Apriori Algorithm
  • Evaluation of Candidate Rules
  • Applications of Association Rules
  • Validation and Testing
  • Diagnostics

Module 6: Regression

  • Linear Regression
  • Logistic Regression

Module 7: Classification

  • Decision Trees
  • Naïve Bayes
  • Diagnostics of Classifiers

Module 8: Time Series Analysis

Module 9: Text Analysis

  • Steps of Text Analysis
  • Collecting Raw Text
  • Representing Text
  • Term Frequency – Inverse Document Frequency (TF-IDF)

Module 10: MapReduce and Hadoop

  • Analytics for Unstructured Data
  • The Hadoop Ecosystem
  • NoSQL

Module 11: In-Database Analytics

  • SQL Essentials
  • In-Database Text Analysis
  • Advanced SQL

Audience

Anybody wishing to pursue a career in Big Data and Data Science can attend this course. This course is well-suited for:

  • Business Analysts
  • Data Analysts
  • Database Professionals
  • Business Intelligence Managers
  • Graduates who wish to build a career in data science

Prerequisites

No prerequisites are required for this course.

Big Data Analytics & Data Science Integration Course Overview

Data Science is a combination of programming, analytical, and business skills that enable the review, analysis and extraction of meaningful insights from raw data. Big Data creates new opportunities for organisations to derive insights and generate competitive advantage from information.

This Big Data Analytics and Data Science Integration course will help delegates to gain expertise in using Big Data and Data Science technologies. It provides delegates with in-depth knowledge of how to design, develop, and deploy data science and big data applications in the real world. Topics covered include the Data Analytics Lifecycle, Regression, Classification, Text Analysis, and In-Database Analytics.

Online Instructor-led (2 days)

Online Self-paced (16 hours)

Data Integration and Big Data using Talend Course Outline

Introduction to Data Integration

Introduction to Talend Big Data Solutions

Working with Projects

  • Introduction to Projects
  • Creating a Project
  • Importing a Project
  • Opening a Project
  • Deleting a Project
  • Exporting a Project

Designing a Business Model

  • Introduction to Business Model
  • Creating a Business Model
  • Modeling a Business Model
  • Editing and Saving a Business Model

Hive in Talend

Designing a Job

Managing Jobs

  • Activating/Deactivating a Subjob
  • Importing/Exporting Items and Building Jobs
  • Managing Repository Items
  • Documenting a Job
  • Handling Job Execution

Handling Jobs

Mapping Data Flows

  • Map Editor Interfaces
  • tMap Operation
  • tXMLMap Operation

Mapping Big Data Flows

  • tPigMap Interface
  • tPigMap Operation

Managing Metadata for Data Integration

Managing Metadata for Talend Big Data

  • Managing NoSQL Metadata
  • Managing Hadoop Metadata

Managing Routines

Using SQL Templates

  • Introduction to ELT
  • Overview of Talend SQL Templates
  • Managing Talend SQL Templates

Who should attend?

Anyone who wishes to use Talend for data integration can attend this course. This course is ideal for:

  • Data Warehousing Professionals
  • Data Scientists and Architects
  • System Administrators and Integrators
  • Business Analysts

Prerequisites

There are no prerequisites for this course. However, basic knowledge of Data Warehousing and SQL would be beneficial.

Data Integration and Big Data using Talend Course Overview

Talend is an open-source data integration platform. It combines data from multiple sources and ensures it can be moved quickly across to target systems to provide greater business insights. It offers various software and services for enterprise application integration, data management, data integration, cloud storage, data quality, and Big Data.

This Data Integration and Big Data using Talend course is designed to provide thorough knowledge of how to use Talend to address Big Data integration and management challenges. The course will cover how to design and manage jobs, as well as how to create, import, open, delete, and export projects. Delegates will also learn how to manage metadata and use Talend SQL templates.

What’s Included

  • The Knowledge Academy's Data Integration and Big Data using Talend Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (1 day)

Online Self-paced (8 hours)

Data Warehousing Training Course Outline

Introduction to Data Warehouse

  • What is Data Warehousing?
  • Features of Data Warehouse
  • Types of Data Warehouse
  • Components of Data Warehouse
  • Use of Data Warehouse
  • Advantages and Disadvantages
  • Data Warehouse Tools
  • Data Warehouse Applications
  • Integrating Heterogeneous Databases

Terminologies

  • Metadata
  • Metadata Repository
  • Data Cube
  • Data Mart
  • Virtual Warehouse

Dimensions and Facts

Modelling

  • ER Diagram

Delivery Process

  • Delivery Method
  • IT Strategy
  • Education and Prototyping
  • Technical Blueprint

System Processes

  • Process Flow in Data Warehouse
  • Extract and Load Process
  • Clean and Transform Process
  • Backup and Archive the Data

Data Warehouse Architecture

  • Three-Tier Data Warehouse Architecture
  • Data Warehouse Models
  • Load, Warehouse, and Query Manager

Data Warehouse OLAP

  • Types of OLAP Servers
  • OLAP Operations
  • OLAP vs OLTP

Relational and Multidimensional OLAP

Data Warehouse Schemas

  • Star Schema
  • Snowflake Schema
  • Fact Constellation Schema
  • Schema Definition

Horizontal and Vertical Partitioning

Metadata Concepts

  • Metadata Categories
  • Role of Metadata
  • Metadata Repository

Introduction to Data Marting

System and Process Managers

Security and Backup

  • Security Requirements
  • User Access
  • Impact of Security on Design
  • Hardware and Software Backup

Tuning and Testing

Who should attend?

Anybody wishing to gain a good knowledge of data warehousing can attend this course. This course is best suited for:

  • Recent Graduates
  • IT Professionals who wish to learn about data storage and data warehouse modelling
  • Finance Professionals

Prerequisites

There are no formal prerequisites for this course. However, an understanding of basic database concepts would be beneficial.

Data Warehousing Training Course Overview

A data warehouse is a database that collects and stores a large amount of data from a diverse range of sources to allow analysis and provide insights.

This Data Warehouse training course will introduce delegates to the fundamental concepts of data warehousing, including architecture, modelling, delivery and system processes. Delegates will learn about the different terminology related to data warehousing, metadata concepts, schemas, and security.

What’s Included

  • The Knowledge Academy’s Data Warehousing Training Manual
  • Experienced Instructor
  • Completion Certificate

Online Instructor-led (1 day)

Online Self-paced (8 hours)

ELK Stack Training Outline

Introduction to ELK Stack

  • ELK Stack Architecture
  • Importance of ELK
  • Elasticsearch
  • Logstash
  • Kibana
  • ELK vs Splunk
  • Advantages and Disadvantages of ELK Stack

Installing ELK

  • Environment Specifications
  • Java and Elasticsearch Installation
  • Logstash, Kibana and Beats Installation

Elasticsearch

  • Basic Concepts – Documents, Types, Mapping, Shards and Index
  • Queries – Boolean Operators, Fields, Ranges and URI Search
  • REST API
  • Plugins

Logstash

  • Configuration
  • Pitfalls

Kibana

  • Kibana Searches
  • Visualisations
  • Dashboards
  • Kibana Elasticsearch Index

Beats

  • Configuration         
  • Modules

ELK in Production

  • Monitor Logstash/Elasticsearch Exceptions
  • ELK Elasticity
  • Security
  • Maintainability
  • Upgrades

Use Cases

Who should attend?

Anybody who wishes to learn how to use the ELK Stack can attend this course. It is recommended for the following job roles:

  • System Log Analyst
  • Full Stack Technical Analyst
  • Business Analyst
  • Big Data Analytics Engineer – Elasticsearch

Prerequisites

A basic understanding of the JSON data format, SQL, and RESTful APIs will be helpful.

ELK Stack Training Overview

The ELK Stack is a combination of three open-source products: Elasticsearch, Logstash, and Kibana. Elasticsearch is a search and analytics engine. Logstash is a server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and sends it to a stash. Kibana enables users to visualise Elasticsearch data with graphs and charts.

This ELK Stack Training course will provide delegates with a good understanding of Elasticsearch, Logstash, and Kibana. Delegates will learn about Elasticsearch queries such as Boolean Operators, Fields, Ranges, and URI Search. They will also gain knowledge of ELK elasticity and use cases. By the end of the course, delegates will understand how to use Elasticsearch, Logstash, and Kibana, and how they can be used in business.

What’s Included

  • The Knowledge Academy’s ELK Stack Training Course Manual
  • Experienced Instructor
  • Completion Certificate

Online Instructor-led (2 days)

Online Self-paced (16 hours)

Hadoop Training Course with Impala Outline

Introduction to Apache Impala

  • Benefits of Impala
  • How Impala Works with CDH

Concepts and Architecture

  • Impala Components
  • Developing Impala Applications
  • Impala in Hadoop Ecosystem

Planning Impala Deployment

Installing Impala

  • Installation with Cloudera Manager
  • Installation without Cloudera Manager

Managing and Upgrading Impala

Starting Impala

  • Starting through Cloudera Manager
  • Starting from Command Line
  • Modifying Impala Startup Options

Impala Administration

Impala Security

Impala SQL Language Reference

Using the Impala Shell Command

Tuning Impala for Performance

Scalability Considerations for Impala

Partitioning for Impala Tables

How Impala Works with Hadoop File Formats

Use Impala to Query HBase Tables

Using Impala Logging

Troubleshooting Impala

Who should attend?

Anyone who wishes to gain expertise in Impala can attend this course. This course is ideal for:

  • Analysts and Data Scientists
  • SQL Developers
  • Hadoop Administrators and Developers
  • Database Administrators
  • Data Warehouse Developers

 

Prerequisites

No prerequisites are required for this course. However, basic knowledge of the principles of programming is advantageous.

 

Hadoop Training Course with Impala Overview

Impala is a distributed, massively parallel processing (MPP) SQL query engine for processing enormous data volumes stored in a Hadoop cluster. Impala is released under the Apache License and runs on the open-source Apache Hadoop big data analytics platform.

This Hadoop Training Course with Impala is designed to equip delegates with comprehensive knowledge regarding Apache Impala. Delegates will learn how to install, manage and upgrade Impala as well as how to start Impala through Cloudera Manager and the command line. From here, the course will show you how to administer Impala, including managing security, tuning for performance, and troubleshooting.

What’s Included

  • The Knowledge Academy's Hadoop Training Course with Impala Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (1 day)

Online Self-paced (8 hours)

HBase Training Course Outline

Introduction to HBase

  • HBase and HDFS
  • Storage Mechanism
  • HBase and RDBMS
  • Application of HBase
  • HBase Architecture

Installing HBase

Overview of HBase Shell

  • General Commands
  • Data Definition Language
  • Data Manipulation Language

HBase Admin API

Basics of HBase Tables

  • Creating Tables
  • Listing Tables
  • Disabling a Table
  • Enabling a Table
  • Dropping a Table

HBase Describe and Alter

HBase Exists and Shutting Down

Client API Basics

Overview of HBase Data

  • Create Data
  • Update Data
  • Read Data
  • Delete Data

HBase Scan and Security Basics

Who should attend?

Anyone who wishes to pursue a career in Big Data can attend this course. This course is beneficial for the following professionals:

  • Software Professionals
  • ETL Developers
  • Big Data Analysts and Testing Professionals

 

Prerequisites

There are no prerequisites for this course. However, knowledge of Hadoop architecture and APIs would be beneficial.

HBase Training Course Overview

HBase is a non-relational database providing real-time read and write access to large datasets. It stores huge amounts of data in tables. It scales linearly to handle huge datasets and to combine data sources with different structures and schemas. It is natively integrated with Hadoop and works seamlessly with other data access engines through YARN.

This HBase Training is designed to provide thorough knowledge of HBase, including procedures to set up HBase on Hadoop file systems. Delegates will understand the different ways to interact with the HBase Shell, how to connect to HBase from Java, and how to perform basic operations on HBase using Java. You will also become familiar with HBase tables and perform various operations on those tables.

What’s Included

  • The Knowledge Academy's HBase Training Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (2 days)

Online Self-paced (16 hours)

Informatica PowerCenter Training Course Outline

  • Introduction to Informatica
  • Informatica Architecture
  • Installing Informatica PowerCenter
  • Configuring Clients and Repositories
    • Overview of Informatica Domain
    • Opening the Administrator Home Page
    • Creating Repository Services and Contents
    • Configuring Client and Domain
    • Creating User
  • Source Analyser and Target Designer
    • Opening a Source Analyser
    • Importing a Source Table in Source Analyser
    • Opening a Target Designer and Importing Target in Target Designer
    • Creating a Folder
  • Overview of Mappings
    • Components of Mapping
    • Create a Mapping
    • Mapping Parameters and Variables
  • Workflow and Workflow Monitor
  • Debug Mappings
  • Introduction to Transformations
    • Classification of Transformation
    • Filter Transformation
  • Source Qualifier Transformation
  • Aggregator Transformation
  • Router Transformation
  • Joiner Transformation
  • Rank Transformation
  • Sequence Generator Transformation
  • Transaction Control Transformation
  • Lookup and Re-usable Transformation
  • Normaliser Transformation
  • Performance Tuning for Transformation

Who should attend?

Anyone who wishes to elevate their knowledge regarding Informatica can attend this course. This course is ideal for:

  • Informatica PowerCenter Administrators
  • Software and Mainframe Developers
  • Analytics Professionals
  • Project Managers

 

Prerequisites

There are no prerequisites for this course. However, basic knowledge of SQL will be beneficial.

Informatica PowerCenter Training Course Overview

Informatica PowerCenter is an enterprise ETL (extract, transform, and load) tool used to build enterprise data warehouses. It extracts data, transforms it according to business needs, and then loads it into a target data warehouse. It offers a wide range of features, such as integration of data from multiple systems, row-level operations on data, and scheduling of data operations.

This comprehensive course is specifically designed to provide knowledge of Informatica PowerCenter and its architecture. As well as installing PowerCenter, it covers the configuration of clients and repositories, workflow, target designer, and debugging.

What’s Included

  • The Knowledge Academy's Informatica PowerCenter Training Course Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (2 days)

Online Self-paced (16 hours)

Spark Training for Python Developers Course Outline

Module 1: Set up a Spark Virtual Environment

  • Data-intensive Applications Architecture
  • Overview of Spark
  • Introduction to Anaconda
  • Setting a Spark Powered Environment
  • Building an App with PySpark
  • Virtualising the Environment with Vagrant
  • Moving to the Cloud

 

Module 2: Building Batch and Streaming Apps with Spark

  • Architecting Data-intensive Apps
  • Analysing Data
  • Exploring GitHub

 

Module 3: Juggling Data with Spark

  • Serialising and Deserialising Data
  • Harvesting and Storing Data
  • Exploring Data using Blaze
  • Exploring Data using Spark SQL

 

Module 4: Data Using Spark

  • Classifying Spark MLlib Algorithms
  • Spark MLlib Data Types
  • Machine Learning Workflows and Dataflows
  • Clustering Twitter Dataset
  • Build Machine Learning Pipelines

 

Module 5: Streaming Live Data with Spark

  • Streaming Architecture
  • Process Live Data with TCP Sockets
  • Build a Reliable and Scalable Streaming App
  • Lambda and Kappa Architecture

 

Module 6: Visualising Insights and Trends

  • Preprocess Data for Visualisation
  • Setting and Creating Wordclouds
  • Geo-locating Tweets and Mapping Meetups

Who should attend?

This course is intended for:

  • Architects and Developers
  • Business Intelligence and Mainframe Professionals
  • Big Data Architects
  • Data Scientists and Analytics Professionals

Prerequisites

There are no formal prerequisites for this course. However, basic knowledge of SQL and Python programming would be beneficial.

Spark Training for Python Developers Course Overview

Apache Spark is an analytics engine for processing big data. It supports large-scale SQL queries, stream processing, batch processing, and machine learning. Spark’s main feature is its in-memory cluster computing, which increases application processing speed. It can handle both batch and real-time analytics and data processing workloads.

This Spark Training for Python Developers course is designed to provide knowledge of how to set up a virtual Spark environment. Delegates will learn how to install Spark and the Python Anaconda distribution, build batch and streaming apps using Spark, and explore data by using Blaze and Spark SQL.

Other topics covered include how to pre-process data for visualisation and how to create Wordclouds. By the completion of this course, you will be able to build a reliable and scalable streaming app.

What’s Included

  • The Knowledge Academy's Spark Training for Python Developers Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (1 day)

Online Self-paced (8 hours)

Apache ORC Training Course Outline

Introduction to Apache ORC

  • What is Apache ORC?
  • ORC Adapters
  • ORC Types
  • Levels of Indexes
  • ACID Support

Building ORC

  • Building both C++ and Java
  • Building Java
  • Building C++

Using Apache ORC in Hive

  • Hive DDL
  • Hive Configuration
    • Table Properties
    • Configuration Properties

Using Apache ORC in MapReduce

  • Reading ORC Files
  • Writing ORC Files
  • Sending OrcStruct, OrcList, OrcMap, or OrcUnion through the Shuffle

Using ORC Core

  • Core Java
  • Core C++

Apache ORC Tools

  • C++ Tools
    • orc-contents
    • orc-metadata
    • csv-import
    • orc-scan
    • orc-statistics
  • Java Tools
    • Java Meta
    • Java Data and Scan
    • Java Convert
    • Java JSON Schema

Prerequisites

There are no formal prerequisites for attending this course.

Audience

Anyone who wishes to learn about Apache ORC can attend this course.

Apache ORC Training Course Overview

The Apache Software Foundation is a non-profit organisation that supports open-source software projects released under the Apache License. Apache ORC is a self-describing columnar file format that enables efficient querying and storage of data on Hadoop. It uses multi-version concurrency control to support ACID transactions. This Apache ORC Training is designed to equip delegates with detailed knowledge of Apache ORC.

The Knowledge Academy’s Apache ORC Training will introduce delegates to ORC adapters and types. Delegates will gain knowledge of Apache ORC’s three levels of indexes and learn how to build Apache ORC. They will also become familiar with Hive DDL and configuration, including table and configuration properties.

During this 1-day course, delegates will learn how to read and write ORC files and how to send OrcStruct, OrcList, OrcMap, or OrcUnion through the shuffle. This Apache ORC Training will also prepare delegates to use the Apache ORC C++ and Java tools. After completing this training, delegates will be able to use the Java meta, data, scan, convert, and JSON schema tools.

What’s Included

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor

Online Instructor-led (2 days)

Online Self-paced (16 hours)

Apache Maven Training Course Outline

Module 1: Introduction to Apache Maven

  • Installing Apache Maven
  • Understanding the Maven Repository and Lifecycle
  • Understanding the Role of Plugins

Module 2: Dependencies

  • Overview of Maven Dependencies
  • Controlling Maven classpaths
  • Maven and Transitive Dependencies
  • Managing Dependencies

Module 3: Plugins

  • What are Maven Plugins?
  • Adding Steps to a Maven Build
  • Code Generation
  • Managing Plugins with a Parent POM
  • Finding Available Plugins

Module 4: Controlling the Build

  • Maven Build Properties
  • Maven Profile
  • Profile Activation via Properties and Environment
  • User Settings, Profiles and Repositories

Module 5: The Project Website

  • The Basic Website and Reports
  • Using Report Plugins
  • Creating Custom Pages
  • Deploying to a Web Server

Module 6: The Maven Release Process

  • Deploying to a Repository
  • Using Snapshots
  • Preparing for a Release
  • Releasing Maven Artifacts
  • Preparing for an Open Source Release
  • Publishing to Maven Central

Module 7: Maven Tricks and Patterns

  • Invoking Ant from Maven
  • Accessing Maven Artifacts from Ant
  • Building a Simple Installer
  • Running Functional Tests
  • Disabling Default Plugin Bindings
  • Excluding Transitive Dependencies

Prerequisites

There are no formal prerequisites for this Apache Maven Training.

Audience

This Apache Maven Training is designed for anyone who wants to learn more about Apache Maven. It is particularly beneficial for:

  • Intermediate Java Developers
  • Project Managers

Apache Maven Training Course Overview

Apache Maven is a popular build automation tool used primarily for Java projects. It is also a project management tool based on the project object model (POM). In this 2-day Apache Maven Training, delegates will learn how to solve problems related to software project builds and implement the Maven repository. Delegates will also learn:

  • How to create and manage Java projects
  • The Maven repository and lifecycle
  • How to install Apache Maven
  • How to set up the Maven environment
  • Profile activation via properties and environment
  • How to use report plugins and create custom pages

Throughout this training, delegates will learn how to install and deploy a plugin and how to generate reports on code when developers run into problems. After completing this training, delegates will be able to create a project website and release Maven artifacts.

What’s Included

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor

Not sure which course to choose?

Speak to a training expert for advice if you are unsure of what course is right for you. Give us a call on +1 7204454674 or Inquire.

Package deals

Our training experts have compiled a range of course packages to complement a variety of categories and help fast-track your career. The packages consist of the best possible qualifications in each industry and allow you to purchase multiple courses at a discounted rate.

Swipe for more. Don’t miss out!

Big Data and Analytics Training FAQs

The Knowledge Academy offers Big Data and Analytics Training in a range of locations across the UK and around the world, making it easy to find a training venue near you.
This depends on the training course you choose. Please refer to each course for details.
No, Big Data and Analytics Training courses do not include exams.
Yes, all delegates are offered support throughout their training, and after the course has been completed. This is to ensure that candidates get the most out of our training courses.
The Knowledge Academy is the leading global training provider for Big Data and Analytics Training.
The price for Big Data and Analytics Training certification in the United States starts from $.

Why we're the go-to training provider for you

Best price in the industry

You won't find better value in the marketplace. If you do find a lower price, we will beat it.

Trusted & Approved

We are accredited by PeopleCert on behalf of AXELOS.

Many delivery methods

Flexible delivery methods are available depending on your learning style.

High quality resources

Resources are included for a comprehensive learning experience.

"Really good course and well organised. Trainer was great with a sense of humour - his experience allowed a free-flowing course, structured to help you gain as much information & relevant experience whilst helping prepare you for the exam."

Joshua Davies, Thames Water

Looking for more information on Big Data and Analytics Training?