Big Data and Analytics Training courses

Online Instructor-led (2 days)

Classroom (2 days)

Online Self-paced (16 hours)

Hadoop Big Data Certification Training Course Outline:

The Hadoop Big Data Certification is a two-day course. The course content is divided into two sections: Hadoop and Big Data. The following outlines the topics that will be covered in each section.

Hadoop

  • Understanding Big Data and Hadoop
  • Processing Distributed Data
  • Hadoop Project
  • Introduction to Data Storage and Processing
  • Defining Hadoop Cluster Requirements
  • Configuring a Cluster
  • Maximising HDFS Robustness
  • Managing Resources and Cluster Health
  • Maintaining a Cluster
  • Extending Hadoop
  • Implementing Data Ingress and Egress
  • Planning for Backup, Recovery, and Security

 

Big Data

  • Introduction to Big Data
  • Storing Big Data
  • Processing Big Data
  • Tools and Techniques to Analyse Big Data
  • Developing a Big Data Strategy
  • Implementing a Big Data Solution

Who should attend this Big Data Hadoop Training Course?

This course is recommended for those who need to implement or enhance their big data environment. It is also for anyone looking to advance their analytics career by building excellent foundational knowledge.

Typically, those attending are Project Managers and IT Managers, Database Administrators and Data Architects, Developers and SQL Developers, Data Scientists, and Business Intelligence professionals. This is not an exhaustive list.

Hadoop Big Data Certification Prerequisites

There are no prerequisites for this training course.

Hadoop Big Data Certification Training Course Overview

Hadoop is an open-source software platform for distributed computing. It facilitates the processing of big data sets across clusters of computers. Hadoop imposes no format requirements on the data it stores, which makes it an economical solution for many organisations. Training as a Certified Specialist in Hadoop can therefore be an asset to any organisation.
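To give a feel for how work is split up when data is processed across a cluster, the following is a minimal sketch of the map, shuffle, and reduce steps used in MapReduce-style word counting. It is plain single-process Python written for illustration only: it does not use any Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. On a real cluster each phase would run in parallel on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    sample = ["big data needs big clusters", "clusters process data"]
    print(word_count(sample))
```

Because each line is mapped independently and each key is reduced independently, both phases can in principle be spread over many nodes, which is the property the course description refers to.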

By training as a Certified Specialist in Hadoop and Big Data, you will build the knowledge and experience required to devise a Hadoop solution that satisfies your business requirements. On successful completion of this course, delegates will be able to allocate, distribute, and manage resources, and to monitor the Hadoop file system, job progress, and overall cluster performance.

This comprehensive two-day course will equip delegates with the skills required to install, configure, and navigate the Apache Hadoop platform. In addition, delegates will be able to build a Hadoop solution that is tailored to their specific business requirements. The emergence of large data sets brings fresh challenges, and it can be difficult to find one's way in uncharted territory. This course covers big data extensively, including the storage and processing of big data, the tools and techniques used to analyse it, how to develop a big data strategy, and how to implement a big data solution. As a Certified Specialist in Big Data Analytics, you will have the expertise and skills to build competitive strategies around data-driven insights.

What's included in this Hadoop Training Course?

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor
  • Refreshments

Online Instructor-led (1 day)

Classroom (1 day)

Online Self-paced (8 hours)

Hadoop Administration Training Course Outline: 

The following is a brief synopsis of the topics that will be covered over the intensive one-day course.

  • The Fundamentals of Hadoop
  • The Hadoop Ecosystem
  • Startup and Admin Commands
  • Commissioning and Decommissioning Nodes
  • Configuring a Hadoop Cluster
  • Maintaining a Cluster
  • Monitoring and Troubleshooting Clusters
  • Handling Corrupt and Missing Blocks

Who should attend this Hadoop Training Course?

The Hadoop Administration course is intended for IT professionals, cloud administrators, system administrators, and data engineers. However, this is not an exhaustive list.

Hadoop Administration Prerequisites

There are no formal prerequisites for this course. However, it is recommended that delegates understand the basics of Hadoop and have some familiarity with large data sets before beginning this course.

Hadoop Administration Training Course Overview

This 1-day course delivers a detailed understanding of the Hadoop open-source framework. Hadoop Administration Training will assist delegates in working with Big Data. It will further aid delegates in using the information collected to support business objectives and to improve product quality and customer satisfaction.

This course focuses on managing, maintaining, and troubleshooting a Hadoop cluster, working with startup and admin commands and communication command tools, and commissioning and decommissioning nodes. Delegates will also familiarise themselves with the Hadoop Ecosystem, all within an intensive one-day training course.

What's included in this Hadoop Training Course?

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor
  • Refreshments

Online Instructor-led (1 day)

Classroom (1 day)

Online Self-paced (8 hours)

Big Data Architecture Training Course Outline

The following is a brief synopsis of the topics that will be covered over the intensive one-day course.

  • Brief overview of the Hadoop development framework
  • Real-time processing, Batch Processing
  • Data formats, Data Lifecycle
  • Data model creation
  • Database interfaces
  • Scaling
  • Security and Privacy
  • Hadoop clusters
  • Selecting the right technology
  • Big data and Hadoop administration

Who should attend this Big Data Architecture Training Course?

This course is aimed at those who wish to become data architects, data analysts, or database engineers.

Big Data Architecture Training Prerequisites

Prior to attending this course, proficient knowledge of database management systems and technologies (MapReduce, Hive, HDFS, Spark etc.) is expected of delegates. 

Big Data Architecture Training Course Overview

Big Data Hadoop Architects are responsible for the development and deployment of applications on a large scale. In addition to this, they are tasked with preparing and creating Big Data systems. Delegates shall gain a thorough understanding of how to create a Hadoop solution that meets their business requirements. This comprehensive one-day course will equip delegates with the skills required to install, configure, and manage the Apache Hadoop platform. In addition, delegates will be able to build a Hadoop solution that is tailored to their specific business requirements.

The sudden development of large data sets brings fresh challenges, and it can be difficult to find one's way in uncharted territory. This course covers a wide range of big data architecture material, from real-time and batch processing, data formats and the data lifecycle, to the various database interfaces. Scalability is a critical topic within this module. Whether scaling up or down, being able to determine an application's scalability accurately and efficiently is an essential skill for a proficient big data architect. Another important issue discussed during the course is Security and Privacy. The principles of security within the platform, alongside threats to privacy, will be examined in detail. In addition to these topics, selecting the technology that best suits current demands is a key feature of the course syllabus.

What's included in this Big Data Training Course?

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor
  • Refreshments

Online Instructor-led (2 days)

Classroom (2 days)

Online Self-paced (16 hours)

Big Data and Hadoop Solutions Architect Training Course Outline

The following is a brief synopsis of the topics that will be covered over the two-day course.

  • A brief overview of the Hadoop Framework
  • Understanding the role of a Big Data and Hadoop Solutions Architect
  • Learn how to process and analyse data
  • Learn how to identify different behaviours of data
  • Create Data visualisation and migrate large amounts of data
  • Hadoop Clusters: Creating, deploying, maintaining, and securing
  • NoSQL database technologies and Hadoop infrastructure

Who should attend this Big Data and Hadoop Solutions Architect Training Course?

Typically, those attending are Project Managers and IT Managers, Database Administrators & Data Architects, Data Engineers, IT Systems Engineers, and Cloud Systems Administrators. This is not an exhaustive list.

Big Data and Hadoop Solutions Architect Prerequisites

It is highly recommended that delegates have a comprehensive understanding of Hadoop prior to attending.

Big Data and Hadoop Solutions Architect Training Course Overview

This Big Data and Hadoop Solutions Architect training course is a two-day intensive course aimed at those who have a comprehensive understanding of Hadoop and wish to consolidate their knowledge of solutions architecture. This course has been formulated to aid delegates in becoming Solutions Architects that are essential for businesses when they look to integrate data from various sources in a limited time frame. A Big Data Hadoop Solutions Architect is responsible for identifying specific issues whilst handling large amounts of data. They are also expected to describe the structure and behaviours of the information whilst utilising the Hadoop technology.

The Big Data and Hadoop Solutions Architect also organises how the Big Data environment ought to be developed, which includes requirement analysis, platform selection, and the design.

The course shall cover the processing and analysing of data, identifying the various behaviours of data, data visualisation and migration of data, Hadoop Clusters in detail, and the NoSQL database technologies.

A Big Data Hadoop Solutions Architect possesses a sought-after skill set that is invaluable to many organisations. The demand for Big Data Hadoop Solutions Architects has rocketed and continues to grow within the IT industry.

What's included in this Big Data and Hadoop Solutions Architect Training Course?

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor
  • Refreshments

Online Instructor-led (1 day)

Classroom (1 day)

Online Self-paced (8 hours)

Data Science Analytics Course Outline

This 1-day course will help you develop the skills to become a successful Data Analyst. By taking this course, you will be able to study different types of data and turn them into a valuable source of information. You will also learn various theories, including digital, technological and analytical techniques.

The following topics will be taught during this certification:

  • Introduction to Data Science
  • Understanding Data Wrangling
  • Data Analysis
  • Data Mining
  • Understanding Data Visualisation
  • Data Manipulation
  • Working with Large Amounts of Data

Who should attend this Data Analysis Course?

The Data Science Analytics certification has been designed for anyone who is interested in analysing data and identifying any improvements or issues.

Prerequisites

There are no prerequisites for the Data Science Analytics course.

Data Science Analytics Course Overview

Data Science is a versatile area which combines scientific techniques, systems and processes to extract information from various forms of data. A Data Scientist uses the information collected to explore data sources such as revenues, testimonials and product information.

What's included in this Data Analysis Training Course?

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor
  • Refreshments

Online Instructor-led (1 day)

Classroom (1 day)

Online Self-paced (8 hours)

Data Analytics with R Course Outline

Handling data is becoming increasingly essential within a business. The Data Analytics with R certification will help delegates learn the fundamentals of this programming language and use it to perform various forms of data analysis. This 1-day course will also give delegates the skills to create data analysis tasks for themselves and enhance their skills when using R.

The following topics are taught during this course:

  • Overview of Data Analysis
  • Business Intelligence and Analytics
  • R programming language
  • Importing Data
  • Machine Learning

Who should attend this Data Analysis Training Course?

This course has been developed for those who are starting to use data analysis tools. The Data Analytics with R training is ideal for those who are interested in storing and managing data.

Prerequisites

There are no prerequisites for taking this course, but it is recommended that delegates have a basic understanding of this programming language.

Data Analytics with R Course Overview

This 1-day course has been specifically designed for those who have no knowledge of the programming language and would like to expand their skills in this area. The course will help delegates gain the skills to become successful analytics professionals.

What is R?

R is a programming language used for statistical computing and graphics. This open-source tool has been used by many statisticians, data miners and data analysts to collect data to improve their products.

 

What's included in this Data Analysis Training Course?

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor
  • Refreshments

Online Instructor-led (1 day)

Classroom (1 day)

Online Self-paced (8 hours)

Big Data Analysis Course Outline

This course covers the following topics:

Understanding the Fundamentals of Big Data

  • What is Big Data?
  • Sources of Big Data

 

The Big Data Analysis Lifecycle

  • Business Case Evaluation
  • Data Identification
  • Data Acquisitions and Filtering
  • Data Extraction
  • Data Validation and Cleansing
  • Data Aggregation and Representation
  • Data Analysis
  • Data Visualisation
  • Analysis Results

 

Planning a Big Data Approach

  • Bottom-Up and Top-Down Planning
  • Technologies
  • Considering Use Cases
  • Thinking Long-Term
  • Steps for Planning

 

Implementing a Big Data Approach

  • Recognising Business Challenges
  • Finding Appropriate Data Sources
  • Involving the Business
  • Choosing What to Use

 

Storing Unstructured Information

  • Apache Hadoop
  • Microsoft HDInsight
  • Hive
  • PolyBase
  • Sqoop
  • Presto
  • Microsoft Excel
  • NoSQL

 

Managing and Analysing Unstructured Information

  • Challenges of Unstructured Data
  • Deciding on a Data Source
  • Preparing for Storage
  • Choosing Storage Solutions

Who should attend this Big Data Training Course?

This course has been designed for those who are interested in managing large quantities of data and creating long-term strategies for their business.

Prerequisites

There are no prerequisites for the Big Data Analysis course.

Big Data Analysis Course Overview

As more and more businesses rely on data to make their decisions, the ability to critically analyse large datasets is more important than ever. Successful Big Data Analysis can provide an insight into activities and highlight opportunities to improve and expand, as well as identify issues which may prevent growth and affect profit.

Our 1-day Big Data Analysis training course provides a comprehensive introduction to this discipline, providing knowledge of the Big Data Analysis Lifecycle and how a Big Data approach can be planned and implemented.

What's included in this Big Data Training Course?

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor
  • Refreshments

Online Instructor-led (2 days)

Classroom (2 days)

Online Self-paced (16 hours)

Apache Spark and Scala Training Course Outline

Introduction to Scala

  • Introduction to Scala and Deployment of Scala for Big Data applications
  • An Overview of Apache Spark analytics

Pattern Matching

  • Importance of Scala
  • The Concept of REPL (Read Evaluate Print Loop)
  • Deep Dive into Scala Pattern Matching
  • Type Inference and Higher-Order Functions
  • Currying and Traits
  • Application Space
  • Scala for Data Analysis

Executing the Scala Code

  • Introduction to Scala Interpreter
  • Static Object Timer in Scala
  • Implicit Classes in Scala and Testing String Equality in Scala
  • Understand the Concept of Currying in Scala
  • Different Classes in Scala

Classes Concept in Scala

  • Introduction to Classes concept
  • Understanding the Constructor Overloading
  • Different Abstract Classes
  • The Hierarchy Types in Scala
  • The Concept of Object Equality and Val and Var Methods in Scala

Case Classes and Pattern Matching

  • Introduction to Sealed Traits
  • Wildcard and Constructor Patterns
  • Tuple
  • Variable and Constant pattern

Concepts of Traits with Example

  • Introduction to Traits in Scala
  • The Advantages of Traits
  • Linearisation of Traits and The Java Equivalent
  • Avoiding Boilerplate Code

Scala Java Interoperability

  • Implementation of Traits in Scala and Java
  • Extending Multiple Traits

Scala Collections

  • Introduction to Scala Collections
  • Classification of Collections
  • The Difference Between Iterator and Iterable in Scala
  • Example of List Sequence in Scala

Mutable Collections vs Immutable Collections

  • The Types of Collections in Scala
  • Mutable and Immutable Collections
  • Lists and Arrays in Scala
  • The List Buffer and Array Buffer
  • Queue in Scala
  • Double-Ended Queue Deque
  • Stacks and Sets
  • Maps and Tuples in Scala

Use Case Bobsrockets Package

  • What are Scala Packages and Imports
  • The Selective Imports and Test Classes
  • Introduction to JUnit test Class
  • JUnit Interface via JUnit 3 suite for Scala Test
  • Packaging of Scala Applications in Directory Structure
  • Example of Spark Split and Spark Scala

Spark Course Content

Introduction to Spark

  • What are Spark and Spark Stack
  • How Spark Overcomes the Drawbacks of Map Reduce
  • Introduction to in-memory Map Reduce
  • Interactive Operations on Map Reduce
  • Fine vs Coarse-Grained Update
  • Spark Hadoop YARN
  • HDFS and YARN Revision
  • How it is Better than Hadoop
  • Deploying Spark without Hadoop
  • Spark History Server
  • Cloudera Distribution

Spark Basics

  • Spark Installation Guide and Configuration
  • Memory Management
  • Executor Memory vs Driver Memory
  • Working with Spark Shell
  • Concept of Resilient Distributed Datasets (RDD)
  • Learning to do Functional Programming in Spark
  • The Architecture of Spark

Working with RDDs in Spark

  • Spark RDD and Creating RDDs
  • RDD Partitioning
  • Operations and Transformation in RDD
  • Deep Dive into Spark RDDs
  • The RDD General Operations
  • A Read-Only Partitioned Collection of Records
  • Using the Concept of RDD for Faster and Efficient Data Processing
  • RDD Action for Collect
  • Count and Collects Map
  • SaveAsTextFile
  • Pair RDD Functions

Aggregating Data with Pair RDDs

  • Introduction to Key-Value Pair in RDDs
  • How Spark makes Map-Reduce Operations Faster
  • Different Operations of RDD
  • Map Reduce Interactive Operations
  • Fine and Coarse-Grained Update

Writing and Deploying Spark Applications

  • Comparing the Spark Applications with Spark Shell
  • Creating a Spark Application using Scala or Java
  • Deploying a Spark Application
  • Scala Built Application and Creation of Mutable List
  • Set and Set Operations
  • List and Tuple
  • Concatenating List
  • Creating an Application using SBT
  • Deploying Application using Maven
  • The Web User Interface of Spark Application
  • A Real-World Example of Spark
  • Configuring of Spark

Parallel Processing

  • Spark Parallel Processing
  • Deploying on a Cluster
  • Introduction to Spark partitions
  • File-Based Partitioning of RDDs
  • What is HDFS
  • Data Locality
  • Mastering the Technique of Parallel Operations
  • Comparing Repartition & Coalesce
  • RDD Actions

Spark RDD Persistence

  • The Execution Flow in Spark
  • RDD Persistence Overview
  • Spark Execution Flow
  • Spark Terminology
  • Distributed Shared Memory vs RDD
  • RDD Limitations and RDD Lineage
  • Spark Shell Arguments and Distributed Persistence
  • Key/Value Pair for Sorting Implicit Conversion like CountByKey
  • ReduceByKey, SortByKey and AggregateByKey

Spark Streaming & MLlib

  • Spark Streaming Architecture
  • Writing Streaming Program Coding
  • Processing of Spark Stream and Processing Spark Discretised Stream (DStream)
  • The Context of Spark Streaming
  • Streaming Transformation and Flume Spark Streaming
  • Request Count and DStream
  • Multi Batch Operation and Sliding Window Operations
  • Advanced Data Sources and Different Algorithms
  • The Concept of the Iterative Algorithm in Spark
  • Analysing with Spark Graph Processing
  • Introduction to K-Means and Machine Learning
  • Various Variables in Spark like Shared Variables
  • Broadcast Variables and Accumulators

Spark SQL and Data Frames

  • Describe Spark SQL
  • The Context of SQL in Spark
  • Working with XML Data
  • Parquet Files
  • JSON support in Spark SQL
  • Creating a Hive Context
  • Writing Data Frame to Hive
  • Reading JDBC files
  • Introduction to Data Frames in Spark
  • Creating Data Frames
  • Manual Inferring of Schema
  • Working with CSV Files
  • Reading JDBC Tables
  • Data Frame to JDBC
  • User-Defined Functions in Spark SQL
  • Shared Variable and Accumulators
  • Understanding How to Query and Transform Data in Data Frames

Improving Spark Performance

  • Introduction to various variables in Spark like Shared Variables
  • Broadcast Variables
  • Learning About Accumulators
  • The Common Performance Issues
  • Troubleshooting the Performance Problems

Scheduling or Partitioning

  • Learning about the Scheduling and Partitioning in Spark
  • Hash Partition and Range Partition
  • Scheduling within and Around Applications
  • Static Partitioning and Dynamic Sharing
  • Fair Scheduling and High Order Functions
  • Map Partition with index
  • The Zip and GroupByKey
  • Spark Master High Availability
  • Standby Masters with Zookeeper
  • Single Node Recovery with Local File System

Prerequisites

Delegates should have a basic knowledge of Java, databases, and query languages such as SQL.

Audience

This course is designed for those who want to build their career in Big Data. It is particularly suitable for:

  • Senior IT Professionals
  • DW Professionals
  • Data Scientists and Analytics Professionals
  • Developers and Architects
  • Testing Professionals
  • Software Architects
  • BI and ETL Professionals
  • Engineers and Developers
  • Mainframe Professionals

Apache Spark and Scala Training Course Overview

Apache Spark is an open-source, fast cluster computing system used for analysing large amounts of data. Spark is widely adopted, and many large companies around the world use it.

This 2-day Apache Spark and Scala Certification provides delegates with in-depth knowledge and practical skills to enhance their competence in Big Data Spark. During this training, delegates will gain an understanding of Spark and its ecosystem, Spark Streaming, Spark SQL, RDD and Scala.

This course will cover the following concepts:

  • Scala and its programming implementation
  • Spark Applications using Python, Java and Scala
  • Apache Spark and Hadoop
  • Spark on a cluster and Spark Streaming
  • Scala Java Interoperability and other Scala operations
  • Projects using Scala to run on Spark applications
  • Scala classes and pattern matching

This course will be delivered by an industry-experienced instructor, who will provide comprehensive knowledge of the Scala programming language, YARN, HDFS, Sqoop, Flume, Spark GraphX and messaging systems such as Kafka. After completing this training, delegates will receive a certificate if they pass the exam.

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor

Online Instructor-led (1 day)

Classroom (1 day)

Online Self-paced (8 hours)

Apache Storm Training Course Outline

Introduction to Apache Storm

  • Apache Storm Vs. Hadoop
  • Use-Cases

Apache Storm Concepts

  • Topology
  • Tasks
  • Workers
  • Stream Grouping

Cluster Architecture

Apache Storm Workflow

Overview of Distributed Messaging System

Installing Apache Storm

  • Verifying Java Installation
  • ZooKeeper Framework Installation
  • Apache Storm Framework Installation

Apache Storm Trident

  • Topology and Tuples
  • Spout and Operations
  • State Maintenance
  • Distributed RPC

Apache Storm Applications

Audience

Anyone who wishes to pursue a career in Big Data Analytics or learn to use Apache Storm can attend this course. This course is well-suited for:

  • Software Professionals
  • Mainframe and Hadoop Professionals
  • Data Scientists and ETL Developers

Prerequisites

There are no prerequisites for this course; however, an understanding of Java would be advantageous.

Apache Storm Training Course Overview

Apache Storm is an open-source data streaming framework. It enables the processing of large amounts of data in a fault-tolerant and horizontally scalable way. It is simple and can be used with any programming language.

This Apache Storm Training is designed to provide knowledge of how to use Apache Storm. Delegates will learn how to install Storm and create topologies, as well as how to use its workflow, cluster architecture and distributed messaging system. The course also looks at the Apache Storm Trident, including Topology and Tuples, Spout and Operations, and State Maintenance.

What’s Included

  • The Knowledge Academy's Apache Storm Training Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (1 day)

Classroom (1 day)

Online Self-paced (8 hours)

Couchbase Training Course Outline

Introduction to Couchbase Server

Installing Couchbase Server

  • Estimate Cluster Size Requirements
  • Network Ports
  • Setting Couchbase Server

Couchbase Administration Console Basics

  • Clusters, Buckets and Servers
  • Create and Edit Data Buckets
  • Couchbase Server States

Developing with Couchbase

  • Deployment Options
  • Basic Operations
  • Storing Data
  • Client Interaction with the Cluster

Cluster Monitoring

  • Monitoring Nodes and Buckets
  • Monitoring Data Buckets
  • Monitoring Server Nodes

Managing Cluster

  • Adding Node
  • Removing Node
  • Rebalancing
  • Failover with Couchbase
  • Backup and Restore

Audience

Anyone who wishes to gain knowledge on Couchbase can attend this course. This course is well-suited for:

  • Software Developers
  • System Administrators
  • Database and Analytics Professionals

Prerequisites

There are no prerequisites for this course. 

Couchbase Training Course Overview

Couchbase Server is a distributed and scalable NoSQL document database, designed to allow the execution of fast create, store, update, and retrieval operations.

This Couchbase training course is designed to provide knowledge on the working of Couchbase Server, including installation and how to use the Administrative Console. The course also looks at how to develop for Couchbase and monitor and manage clusters.

What’s Included

  • The Knowledge Academy's Couchbase Training Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (2 days)

Classroom (2 days)

Online Self-paced (16 hours)

Data Analysis Training using MS Excel Course Outline

Overview of Data Analysis

  • Types of Data Analysis
  • Data Analysis Process

Introduction to Data Analysis with MS Excel

Work with Range Names

Introduction to Tables

Cleaning Data with Text Functions

Working with Date Formats and Time Formats

Conditional Formatting

Sorting and Filtering

Subtotals and Quick Analysis

Exploring Lookup Functions

Working with PivotTables

Data Visualisation and Validation

Financial Analysis

Multiple Sheets

Formula Auditing

Audience

This course is intended for anybody looking to learn how to use Microsoft Excel for data analysis purposes.

 

Prerequisites

There are no prerequisites for this course.

Data Analysis Training using MS Excel Course Overview

Data analysis is a key component of business intelligence (BI) and data mining. It is the process of examining and evaluating a data set by using analytical and logical reasoning. Excel provides a wide range of commands, functions, and tools to save time and make the task of data analysis easy.

This two-day Data Analysis Training using MS Excel course will provide a strong understanding of how to carry out key data analysis tasks using Excel, including preparing, processing and visualizing data. Topics covered include cleaning data, Conditional Formatting, lookup functions and formula auditing. By the end of the course you’ll be able to use Excel efficiently and effectively.

What’s Included

  • The Knowledge Academy’s Data Analysis Training using MS Excel Manual
  • Experienced Instructor
  • Completion Certificate

Online Instructor-led (2 days)

Classroom (2 days)

Online Self-paced (16 hours)

Apache Kafka Training Course Outline

Introduction to Big Data

Overview of Kafka

  • Publish/Subscribe Messaging
  • Enter Kafka
  • The Data Ecosystem

Installing Kafka

  • Installing Java and Zookeeper
  • Installing Kafka Broker
  • Broker Configuration
  • Hardware Selection
  • Kafka Clusters

Kafka Producers

  • Creating a Kafka Producer
  • Sending Message to Kafka
  • Configuring Producers
  • Serializers
  • Partitions

Kafka Consumers

  • Create Kafka Consumer
  • Poll Loop
  • Configuring Consumers
  • Commits and Offsets
  • Rebalance Listeners
  • Deserializers

Kafka Internals

Reliable Data Delivery

  • Reliability Guarantees
  • Replication
  • Broker Configuration
  • Using Producers and Consumers in a Reliable System

Building Data Pipelines

Cross-Cluster Data Mirroring

  • Use Cases of Cross-Cluster Mirroring
  • Multicluster Architectures
  • Apache Kafka’s MirrorMaker

Administering and Monitoring Kafka

Stream Processing

  • Stream-Processing Concepts
  • Stream-Processing Design Patterns
  • Kafka Streams: Architecture Overview

Audience

Anyone who wishes to learn how to use Apache Kafka can attend this course. This course is ideal for:

  • Big Data Architects
  • Analytics and Research Professionals
  • Messaging and Queuing System Professionals
  • Developers who are looking to build a streaming data application

Prerequisites

There are no prerequisites for this course. However, knowledge of basic Java Programming would be beneficial.

Apache Kafka Training Course Overview

Apache Kafka is a high-performance, real-time messaging system and open-source stream-processing platform that can process millions of messages per second. It is suitable for both online and offline message consumption. Apache Kafka integrates with Apache Storm and Spark for real-time streaming data analysis, minimising downtime and data loss.
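The publish/subscribe model and the "Commits and Offsets" topic in the outline can be pictured with a small sketch. The class below is a hypothetical in-memory stand-in for a single-partition topic, written in plain Python for illustration; it does not use the Kafka client library, and the names `TopicLog`, `publish`, `poll`, and `commit` are assumptions made for this example rather than Kafka's API.

```python
from collections import defaultdict


class TopicLog:
    """Hypothetical append-only log with one committed offset per consumer group."""

    def __init__(self):
        self._messages = []
        # Maps a consumer group name to the offset of the next unread message.
        self._committed = defaultdict(int)

    def publish(self, message):
        """Append a message and return the offset it was stored at."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, group, max_messages=10):
        """Return up to max_messages starting at the group's committed offset."""
        start = self._committed[group]
        return self._messages[start:start + max_messages]

    def commit(self, group, count):
        """Advance the group's offset past `count` processed messages."""
        new_offset = self._committed[group] + count
        self._committed[group] = min(new_offset, len(self._messages))


if __name__ == "__main__":
    log = TopicLog()
    for event in ["order-1", "order-2", "order-3"]:
        log.publish(event)

    batch = log.poll("analytics", max_messages=2)
    log.commit("analytics", len(batch))
    print(batch, log.poll("analytics"), log.poll("audit"))
```

Because each group tracks its own offset, one group can read a message without removing it for another, and a consumer that restarts resumes from its last commit. This is one way to understand how the same stream can serve both online and offline consumers.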

This Apache Kafka Training Course is designed to help delegates acquire the skills needed to become a Kafka Big Data Developer. During this comprehensive two-day course, delegates will learn the skills required to administer and monitor Kafka, including how to take control of a Kafka cluster by configuring Kafka producers, consumers and streams. Delegates will also learn how to build data pipelines and applications with Kafka, as well as how to install Java, Zookeeper and the Kafka broker.

What’s Included

  • The Knowledge Academy's Apache Kafka Training Course Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (2 days)

Classroom (2 days)

Online Self-paced (16 hours)

Apache Spark Training Course Outline

Introduction to Apache Spark

  • Cluster Design
  • Cluster Management
  • Performance

Apache Spark MLlib

  • Environment Configuration
  • Classification with Naive Bayes
  • Clustering with K-Means
  • Artificial Neural Networks (ANN)

Apache Spark Streaming

  • Errors and Recovery
  • TCP Stream
  • Apache Flume
  • Apache Kafka

Apache Spark SQL

  • SQL Context
  • Importing and Saving Data
  • DataFrames
  • Using SQL
  • User-defined Functions
  • Using Hive

Apache Spark GraphX

  • Environment
  • Creating a Graph
  • Installing Docker
  • Neo4j Browser
  • Mazerunner Algorithms

Graph-Based Storage

  • Overview of Titan and TinkerPop
  • Installing Titan
  • Titan with HBase
  • Titan with Cassandra
  • Accessing Titan with Spark

Spark Databricks

  • Installing Databricks
  • Databricks Menus
  • Account and Cluster Management
  • Notebooks and Folders
  • Jobs and Libraries
  • Databricks Tables
  • DbUtils Package

Databricks Visualisation

  • Data Visualisation
  • REST Interface
  • Moving Data

Audience

Anyone who wishes to enhance their knowledge of Apache Spark can attend this course. This course is ideal for:

  • Architects and Developers
  • Analytics and Research Professionals
  • BI and IT Professionals
  • Mainframe and Testing Professionals

Prerequisites

There are no prerequisites for this course. However, basic knowledge of SQL, databases, and query languages will be beneficial.

Apache Spark Training Course Overview

Apache Spark is a framework for large-scale SQL, stream processing, batch processing, and machine learning. Its main feature is in-memory cluster computing, which enhances its processing speed. It can also handle both batch and real-time analytics and data processing workloads, as well as process data from different data repositories including NoSQL databases, the Hadoop Distributed File System (HDFS) and more.

This Apache Spark training course is designed to provide delegates with the skills and knowledge to become a successful Big Data and Spark Developer. The two-day course explores concepts including cluster design, cluster management and artificial neural networks, as well as how to install Docker, Titan and Databricks. It also looks at how to process graphs using GraphX.

What’s Included

  • The Knowledge Academy's Apache Spark Training Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (2 days)

Classroom (2 days)

Online Self-paced (16 hours)

Big Data Analytics & Data Science Integration Course Outline

Module 1: Big Data Analytics - Introduction

  • Big Data Overview
  • State of Practice in Analytics
  • Main Roles for New Big Data Ecosystem

Module 2: Data Analytics Lifecycle

  • Overview of Data Analytics Lifecycle
  • Phase 1 – Discovery
  • Phase 2 – Data Preparation
  • Phase 3 – Model Planning
  • Phase 4 – Model Building
  • Phase 5 – Communicate Results
  • Phase 6 – Operationalise

Module 3: Basic Data Analytic Methods Using R

  • Introduction to R
  • Exploratory Data Analysis
  • Statistical Methods for Evaluation

Module 4: Introduction to Clustering

Module 5: Association Rules

  • Apriori Algorithm
  • Evaluation of Candidate Rules
  • Applications of Association Rules
  • Validation and Testing
  • Diagnostics

Module 6: Regression

  • Linear Regression
  • Logistic Regression

Module 7: Classification

  • Decision Trees
  • Naïve Bayes
  • Diagnostics of Classifiers

Module 8: Time Series Analysis

Module 9: Text Analysis

  • Steps of Text Analysis
  • Collecting Raw Text
  • Representing Text
  • Term Frequency – Inverse Document Frequency (TF-IDF)

Module 10: MapReduce and Hadoop

  • Analytics for Unstructured Data
  • The Hadoop Ecosystem
  • NoSQL

Module 11: In-Database Analytics

  • SQL Essentials
  • In-Database Text Analysis
  • Advanced SQL

Audience

Anybody wishing to pursue a career in Big Data and Data Science can attend this course. This course is well-suited for:

  • Business Analysts
  • Data Analysts
  • Database Professionals
  • Business Intelligence Managers
  • Graduates who wish to build a career in data science

Prerequisites

No prerequisites are required for this course.

Big Data Analytics & Data Science Integration Course Overview

Data Science is a combination of programming, analytical, and business skills that enable the review, analysis and extraction of meaningful insights from raw data. Big Data creates new opportunities for organisations to derive insights and generate competitive advantage from information.

This Big Data Analytics and Data Science Integration course will help delegates to gain expertise in using Big Data and Data Science technologies. It provides delegates with in-depth knowledge of how to design, develop and deploy data science and big data applications in the real world. Topics covered include the Data Analytics Lifecycle, Regression, Classification, Text Analysis and In-Database Analytics.

Online Instructor-led (2 days)

Classroom (2 days)

Online Self-paced (16 hours)

Data Integration and Big Data using Talend Course Outline

Introduction to Data Integration

Introduction to Talend Big Data Solutions

Working with Projects

  • Introduction to Projects
  • Creating a Project
  • Importing a Project
  • Opening a Project
  • Deleting a Project
  • Exporting a Project

Designing a Business Model

  • Introduction to Business Model
  • Creating a Business Model
  • Modelling a Business Model
  • Editing and Saving a Business Model

Hive in Talend

Designing a Job

Managing Jobs

  • Activating/Deactivating a Subjob
  • Importing/Exporting Items and Building Jobs
  • Managing Repository Items
  • Documenting a Job
  • Handling Job Execution

Handling Jobs

Mapping Data Flows

  • Map Editor Interfaces
  • tMap Operation
  • tXMLMap Operation

Mapping Big Data Flows

  • tPigMap Interface
  • tPigMap Operation

Managing Metadata for Data Integration

Managing Metadata for Talend Big Data

  • Managing NoSQL Metadata
  • Managing Hadoop Metadata

Managing Routines

Using SQL Templates

  • Introduction to ELT
  • Overview of Talend SQL Templates
  • Managing Talend SQL Templates

Who should attend?

Anyone who wishes to use Talend for data integration can attend this course. This course is ideal for:

  • Data Warehousing Professionals
  • Data Scientists and Architects
  • System Administrators and Integrators
  • Business Analysts

Prerequisites

There are no prerequisites for this course. However, basic knowledge of Data Warehousing and SQL would be beneficial.

Data Integration and Big Data using Talend Course Overview

Talend is an open-source data integration platform. It combines data from multiple sources and ensures it can be moved quickly across to target systems to provide greater business insights. It offers various software and services for enterprise application integration, data management, data integration, cloud storage, data quality, and Big Data.

This Data Integration and Big Data using Talend course is designed to provide thorough knowledge of how to use Talend to address Big Data integration and management challenges. The course covers how to design and manage jobs, as well as how to create, import, open, delete and export projects. Delegates will also learn how to manage metadata and use Talend SQL templates.

What’s Included

  • The Knowledge Academy's Data Integration and Big Data using Talend Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (1 day)

Classroom (1 day)

Online Self-paced (8 hours)

Data Warehousing Training Course Outline

Introduction to Data Warehouse

  • What is Data Warehousing?
  • Features of Data Warehouse
  • Types of Data Warehouse
  • Components of Data Warehouse
  • Use of Data Warehouse
  • Advantages and Disadvantages
  • Data Warehouse Tools
  • Data Warehouse Applications
  • Integrating Heterogeneous Databases

Terminologies

  • Metadata
  • Metadata Repository
  • Data Cube
  • Data Mart
  • Virtual Warehouse

Dimensions and Facts

Modelling

  • ER Diagram

Delivery Process

  • Delivery Method
  • IT Strategy
  • Education and Prototyping
  • Technical Blueprint

System Processes

  • Process Flow in Data Warehouse
  • Extract and Load Process
  • Clean and Transform Process
  • Backup and Archive the Data

Data Warehouse Architecture

  • Three-Tier Data Warehouse Architecture
  • Data Warehouse Models
  • Load, Warehouse, and Query Manager

Data Warehouse OLAP

  • Types of OLAP Servers
  • OLAP Operations
  • OLAP vs OLTP

Relational and Multidimensional OLAP

Data Warehouse Schemas

  • Star Schema
  • Snowflake Schema
  • Fact Constellation Schema
  • Schema Definition

Horizontal and Vertical Partitioning

Metadata Concepts

  • Metadata Categories
  • Role of Metadata
  • Metadata Repository

Introduction to Data Marting

System and Process Managers

Security and Backup

  • Security Requirements
  • User Access
  • Impact of Security on Design
  • Hardware and Software Backup

Tuning and Testing

Who should attend?

Anybody wishing to gain a good knowledge of data warehousing can attend this course. It is best suited for:

  • Recent Graduates
  • IT Professionals who wish to learn about data storage and data warehousing modelling
  • Finance Professionals

Prerequisites

There are no formal prerequisites for this course. However, an understanding of basic database concepts would be beneficial.

Data Warehousing Training Course Overview

A data warehouse is a database that collects and stores a large amount of data from a diverse range of sources to allow analysis and provide insights.

This Data Warehouse training course will introduce delegates to the fundamental concepts of data warehousing, including architecture, modelling, delivery and system processes. Delegates will learn about the different terminology related to data warehousing, metadata concepts, schemas, and security.

What’s Included

  • The Knowledge Academy’s Data Warehousing Training Manual
  • Experienced Instructor
  • Completion Certificate

Online Instructor-led (1 day)

Classroom (1 day)

Online Self-paced (8 hours)

ELK Stack Training Outline

Introduction to ELK Stack

  • ELK Stack Architecture
  • Importance of ELK
  • Elasticsearch
  • Logstash
  • Kibana
  • ELK vs Splunk
  • Advantages and Disadvantages of ELK Stack

Installing ELK

  • Environment Specifications
  • Java and Elasticsearch Installation
  • Logstash, Kibana and Beats Installation

Elasticsearch

  • Basic Concepts – Documents, Types, Mapping, Shards and Index
  • Queries – Boolean Operators, Fields, Ranges and URI Search
  • REST API
  • Plugins

Logstash

  • Configuration
  • Pitfalls

Kibana

  • Kibana Searches
  • Visualisations
  • Dashboards
  • Kibana Elasticsearch Index

Beats

  • Configuration
  • Modules

ELK in Production

  • Monitor Logstash/Elasticsearch Exceptions
  • ELK Elasticity
  • Security
  • Maintainability
  • Upgrades

Use Cases

Who should attend?

Anybody who wishes to learn how to use the ELK Stack can attend this course. It is recommended for the following job titles:

  • System Log Analyst
  • Full Stack Technical Analyst
  • Business Analyst
  • Big Data Analytics Engineer – Elasticsearch

Prerequisites

A basic understanding of the JSON data format, SQL and RESTful APIs will be helpful.

ELK Stack Training Overview

The ELK Stack is a combination of three open-source products: Elasticsearch, Logstash, and Kibana. Elasticsearch is a search and analytics engine. Logstash is a server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it and sends it to a stash such as Elasticsearch. Kibana enables users to visualise Elasticsearch data with graphs and charts.

This ELK Stack Training course will provide delegates with a good understanding of Elasticsearch, Logstash and Kibana. Delegates will learn about Elasticsearch query features such as Boolean operators, fields, ranges and URI search. They will also gain knowledge of ELK elasticity and use cases. By the end of the course, delegates will understand how to use Elasticsearch, Logstash and Kibana, and how they can be used in business.

What’s Included

  • The Knowledge Academy’s ELK Stack Training Course Manual
  • Experienced Instructor
  • Completion Certificate

Online Instructor-led (2 days)

Classroom (2 days)

Online Self-paced (16 hours)

Hadoop Training Course with Impala Outline

Introduction to Apache Impala

  • Benefits of Impala
  • How Impala Works with CDH

Concepts and Architecture

  • Impala Components
  • Developing Impala Applications
  • Impala in Hadoop Ecosystem

Planning Impala Deployment

Installing Impala

  • Installation with Cloudera Manager
  • Installation without Cloudera Manager

Managing and Upgrading Impala

Starting Impala

  • Starting through Cloudera Manager
  • Starting from Command Line
  • Modifying Impala Startup Options

Impala Administration

Impala Security

Impala SQL Language Reference

Using the Impala Shell Command

Tuning Impala for Performance

Scalability Considerations for Impala

Partitioning for Impala Tables

How Impala Works with Hadoop File Formats

Using Impala to Query HBase Tables

Using Impala Logging

Troubleshooting Impala

Who should attend?

Anyone who wishes to gain expertise in Impala can attend this course. This course is ideal for:

  • Analysts and Data Scientists
  • SQL Developers
  • Hadoop Administrators and Developers
  • Database Administrators
  • Data Warehouse Developers

Prerequisites

No prerequisites are required for this course. However, basic knowledge of the principles of programming is advantageous.

Hadoop Training Course with Impala Overview

Impala is a distributed, massively parallel processing (MPP) SQL query engine for processing enormous data volumes stored in a Hadoop cluster. Impala is released under the Apache License, and it runs on the open-source Apache Hadoop big data analytics platform.

This Hadoop Training Course with Impala is designed to equip delegates with comprehensive knowledge of Apache Impala. Delegates will learn how to install, manage and upgrade Impala, as well as how to start Impala through Cloudera Manager and the command line. From here, the course will show delegates how to administer Impala, including managing security, tuning for performance, and troubleshooting.

What’s Included

  • The Knowledge Academy's Hadoop Training Course with Impala Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (1 day)

Classroom (1 day)

Online Self-paced (8 hours)

HBase Training Course Outline

Introduction to HBase

  • HBase and HDFS
  • Storage Mechanism
  • HBase and RDBMS
  • Application of HBase
  • HBase Architecture

Installing HBase

Overview of HBase Shell

  • General Commands
  • Data Definition Language
  • Data Manipulation Language

HBase Admin API

Basics of HBase Tables

  • Creating Tables
  • Listing Tables
  • Disabling a Table
  • Enabling a Table
  • Dropping a Table

HBase Describe and Alter

HBase Exists and Shutting Down

Client API Basics

Overview of HBase Data

  • Create Data
  • Update Data
  • Read Data
  • Delete Data

HBase Scan and Security Basics

Who should attend?

Anyone who wishes to pursue a career in Big Data can attend this course. This course is beneficial for the following professionals:

  • Software Professionals
  • ETL Developers
  • Big Data Analysts and Testing Professionals

Prerequisites

There are no prerequisites for this course. However, knowledge of Hadoop architecture and APIs would be beneficial.

HBase Training Course Overview

HBase is a non-relational database providing real-time read and write access to large datasets. It allows the storing of a huge amount of data in the form of a table. It scales linearly for handling huge datasets and combining data sources with different structures and schemas. It is natively integrated with Hadoop and works with other data access engines seamlessly through YARN.

This HBase Training is designed to provide thorough knowledge of HBase, including procedures to set up HBase on Hadoop file systems. Delegates will understand the different ways to interact with the HBase Shell, how to connect to HBase using Java, and how to perform basic operations on HBase using Java. Delegates will also become familiar with HBase tables and perform various operations on those tables.

What’s Included

  • The Knowledge Academy's HBase Training Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (2 days)

Classroom (2 days)

Online Self-paced (16 hours)

Informatica PowerCenter Training Course Outline

  • Introduction to Informatica
  • Informatica Architecture
  • Installing Informatica PowerCenter
  • Configuring Clients and Repositories
    • Overview of Informatica Domain
    • Opening the Administrator Home Page
    • Creating Repository Services and Contents
    • Configuring Client and Domain
    • Creating User
  • Source Analyser and Target Designer
    • Opening a Source Analyser
    • Importing a Source Table in Source Analyser
    • Opening a Target Designer and Importing Target in Target Designer
    • Creating a Folder
  • Overview of Mappings
    • Components of Mapping
    • Create a Mapping
    • Mapping Parameters and Variables
  • Workflow and Workflow Monitor
  • Debug Mappings
  • Introduction to Transformations
    • Classification of Transformation
    • Filter Transformation
  • Source Qualifier Transformation
  • Aggregator Transformation
  • Router Transformation
  • Joiner Transformation
  • Rank Transformation
  • Sequence Generator Transformation
  • Transaction Control Transformation
  • Lookup and Re-usable Transformation
  • Normaliser Transformation
  • Performance Tuning for Transformation

Who should attend?

Anyone who wishes to elevate their knowledge regarding Informatica can attend this course. This course is ideal for:

  • Informatica PowerCenter Administrators
  • Software and Mainframe Developers
  • Analytics Professionals
  • Project Managers

Prerequisites

There are no prerequisites for this course. However, basic knowledge of SQL will be beneficial.

Informatica PowerCenter Training Course Overview

Informatica PowerCenter is an enterprise ETL (extract, transform, and load) tool used to build enterprise data warehouses. It is used to extract data, transform it according to business needs and then load it into a target data warehouse. It offers a wide range of features, such as integration of data from multiple systems, row-level operations on data, and scheduling of data operations.

This comprehensive course is specifically designed to provide knowledge of Informatica PowerCenter and its architecture. As well as installing PowerCenter, it covers the configuration of clients and repositories, workflow, target designer, and debugging.

What’s Included

  • The Knowledge Academy's Informatica PowerCenter Training Course Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (2 days)

Classroom (2 days)

Online Self-paced (16 hours)

Spark Training for Python Developers Course Outline

Module 1: Set up a Spark Virtual Environment

  • Data-intensive Applications Architecture
  • Overview of Spark
  • Introduction to Anaconda
  • Setting up a Spark-powered Environment
  • Building an App with PySpark
  • Virtualising the Environment with Vagrant
  • Moving to the Cloud

Module 2: Building Batch and Streaming Apps with Spark

  • Architecting Data-intensive Apps
  • Analysing Data
  • Exploring GitHub

Module 3: Juggling Data with Spark

  • Serialising and Deserialising Data
  • Harvesting and Storing Data
  • Exploring Data using Blaze
  • Exploring Data using Spark SQL

Module 4: Learning from Data Using Spark

  • Classifying Spark MLlib Algorithms
  • Spark MLlib Data Types
  • Machine Learning Workflows and Dataflows
  • Clustering Twitter Dataset
  • Build Machine Learning Pipelines

Module 5: Streaming Live Data with Spark

  • Streaming Architecture
  • Process Live Data with TCP Sockets
  • Build a Reliable and Scalable Streaming App
  • Lambda and Kappa Architecture

Module 6: Visualising Insights and Trends

  • Preprocess Data for Visualisation
  • Setting up and Creating Wordclouds
  • Geo-locating Tweets and Mapping Meetups

Who should attend?

This course is intended for:

  • Architects and Developers
  • Business Intelligence and Mainframe Professionals
  • Big Data Architects
  • Data Scientists and Analytics Professionals

Prerequisites

There are no formal prerequisites for this course. However, basic knowledge of SQL and Python programming would be beneficial.

Spark Training for Python Developers Course Overview

Apache Spark is an analytics engine for the processing of big data. It can carry out the processing of large-scale SQL, stream processing, batch processing, and machine learning. Spark’s main feature is its in-memory cluster computing which enhances application processing speed. It can handle both batch and real-time analytics and data processing workloads.

This Spark Training for Python Developers course is designed to provide knowledge of how to set up a virtual Spark environment. Delegates will learn how to install Spark and the Python Anaconda distribution, build batch and streaming apps using Spark, and explore data by using Blaze and Spark SQL.

Other topics covered include how to pre-process data for visualisation and how to create Wordclouds. On completion of this course, delegates will be able to build a reliable and scalable streaming app.

What’s Included

  • The Knowledge Academy's Spark Training for Python Developers Manual
  • Experienced Instructor
  • Certificate

Online Instructor-led (1 day)

Classroom (1 day)

Online Self-paced (8 hours)

Apache ORC Training Course Outline

Introduction to Apache ORC

  • What is Apache ORC?
  • ORC Adapters
  • ORC Types
  • Levels of Indexes
  • ACID Support

Building ORC

  • Building both C++ and Java
  • Building Java
  • Building C++

Using Apache ORC in Hive

  • Hive DDL
  • Hive Configuration
    • Table Properties
    • Configuration Properties

Using Apache ORC in MapReduce

  • Reading ORC Files
  • Writing ORC Files
  • Sending OrcStruct, OrcList, OrcMap, or OrcUnion through the Shuffle

Using ORC Core

  • Core Java
  • Core C++

Apache ORC Tools

  • C++ Tools
    • orc-contents
    • orc-metadata
    • csv-import
    • orc-scan
    • orc-statistics
  • Java Tools
    • Java Meta
    • Java Data and Scan
    • Java Convert
    • Java JSON Schema

Prerequisites

There are no formal prerequisites for attending this course.

Audience

Anyone who wishes to learn about Apache ORC can attend this course.

Apache ORC Training Course Overview

The Apache Software Foundation is a non-profit organisation that supports open-source software projects released under the Apache License. Apache ORC is a self-describing columnar file format that enables efficient querying and storage of data on Hadoop. It uses multi-version concurrency control to support ACID transactions. This Apache ORC Training is designed to equip delegates with detailed knowledge of Apache ORC.

The Knowledge Academy’s Apache ORC Training will introduce delegates to ORC adapters and types. Delegates will gain knowledge of Apache ORC’s three levels of indexes and learn how to build Apache ORC. Delegates will also become familiar with Hive DDL and configuration, including table and configuration properties.

During this 1-day course, delegates will learn how to read and write ORC files and how to send OrcStruct, OrcList, OrcMap or OrcUnion through the shuffle. This Apache ORC Training will prepare delegates to use the Apache ORC C++ and Java tools. After completing this training, delegates will be able to use the Java meta, data, scan, convert and JSON schema tools.

What’s Included

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor

Online Instructor-led (2 days)

Classroom (2 days)

Online Self-paced (16 hours)

Apache Maven Training Course Outline

Module 1: Introduction to Apache Maven

  • Installing Apache Maven
  • Understanding the Maven Repository and Lifecycle
  • Understanding the Role of Plugins

Module 2: Dependencies

  • Overview of Maven Dependencies
  • Controlling Maven Classpaths
  • Maven and Transitive Dependencies
  • Managing Dependencies

Module 3: Plugins

  • What are Maven Plugins?
  • Adding Steps to a Maven Build
  • Code Generation
  • Managing Plugins with a Parent POM
  • Finding Available Plugins

Module 4: Controlling the Build

  • Maven Build Properties
  • Maven Profile
  • Profile Activation via Properties and Environment
  • User Settings, Profiles and Repositories

Module 5: The Project Website

  • The Basic Website and Reports
  • Using Report Plugins
  • Creating Custom Pages
  • Deploying to a Web Server

Module 6: The Maven Release Process

  • Deploying to a Repository
  • Using Snapshots
  • Preparing for a Release
  • Releasing Maven Artifacts
  • Preparing for an Open Source Release
  • Publishing to Maven Central

Module 7: Maven Tricks and Patterns

  • Invoking Ant from Maven
  • Accessing Maven Artifacts from Ant
  • Building a Simple Installer
  • Running Functional Tests
  • Disabling Default Plugin Bindings
  • Excluding Transitive Dependencies

Prerequisites

There are no formal prerequisites for this Apache Maven Training.

Audience

This Apache Maven Training is designed for anyone who wants to gain more knowledge about Apache Maven. It is particularly beneficial for:

  • Intermediate Java Developers
  • Project Managers

Apache Maven Training Course Overview

Apache Maven is a widely used build automation tool, primarily for Java projects. It is also a project management tool based on the Project Object Model (POM). In this 2-day Apache Maven Training, delegates will learn how to solve problems related to software project builds and implement the Maven repository. Delegates will also learn:

  • How to create and manage projects with Java
  • How the Maven repository and lifecycle work
  • How to install Apache Maven
  • How to set up the Maven environment
  • How profile activation via properties and environment works
  • How to use report plugins and create custom pages

Throughout this training, delegates will learn how to install and deploy a plugin, and how to generate reports on code when developers run into problems. After completing this training, delegates will be able to create a project website and release Maven artifacts.

What’s Included

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor

Online Instructor-led (2 days)

Classroom (2 days)

Online Self-paced (16 hours)

Apache Solr Training Course Outline

Module 1: Introduction to Apache Lucene

In this module, delegates will learn the basics of Lucene, its architecture and characteristics, and the basics of search engines. They will also learn about Lucene's schema, analysers and query types.

  • Search Engine Basics
  • Lucene Overview and Features
  • Indexing Basics and Architecture
  • Inverted Indexing Technique
  • Lucene Schema (Documents & Fields)
  • Analysers and Query Types
  • Writing and Searching Index

Module 2: Exploring Apache Lucene

In this module, delegates will explore scoring, querying, highlighting, analysers, boosting, faceting and more.

  • Querying and Scoring
  • Faceting and Highlighting
  • Joins and Grouping
  • Analysers and Boosting
  • Spatial Search and Apache Tika

Module 3: What is Apache Solr?

In this module, delegates will learn the key features of Solr, its field types and its installation steps.

  • Key Features and Solr vs Databases
  • Admin UI Quick Tour
  • Solr Architecture and Schema
  • Field Types & Fields

Module 4: Overview of Solr Indexing

This module covers Solr configuration and indexing.

  • Introduction to Analysis and Analysers
  • Tokenisers and Filters
  • Indexing and Index Handlers
  • Indexing Options and Nested Documents
  • Transaction Logs and Commits

Module 5: Searching Using Solr

From this module, delegates will learn:

  • How to Search using Solr
  • Search Process and Velocity Search UI
  • Search Types/Options
  • Sorting and Relevance
  • Boosting and Query Syntax
  • Basic Query Parsers
  • Extended Query Parsers

Module 6: Advanced Features of Solr

In this module, delegates will learn about the advanced features of Solr:

  • Faceting and Highlighting
  • Spell Checking
  • Query Re-Ranking
  • Suggestions and MoreLikeThis
  • Pagination, Grouping and Clustering
  • Spatial Search
  • Collapsing and Expanding
  • Exporting Results and Real-Time Search & Get
  • Client APIs

Module 7: Administration and SolrCloud

In this module, delegates will learn about Solr administration, SolrCloud, plugins and JVM settings.

  • Managing solrconfig.xml and solr.xml
  • Managing Multiple Cores
  • Plugins and JVM Settings
  • Running on Tomcat / Jetty
  • Logging & SSL
  • SolrCloud

Prerequisites

Delegates must have intermediate-level Java skills and be familiar with computer science fundamentals, Linux and databases.

Audience

This course is ideal for IT Professionals who wish to learn Apache Solr. It is beneficial for the following roles as well:

  • Search Analysts
  • System Administrators
  • Software Developers
  • Project Managers
  • IT Architects
  • Mainframe Professionals

Apache Solr Training Course Overview

Apache Solr is an open-source search platform used for enterprise search and analytics.

During this 2-day Apache Solr Certification training, delegates will gain the understanding required to adopt and use an Enterprise Grade Search Engine (EGSE). It covers the following concepts:

  • Apache Lucene and APIs
  • Indexing and searching using Solr
  • Apache Solr and its advantages
  • Solr installation, indexing and updating schemas
  • SolrCloud cluster load balancing, and more

Throughout this certification, delegates will also explore advanced features of Solr and Solr administration. After completing this training, delegates will be able to manage solrconfig.xml and solr.xml.

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor

Online Instructor-led (1 day)

Classroom (1 day)

Online Self-paced (8 hours)

Splunk Power User and Admin Training Course Outline

Introduction to Machine Data and Splunk Basics

  • What is Machine Data and What are its Challenges?
  • Need for Splunk and its Features
  • Splunk Products and their Use Cases
  • Download and Install Splunk
  • Splunk Components: Search Head, Indexer, Forwarder, Deployment Server, & License Master
  • Understand the Splunk Architecture
  • Splunk Licensing Options

User Management and Splunk Configuration Files

  • Introduction to Authentication Techniques
  • User Creation and Management
  • Introduction to Indexes
  • About Data Ageing
  • Splunk Admin Role & Responsibilities
  • Splunk Configuration Files (7)
  • Managing the .conf Files

Data Ingestion, Splunk Search, and Reporting Commands

  • Understand Various Data Onboarding Techniques:
      • Via Flat Files
      • Via UF (Universal Forwarder)
  • Basic Search Commands Implementation in Splunk:
      • Fields, Rename, Table, Sort, and Search
  • Understand the Usage of Time Ranges while Searching
  • Understand Reporting & Transforming Commands in Splunk:
      • Top, Rare, Stats, Chart, Timechart, Dedup and Rex

Knowledge Objects - 1

  • Splunk Knowledge
  • Categories of Splunk Knowledge
  • About Fields
  • Field Extraction
  • Event Types
  • Transactions

Knowledge Objects - 2

  • What are Lookups?
  • Defining a Lookup
  • Configuring an Automatic Lookup
  • Using the Lookup in Searches and Reports
  • About Tags
  • Workflow Action
  • Overview of Data Model
  • Creating and Managing Tags
  • Defining and Searching Field Aliases

Splunk Alerts, Visualisations, Reports, and Dashboards

  • Create Alerts Triggered on Certain Conditions
  • Different Splunk Visualisations
  • Create Reports with Search Results
  • Create Dashboards with different Charts and Other Visualisations
  • Set Permissions for Reports and Dashboards
  • Create Reports and Schedule them using a Cron Schedule
  • Share Dashboard with Other Teams

Splunk Clustering Techniques

  • Install Splunk on Linux OS
  • Use Frequently Used Splunk CLI Commands
  • Learn the Best Practices while Setting up a Clustering Environment
  • Introduction to Splunk Clustering
  • Implement Search Head Clustering
  • Implement Indexer Clustering
  • Deploy an App on the Search Head Cluster

Prerequisites

There are no formal prerequisites to attend.

Audience

This course is designed for:

  • System Administrators
  • Software Developers
  • Analytics Managers
  • Individual Contributors/Architects willing to implement Splunk in their organisations

Splunk Power User and Admin Training Course Overview

The Splunk Power User and Admin Certification provides in-depth knowledge of the concepts required by both Splunk Administrators and Splunk Power Users. This training teaches delegates how to work with configuration and user management in Splunk, introduces them to machine data, and explains the challenges it presents.

In this 1-day Splunk Power User and Admin Certification, delegates will learn how to create and manage users, understand the architecture of Splunk indexes, and work with Splunk configuration files. The course also covers how to schedule alerts and create reports and dashboards with different visualisations.

During this training, delegates will cover topics including:

  • User creation and management
  • Field extraction
  • Install Splunk on Linux OS
  • Create reports with search results

After completing this training, delegates will be able to set up a cluster of Splunk instances and will understand the roles and responsibilities of the Splunk Admin.

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor

Not sure which course to choose?

Speak to a training expert for advice if you are unsure of what course is right for you. Give us a call on 01344 203999 or Enquire.

Package deals

Our training experts have compiled a range of course packages to complement a variety of categories and help fast-track your career. The packages consist of the best possible qualifications in each industry and allow you to purchase multiple courses at a discounted rate.

What our customers are saying

Frequently asked questions

The Knowledge Academy offers Big Data and Analytics Training in a range of locations across the UK and around the world, making it easy to find a training venue near you.
This depends on the training course you choose. Please refer to each course for details.
No, Big Data and Analytics Training courses do not include exams.
Yes, all delegates are offered support throughout their training, and after the course has been completed. This is to ensure that candidates get the most out of our training courses.
The Knowledge Academy is the leading global training provider for Big Data and Analytics Training.
The price for Big Data and Analytics Training certification in the United Kingdom starts from £895.

Why we're the go-to training provider for you

Best price in the industry

You won't find better value in the marketplace. If you do find a lower price, we will beat it.

Trusted & Approved

We are accredited by PeopleCert on behalf of AXELOS

Many delivery methods

Flexible delivery methods are available depending on your learning style.

High quality resources

Resources are included for a comprehensive learning experience.

"Really good course and well organised. Trainer was great with a sense of humour - his experience allowed a free-flowing course, structured to help you gain as much information & relevant experience whilst helping prepare you for the exam."

Joshua Davies, Thames Water

"...the trainer for this course was excellent. I would definitely recommend (and already have) this course to others."

Diane Gray, Shell

Looking for more information on Big Data and Analytics Training?