Course Information

PySpark Training Course Outline

Module 1: Introduction to PySpark

  • What is PySpark?
  • Environment
  • Spark DataFrames
  • Reading Data
  • Writing Data
  • Transforming Data
  • MLlib
  • Pandas UDFs
  • Best Practices

Module 2: Installation

  • Using PyPI
  • Using Conda
  • Using PySpark Native Features
  • Using Virtualenv
  • Using PEX
  • Manual Downloading
  • Installing from Source
  • Dependencies

Module 3: DataFrame

  • DataFrame Creation
  • Viewing Data
  • Selecting and Accessing Data
  • Applying a Function
  • Grouping Data
  • Getting Data In/Out
  • Working with SQL

Module 4: Setting Up a Spark Virtual Environment

  • Understanding the Architecture of Data-Intensive Applications
  • Understanding Spark
  • Understanding Anaconda
  • Setting Up the Spark Powered Environment
  • Setting Up an Oracle VirtualBox with Ubuntu
  • Building First App with PySpark
  • Virtualising the Environment with Vagrant
  • Moving to the Cloud

Module 5: Building Batch and Streaming Apps with Spark

  • Architecting Data-Intensive Apps
  • Connecting to Social Networks
  • Analysing the Data
  • Exploring the GitHub World
  • Previewing App

Module 6: Learning from Data Using Spark

  • Contextualising Spark MLlib in the App Architecture
  • Classifying Spark MLlib Algorithms
  • Spark MLlib Data Types
  • Machine Learning Workflows and Data Flows
  • Clustering the Twitter Dataset
  • Building Machine Learning Pipelines

Prerequisites  

This PySpark Training course has no formal prerequisites.

Audience 

This PySpark Training, provided by The Knowledge Academy, is ideal for anyone who wants to learn how to use PySpark to work with Apache Spark from Python.

PySpark Training Course Overview

PySpark is the Python interface for Apache Spark. It is used for conducting exploratory data analysis at scale, creating machine learning pipelines, and building ETLs for a data platform. PySpark supports Spark features such as Spark SQL, DataFrame, Streaming, MLlib, and Spark Core. Its benefits to users and organisations include parallelised code that is simple to write, error handling by the framework, and a range of useful algorithms. This PySpark Training is curated by industry experts to help individuals master the skills needed to use PySpark features in their day-to-day tasks and to pursue well-paid roles in multinational companies.

In this 1-day PySpark Training course, delegates will learn how to use a Conda environment to ship their third-party Python packages with conda-pack. They will gain in-depth knowledge of using virtualenv to manage Python dependencies in their clusters with venv-pack. Delegates will also learn other crucial concepts, such as reading, writing, and transforming data; MLlib; installing with PyPI, Conda, PySpark native features, Virtualenv, and PEX; and connecting to network servers. An expert trainer with years of experience teaching technical courses will conduct this training.

This training course will cover various essential concepts, such as:

  • Spark DataFrames
  • MLlib
  • Setting up a Spark virtual environment
  • Building batch and streaming apps with Spark
  • Exploring the GitHub world
  • Learning from data using Spark
  • Contextualising Spark MLlib in the app architecture

After attending this training course, delegates will be able to use conceptual frameworks to implement the architecture of data-intensive applications in their organisations. They will also be able to harvest data, ensure its integrity, and prepare it for batch and streaming processing with Spark.

  • Delegate pack consisting of course notes and exercises
  • Manual
  • Experienced Instructor

Why choose us

Ways to take this course

Our easy-to-use virtual platform allows you to sit the course from home with a live instructor. You will follow the same schedule as the classroom course and will be able to interact with the trainer and other delegates.

Our fully interactive online training platform is compatible with all devices and can be accessed from anywhere, at any time. All our online courses come with 90 days of access as standard, which can be extended upon request. Our expert trainers are on hand to help you with any questions that may arise.

This is our most popular style of learning. We run courses at hand-picked training venues in 1200 locations across 200 countries, providing the all-important ‘human touch’ that other learning styles may lack.

Highly experienced trainers

All our trainers are highly qualified, have 10+ years of real-world experience and will provide you with an engaging learning experience.

State of the art training venues

We only use the highest standard of learning facilities to make sure your experience is as comfortable and distraction-free as possible.

Small class sizes

We limit our class sizes to promote better discussion and ensure everyone has a personalised experience.

Great value for money

Get more bang for your buck! If you find your chosen course cheaper elsewhere, we’ll match it!

This is the same great training as our classroom learning, but carried out at your own business premises. It is the perfect option for larger-scale training requirements and means less time away from the office.

Tailored learning experience

Our courses can be adapted to meet your individual project or business requirements regardless of scope.

Maximise your training budget

Cut unnecessary costs and focus your entire budget on what really matters: the training.

Team building opportunity

This gives your team a great opportunity to come together, bond, and discuss, which you may not get in a standard classroom setting.

Monitor employees’ progress

Keep track of your employees’ progression and performance in your own workspace.

What our customers are saying

Frequently asked questions

PySpark is the Python interface for Apache Spark. It is used for conducting exploratory data analysis at scale, creating machine learning pipelines, and building ETLs for a data platform.
This PySpark Training course has no formal prerequisites.
This PySpark Training, provided by The Knowledge Academy, is ideal for anyone who wants to learn how to use PySpark to work with Apache Spark from Python.
The Spark DataFrame is the key data type in PySpark. Distributed computation in PySpark is carried out by performing operations on Spark DataFrames.
The advantages of PySpark include parallelised code that is simple to write, proper error handling by the framework, a range of algorithms and libraries, good local tools, a manageable learning curve, and ease of use.
In this PySpark Training course, you will learn how to use virtualenv to manage Python dependencies in your clusters with venv-pack. You will also learn crucial concepts such as reading, writing, and transforming data; MLlib; installing with PyPI and Conda; and connecting to network servers.
The price for PySpark Training certification in Belgium starts from €995.
The Knowledge Academy is the leading global training provider for PySpark Training.

Why choose us

Best price in the industry

You won't find better value in the marketplace. If you do find a lower price, we will beat it.

Many delivery methods

Flexible delivery methods are available depending on your learning style.

High quality resources

Resources are included for a comprehensive learning experience.

"Really good course and well organised. Trainer was great with a sense of humour - his experience allowed a free flowing course, structured to help you gain as much information & relevant experience whilst helping prepare you for the exam"

Joshua Davies, Thames Water

Looking for more information on Data Science Training?

Get a custom course package

We may not have any package deals available that include this course. If you enquire or call us on +32 80077519 to speak to our training experts, we should be able to help you with your requirements.