Course Outline
The following modules focus on Data Engineering on Google Cloud Platform. Delegates will learn about Google Cloud Dataproc and how to run jobs on it, and will gain an understanding of how to integrate Dataproc with Google Cloud Platform. These modules cover the essential concepts required to become a Data Engineer on Google Cloud Platform.
Module 1: Google Cloud Dataproc Overview
- Dataproc Overview
- Creating and Managing Clusters
- Custom Machine types and preemptible worker nodes
- Scaling and deleting Clusters
- Creating Hadoop Clusters with Google Cloud Dataproc
Module 2: Running Dataproc Jobs
- Running Pig and Hive jobs
- Separation of storage and compute
- Running Hadoop and Spark Jobs with Dataproc
- Submitting and Monitoring Jobs
Module 3: Integrating Dataproc with Google Cloud Platform
- Customising Clusters with Initialisation Actions
- Understand BigQuery Support
- Understand GCP Services
Module 4: Unstructured Data with Google’s Machine Learning APIs
- Machine Learning APIs
- Use Cases of ML
- Understand ML APIs
- Adding Machine Learning Capabilities to Big Data Analysis
Module 5: Serverless Data Analysis with BigQuery
- BigQuery Overview
- Learn about Functions and Queries
- Writing queries in BigQuery
- Loading data into BigQuery
- Exporting data from BigQuery
- Loading and exporting data
- Nested and repeated fields
- Querying multiple tables
- Complex queries
Module 6: Serverless, autoscaling data pipelines with Dataflow
- Beam programming model Overview
- Understand Data pipelines in Beam Python
- Understand Data pipelines in Beam Java
- Writing a Dataflow pipeline
- Scalable Big Data processing using Beam
- MapReduce in Dataflow
- Incorporating additional data
- Side inputs
- Handling stream data
- GCP Reference architecture
Module 7: Getting started with Machine Learning
- What is machine learning (ML)
- Machine Learning Types
- Explore and create ML datasets
Module 8: Building ML models with TensorFlow
- Understand TensorFlow
- TensorFlow graphs and loops
- Using Low-Level TensorFlow
- Monitoring ML Training
- Charts and graphs of TensorFlow training
Module 9: Scaling ML models with Cloud ML
- Cloud ML Overview
- Understand TensorFlow model
- Running an ML Model
Module 10: Architecture of streaming analytics pipelines
- Understand Stream data processing
- Handling variable data volumes
- Learn about unordered/late data
- Designing a streaming pipeline
Module 11: Ingesting Variable Volumes
- Cloud Pub/Sub Overview
- How Cloud Pub/Sub Works
Module 12: Implementing streaming pipelines
- Stream Processing Overview
- Handle late data
- Watermarks
- Triggers
- Accumulation
- Understand Stream Data Processing Pipelines
Module 13: Streaming analytics and dashboards
- Streaming Analytics Overview
- Querying Streaming Data with BigQuery
- Google Data Studio Overview
- Build a real-time Dashboard to visualise processed data
Module 14: High throughput and low-latency with Bigtable
- Cloud Spanner Overview
- Designing Bigtable schema
- Ingesting into Bigtable
- Understand streaming into Bigtable