Big Data Hadoop and Spark Developer
Certification Training | Course
Lesson 1: Introduction to Big Data and Hadoop Ecosystem
In this lesson you will learn about traditional systems, the problems associated with traditional large-scale systems, what Hadoop is, and the Hadoop ecosystem. Topics covered are:
- Introduction
- Overview of Big Data and Hadoop
- Pop Quiz
- Hadoop Ecosystem
- Quiz
- Key Takeaways
Lesson 2: HDFS and Hadoop Architecture
In this lesson you will learn about distributed processing on a cluster, HDFS architecture, how to use HDFS, YARN as a resource manager, YARN architecture, and how to work with YARN. Topics covered are:
- Introduction
- HDFS Architecture and Components
- Pop Quiz
- Block Replication Architecture
- YARN Introduction
- Quiz
- Key Takeaways
- Hands-on Exercise
Lesson 3: MapReduce and Sqoop
In this lesson you will learn about MapReduce and its characteristics, advanced MapReduce concepts, an overview of Sqoop, basic imports and exports in Sqoop, improving Sqoop's performance, the limitations of Sqoop, and Sqoop2. Topics covered are:
- Introduction
- Why MapReduce
- Small Data and Big Data
- Pop Quiz
- Data Types in Hadoop
- Joins in MapReduce
- What is Sqoop
- Quiz
- Key Takeaways
- Hands-on Exercise
Lesson 4: Basics of Impala and Hive
In this lesson you will be introduced to Hive and Impala, why to use them, the differences between them, how they work, and how Hive compares to traditional databases. Topics covered are:
- Introduction
- Pop Quiz
- Interacting with Hive and Impala
- Quiz
- Key Takeaways
Lesson 5: Working with Hive and Impala
In this lesson you will learn about the metastore, how to create databases and tables in Hive and Impala, how to load data into Hive and Impala tables, HCatalog, and how Impala works on a cluster. Topics covered are:
- Working with Hive and Impala
- Pop Quiz
- Data Types in Hive
- Validation of Data
- What is HCatalog and Its Uses
- Quiz
- Key Takeaways
- Hands-on Exercise
Lesson 6: Type of Data Formats
In this lesson you will learn about the different types of file formats available, Hadoop tool support for file formats, Avro schemas, using Avro with Hive and Sqoop, and Avro schema evolution. Topics covered are:
- Introduction
- Types of File Format
- Pop Quiz
- Data Serialization
- Importing MySQL and Creating hivetb
- Parquet With Sqoop
- Quiz
- Key Takeaways
- Hands-on Exercise
Lesson 7: Advanced Hive Concepts and Data File Partitioning
In this lesson you will learn about partitioning in Hive and Impala, when to use partitioning, bucketing in Hive, and more advanced concepts in Hive. Topics covered are:
- Introduction
- Pop Quiz
- Overview of the Hive Query Language
- Quiz
- Key Takeaways
- Hands-on Exercise
Lesson 8: Apache Flume and HBase
In this lesson you will learn about Apache Flume, Flume architecture, Flume sources, Flume sinks, Flume channels, Flume configurations, an introduction to HBase, HBase architecture, data storage in HBase, and HBase vs RDBMS. Topics covered are:
- Introduction
- Pop Quiz
- Introduction to HBase
- Quiz
- Key Takeaways
- Hands-on Exercise
Lesson 9: Apache Pig
In this lesson you will learn about Pig, the components of Pig, Pig vs SQL, and how to work with Pig. Topics covered are:
- Introduction
- Pop Quiz
- Getting Datasets for Pig Development
- Quiz
- Key Takeaways
- Hands-on Exercise
Lesson 10: Basics of Apache Spark
In this lesson you will learn about Apache Spark, how to use the Spark shell, RDDs, and functional programming in Spark. Topics covered are:
- Introduction
- Architecture, Execution, and Related Concepts
- Pop Quiz
- RDD Operations
- Functional Programming in Spark
- Quiz
- Key Takeaways
- Hands-on Exercise
Lesson 11: RDDs in Spark
In this lesson you will learn about RDDs in detail and all the operations associated with them, key-value pair RDDs, and a few other pair RDD operations. Topics covered are:
- Introduction
- RDD Data Types and RDD Creation
- Pop Quiz
- Operations in RDDs
- Quiz
- Key Takeaways
- Hands-on Exercise
Lesson 12: Implementation of Spark Applications
In this lesson you will learn about Spark applications vs the Spark shell, how to create a SparkContext, building a Spark application, how Spark runs on YARN in client and cluster mode, dynamic resource allocation, and configuring Spark properties. Topics covered are:
- Introduction
- Running Spark on YARN
- Pop Quiz
- Running a Spark Application
- Dynamic Resource Allocation
- Configuring Your Spark Application
- Quiz
- Key Takeaways
Lesson 13: Spark Parallel Processing
In this lesson you will learn how Spark runs on a cluster, RDD partitions, how to partition file-based RDDs, HDFS and data locality, parallel operations in Spark, Spark stages, and how to control the level of parallelism. Topics covered are:
- Introduction
- Pop Quiz
- Parallel Operations on Partitions
- Quiz
- Key Takeaways
- Hands-on Exercise
Lesson 14: Spark RDD Optimization Techniques
In this lesson you will learn about RDD lineage, an overview of caching, distributed persistence, storage levels of RDD persistence, how to choose the correct RDD persistence storage level, and RDD fault tolerance. Topics covered are:
- Introduction
- Pop Quiz
- RDD Persistence
- Quiz
- Key Takeaways
- Hands-on Exercise
Lesson 15: Spark Algorithm
In this lesson you will learn about common Spark use cases, iterative algorithms in Spark, graph processing and analysis, machine learning, and the k-means algorithm. Topics covered are:
- Introduction
- Spark: An Iterative Algorithm
- Introduction to Graph Parallel System
- Pop Quiz
- Introduction to Machine Learning
- Introduction to Three C's
- Quiz
- Key Takeaways
Lesson 16: Spark SQL
In this lesson you will learn about Spark SQL and the SQLContext, creating DataFrames, transforming and querying DataFrames, and comparing Spark SQL with Impala. Topics covered are:
- Introduction
- Pop Quiz
- Interoperating with RDDs
- Quiz
- Key Takeaways
- Hands-on Exercise