Hadoop Developer/Admin Training – Course Content

Apache Hadoop is open source data management software that helps organizations analyze huge volumes of structured and unstructured data, and it is a very hot topic across the tech industry. Through technical sessions and hands-on labs, students quickly learn to take advantage of the MapReduce framework.
Training Objectives of Hadoop Developer/Admin:
The Hadoop course covers the basic concepts of MapReduce applications developed with Hadoop, including a close look at the framework components, the use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action. The course also examines related technologies such as Hive, Pig, and Apache Accumulo.
Target Students / Prerequisites:
Students should have an IT background and be familiar with basic concepts in Java and Linux.
Hadoop Architecture
Introduction to
Parallel Computing vs. Distributed Computing
How to install Hadoop on your system
How to install Hadoop cluster on multiple
Hadoop Daemons introduction: NameNode, DataNode, JobTracker, TaskTracker
Exploring HDFS (Hadoop Distributed File System)
Exploring the HDFS Apache Web UI
NameNode architecture (EditLog, FsImage, location of replicas)
Secondary NameNode architecture
DataNode architecture
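
As a rough illustration of how the FsImage and EditLog named above relate, the sketch below models a checkpoint in plain Python. The FsImage is treated as a snapshot of the namespace, the EditLog as the list of changes made since that snapshot, and a checkpoint (the job the Secondary NameNode performs) as replaying the log into a new snapshot. The function name `checkpoint`, the operation tuples, and the paths are hypothetical; this is not Hadoop code and does not reflect its on-disk format.

```python
# Toy model of an HDFS-style checkpoint. Not Hadoop code: the data layout,
# operation names, and paths are illustrative assumptions.
# Block locations are left out on purpose; the NameNode learns those from
# DataNode block reports rather than from the snapshot.

def checkpoint(fsimage, editlog):
    """Replay logged edits onto a copy of the snapshot.

    Returns the merged snapshot and an empty log.
    """
    image = dict(fsimage)  # copy so the old snapshot is left untouched
    for op, path, value in editlog:
        if op == "create":
            image[path] = value            # value is the replication factor
        elif op == "set_replication":
            if path in image:
                image[path] = value
        elif op == "delete":
            image.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return image, []                       # the log starts over after a merge


# Snapshot taken at the last checkpoint: path -> replication factor.
fsimage = {"/data/a.txt": 3}

# Changes recorded since that snapshot.
editlog = [
    ("create", "/data/b.txt", 3),
    ("set_replication", "/data/a.txt", 2),
    ("delete", "/data/b.txt", None),
    ("create", "/logs/c.log", 1),
]

new_image, new_log = checkpoint(fsimage, editlog)
print(new_image)
```

Keeping the log separate from the snapshot means each change is a cheap append, while the periodic merge keeps the log from growing without bound.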
MapReduce Architecture
Exploring JobTracker/TaskTracker
How a client submits a Map-Reduce job
Exploring Mapper/Reducer/Combiner
Shuffle: Sort & Partition
Input/output formats
Job Scheduling (FIFO, Fair Scheduler, Capacity Scheduler)
Exploring the Apache MapReduce Web UI
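
To make the Mapper, Combiner, Shuffle, and Reducer topics above concrete, here is a minimal in-memory word count written in plain Python rather than the Hadoop Java API, so it runs without a cluster. All function names (`mapper`, `combiner`, `partition`, `shuffle`, `reducer`, `run_job`) and the sample input are assumptions made for illustration; a real job would also involve input/output formats and the JobTracker, which are not modelled here.

```python
# In-memory walk-through of the MapReduce phases using word count.
# Plain Python, not the Hadoop API; every name here is illustrative.
from collections import defaultdict
from zlib import crc32


def mapper(line):
    """Emit (word, 1) for each word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combiner(pairs):
    """Pre-aggregate one map task's output to reduce shuffle traffic."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def partition(key, num_reducers):
    """Choose a reducer for a key using a stable hash."""
    return crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Route each pair to its reducer and group values by key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reducer(key, values):
    """Sum all partial counts for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    mapper_outputs = []
    for split in splits:                     # one map task per input split
        pairs = [kv for line in split for kv in mapper(line)]
        mapper_outputs.append(combiner(pairs))
    result = {}
    for bucket in shuffle(mapper_outputs, num_reducers):
        for key in sorted(bucket):           # each reducer sees sorted keys
            word, total = reducer(key, bucket[key])
            result[word] = total
    return result


splits = [["the quick brown fox", "the lazy dog"], ["The dog barks"]]
counts = run_job(splits)
print(counts)
```

The combiner here reuses the same summing logic as the reducer, which works because addition is associative and commutative; that property is what makes a combiner safe to apply.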
Hadoop Developer Tasks
Writing a MapReduce program
Reading and writing data using Java
Hadoop Eclipse integration
Mapper in detail
Reducer in detail
Using Combiners
Reducing Intermediate Data with Combiners
Writing Partitioners for Better Load Balancing
Sorting in HDFS
Searching in HDFS
Indexing in HDFS
Hands-On Exercise
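
As one possible starting point for the partitioner exercise, the sketch below compares how two partitioning functions spread keys across reducers. It is plain Python, not a Hadoop `Partitioner` subclass, and the function names, the key set, and the reducer count are all assumptions chosen to make the skew easy to see.

```python
# Compare reducer load under two partitioning functions.
# Plain Python sketch; names and sample keys are illustrative assumptions.
from collections import Counter
from zlib import crc32


def modulo_partitioner(key, num_reducers):
    """Assign an integer key by taking it modulo the reducer count."""
    return key % num_reducers


def hashed_partitioner(key, num_reducers):
    """Hash the key's text form first so regular key patterns are broken up."""
    return crc32(str(key).encode("utf-8")) % num_reducers


def reducer_load(keys, partitioner, num_reducers):
    """Count how many keys each reducer would receive."""
    load = Counter(partitioner(k, num_reducers) for k in keys)
    return [load.get(r, 0) for r in range(num_reducers)]


# 1000 keys that are all multiples of 4, e.g. IDs handed out in steps of 4.
keys = range(0, 4000, 4)

naive = reducer_load(keys, modulo_partitioner, 4)
spread = reducer_load(keys, hashed_partitioner, 4)

print("modulo:", naive)
print("hashed:", spread)
```

With four reducers, the modulo version sends every key to reducer 0, so one reducer does all the work while the others sit idle. Hashing first is only one remedy; when a single key dominates the data, a different approach is needed because all values for one key still go to one reducer.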
Hadoop Administrative Tasks
Routine Administrative Procedures
Understanding dfsadmin and mradmin
Block Scanner, Balancer
Health Check & Safe mode
DataNode commissioning/decommissioning
Monitoring and Debugging on a production cluster
NameNode Backup and Recovery
ACL (Access Control List)
Upgrading Hadoop
HBase Architecture
Introduction to HBase
HBase vs. RDBMS
Exploring HBase Master & region server
Column Families and Regions
Basic HBase shell commands
Hive Architecture
Introduction to Hive
HBase vs Hive
Installation of Hive
HQL (Hive query language)
Basic Hive commands
Pig Architecture
Introduction to Pig
Installation of Pig on your system
Basic Pig commands
Hands-On Exercise
Sqoop Architecture
Introduction to Sqoop
Installation of Sqoop on your system
Import/Export data from RDBMS to HDFS
Import/Export data from RDBMS to HBase
Import/Export data from RDBMS to Hive
Hands-On Exercise
Mini Project / POC (Proof of Concept)
Facebook-Hive POC
Uses of Hadoop/Hive at Facebook
Static & dynamic partitioning
UDF (User-Defined Functions)