Hadoop


Cloudera Administrator Training for Apache Hadoop

Take your knowledge to the next level with Cloudera’s Apache Hadoop Training and Certification

OSSCube organizes the administrator training course for Apache Hadoop. Cloudera University’s three-day training gives system administrators a comprehensive understanding of all the steps necessary to operate and manage Hadoop clusters. From installation and configuration through load balancing and tuning, Cloudera’s administration course has you covered.

The course covers topics such as:

  • Introduction to Apache Hadoop and HDFS
  • Apache Hadoop architecture
  • Proper cluster configuration and deployment
  • Populating HDFS using Apache Sqoop
  • Management and monitoring tools
  • Job scheduling
  • Best practices for maintaining Apache Hadoop in production
  • Installing and managing other Apache Hadoop projects
  • Diagnosing, tuning and solving Apache Hadoop issues

Cloudera Certified Administrator for Apache Hadoop (CCAH)

Cloudera delivers the industry's only administrator certification for Apache Hadoop. Built on the content of our training courses, the program tests your knowledge of Hadoop's operation and use. The Cloudera Certified Professional designation confirms that you have mastered the material necessary to operate and manage Hadoop clusters in the enterprise. Organizations rely on Cloudera certification during the hiring process to identify top-quality candidates.

Cloudera’s Certified Administrator for Apache Hadoop exam is thorough, and is designed to test a candidate’s fluency with the concepts and terminology in the following areas:

  • Hadoop Distributed File System (HDFS)
  • MapReduce
  • Apache Hadoop Cluster Planning
  • Apache Hadoop Cluster Installation and Administration
  • Resource Management
  • Monitoring and Logging
  • Ecosystem

Format:

The course alternates between instructional sessions and hands-on exercises.

Applicability:

The course is appropriate for system administrators who will be setting up or maintaining a Hadoop cluster. Basic Linux system administration experience is a prerequisite for this training session. Prior knowledge of Hadoop is not required.

Detailed Agenda:

1. The Case for Apache Hadoop
  • A Brief History of Hadoop
  • Core Hadoop Components
  • Fundamental Concepts
2. The Hadoop Distributed File System
  • HDFS Features
  • HDFS Design Assumptions
  • Overview of HDFS Architecture
  • Writing and Reading Files
  • NameNode Considerations
  • An Overview of HDFS Security
  • Hands-On Exercise
3. MapReduce
  • What Is MapReduce?
  • Features of MapReduce
  • Basic MapReduce Concepts
  • Architectural Overview
  • MapReduce Version 2
  • Failure Recovery
  • Hands-On Exercise
4. An Overview of the Hadoop Ecosystem
  • What Is the Hadoop Ecosystem?
  • Integration Tools
  • Analysis Tools
  • Data Storage and Retrieval Tools
5. Planning your Hadoop Cluster
  • General Planning Considerations
  • Choosing the Right Hardware
  • Network Considerations
  • Configuring Nodes
6. Hadoop Installation
  • Deployment Types
  • Installing Hadoop
  • Using Cloudera Manager for Easy Installation
  • Basic Configuration Parameters
  • Hands-On Exercise
7. Advanced Configuration
  • Advanced Parameters
  • Configuring Rack Awareness
  • Configuring Federation
  • Configuring High Availability
  • Using Configuration Management Tools
8. Hadoop Security
  • Why Hadoop Security Is Important
  • Hadoop’s Security System Concepts
  • What Kerberos Is and How It Works
  • Configuring Kerberos Security
  • Integrating a Secure Cluster with Other Systems
9. Managing and Scheduling Jobs
  • Managing Running Jobs
  • Hands-On Exercise
  • The FIFO Scheduler
  • The FairScheduler
  • Configuring the FairScheduler
  • Hands-On Exercise
10. Cluster Maintenance
  • Checking HDFS Status
  • Hands-On Exercise
  • Copying Data Between Clusters
  • Adding and Removing Cluster Nodes
  • Rebalancing the Cluster
  • Hands-On Exercise
  • NameNode Metadata Backup
  • Cluster Upgrading
11. Cluster Monitoring and Troubleshooting
  • General System Monitoring
  • Managing Hadoop’s Log Files
  • Using the NameNode and JobTracker Web UIs
  • Hands-On Exercise
  • Cluster Monitoring with Ganglia
  • Common Troubleshooting Issues
  • Benchmarking Your Cluster
12. Populating HDFS From External Sources
  • An Overview of Flume
  • Hands-On Exercise
  • An Overview of Sqoop
  • Best Practices for Importing Data
13. Installing and Managing Other Hadoop Projects
  • Hive
  • Pig
  • HBase
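The basic MapReduce concepts covered in the agenda above (map, shuffle/sort, reduce) can be previewed without a Hadoop cluster. The following is a minimal, Hadoop-free Python sketch of the classic word-count flow; it illustrates the data movement only and is not the actual Hadoop API:

```python
from collections import defaultdict

# Map phase: each input record (a line of text) becomes a stream of
# (key, value) pairs -- here, (word, 1) for every word in the line.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

# Shuffle/sort phase: the framework groups all values by key, so each
# reducer sees one key together with the full list of its values.
def shuffle(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

# Reduce phase: each key's values are combined into a final result.
def reduce_phase(grouped):
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

In Hadoop itself, the map and reduce functions are supplied by the job and the shuffle/sort step is handled entirely by the framework across the cluster.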
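Rack awareness, covered under Advanced Configuration above, is typically enabled by pointing Hadoop's topology-script configuration property at an administrator-supplied script that maps each host name or IP to a rack path; Hadoop passes hosts as arguments and reads one rack path per host from standard output. A hypothetical Python sketch of such a script follows (the host-to-rack table is invented for illustration):

```python
import sys

# Hypothetical host-to-rack mapping; a real deployment would generate
# this from the data-center inventory.
RACK_MAP = {
    "10.1.1.11": "/dc1/rack1",
    "10.1.1.12": "/dc1/rack1",
    "10.1.2.21": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"  # conventional fallback for unknown hosts

def resolve(hosts):
    # Return one rack path per requested host, in order.
    return [RACK_MAP.get(h, DEFAULT_RACK) for h in hosts]

if __name__ == "__main__":
    # Hadoop invokes the script with one or more hosts/IPs as arguments
    # and expects the rack paths whitespace-separated on stdout.
    print(" ".join(resolve(sys.argv[1:])))
```

With a mapping like this in place, HDFS can place block replicas on more than one rack and MapReduce can prefer rack-local task scheduling.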
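The FairScheduler covered in the job-scheduling section divides cluster capacity among pools in proportion to their weights. The simplified Python sketch below illustrates that idea only; the pool names are invented, and the real scheduler additionally honors minimum shares, demand caps, and preemption:

```python
def fair_shares(total_slots, pools):
    """Split total_slots among pools in proportion to their weights.

    pools: dict mapping pool name -> weight (higher weight => larger share).
    Simplified teaching model, not Cloudera's implementation.
    """
    total_weight = sum(pools.values())
    return {name: total_slots * w / total_weight for name, w in pools.items()}

# A pool with weight 2 receives twice the share of a weight-1 pool.
shares = fair_shares(100, {"etl": 2, "adhoc": 1, "reports": 1})
print(shares["etl"])  # 50.0
```

The practical benefit discussed in class is that short ad-hoc jobs get a guaranteed slice of the cluster instead of waiting behind long-running production jobs, as they would under the FIFO scheduler.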

Notes:

  • Bring your own laptop (Windows, Linux, or Mac)
  • Lunch, tea/coffee will be served

Click here to view the current schedule for this training.