Hadoop

Cloudera Developer Training for Apache Hadoop

Take your knowledge to the next level with Cloudera’s Apache Hadoop Training and Certification

OSSCube, Asia’s first Cloudera training partner, organizes Cloudera Developer Training for Apache Hadoop. Cloudera University’s four-day developer training course delivers the key concepts and expertise necessary to create robust data processing applications using Apache Hadoop.

Attendees will navigate the Hadoop ecosystem, learning topics such as:

  • MapReduce and the Hadoop Distributed File System (HDFS) and how to write MapReduce code
  • Best practices and considerations for Hadoop development, debugging techniques and implementation of workflows and common algorithms
  • How to leverage Hive, Pig, Sqoop, Flume, Oozie and other projects from the Apache Hadoop ecosystem
  • Optimal hardware configurations and network considerations for building out, maintaining and monitoring your Hadoop cluster
  • Advanced Hadoop API topics required for real-world data analysis

Cloudera Certified Developer for Apache Hadoop (CCDH)

Cloudera delivers the industry's only certification for software developers working with Apache Hadoop. Built on the content of its training courses, the program tests your knowledge of Hadoop's operation and use. The Cloudera Certified Developer for Apache Hadoop (CCDH) designation confirms that you have mastered the material necessary to configure, run and use Hadoop in the enterprise. Organizations rely on Cloudera certification during the hiring process to identify top-quality candidates.

Cloudera’s Certified Developer for Apache Hadoop exam is thorough and designed to test a candidate’s fluency with the concepts and terminology in the following areas:

  • Core Hadoop Concepts
  • Storing Files in Hadoop
  • Job Configuration and Submission
  • Job Execution Environment
  • Job Lifecycle
  • Data Processing
  • Key and Value Types
  • Common Algorithms and Design Patterns
  • The Hadoop Ecosystem

Format:

This course alternates between instructional sessions and hands-on labs, ensuring that participants leave ready to import data into Hadoop and process it with a variety of techniques, including Java MapReduce programs and Hadoop Streaming jobs.

Applicability:

This session is appropriate for experienced developers who wish to write, maintain and/or optimize Apache Hadoop jobs. A background in Java is preferred, but experience with other programming languages such as PHP, Python or C# is sufficient.

Detailed Agenda:

1. The Motivation For Hadoop

  • Problems with traditional large-scale systems
  • Requirements for a new approach
  • Introducing Hadoop

2. Hadoop: Basic Concepts

  • The Hadoop Project and Hadoop Components
  • The Hadoop Distributed File System
  • Hands-On Exercise: Using HDFS
  • How MapReduce Works
  • Hands-On Exercise: Running a MapReduce Job
  • How a Hadoop Cluster Operates
  • Other Hadoop Ecosystem Projects
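
To give attendees a flavor of the HDFS exercise, here is a minimal sketch of programmatic HDFS access through Hadoop's Java FileSystem API. It assumes a Hadoop client configuration (core-site.xml) on the classpath; the path and file contents are purely illustrative.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from the core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a small file into HDFS (the path is illustrative)
        Path path = new Path("/user/training/hello.txt");
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("Hello, HDFS\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back, line by line
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```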

3. Writing a MapReduce Program

  • The MapReduce Flow
  • Basic MapReduce API Concepts
  • Writing MapReduce Drivers, Mappers and Reducers in Java
  • Writing Mappers and Reducers in Other Languages Using the Streaming API
  • Speeding Up Hadoop Development by Using Eclipse
  • Hands-On Exercise: Writing a MapReduce Program
  • Differences Between the Old and New MapReduce APIs
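
As a taste of this module, below is a minimal sketch of the classic word-count job written against the newer org.apache.hadoop.mapreduce API. The class names are illustrative, not the course's lab code: the mapper emits (word, 1) pairs and the reducer sums the counts for each word.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Emits (word, 1) for every whitespace-separated token in the line
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Sums the counts for each word
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Driver: configures and submits the job
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```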

4. Unit Testing MapReduce Programs

  • Unit Testing
  • The JUnit and MRUnit Testing Frameworks
  • Writing Unit Tests with MRUnit
  • Hands-On Exercise: Writing Unit Tests with the MRUnit Framework
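
Here is a hedged sketch of what an MRUnit test looks like, exercising the illustrative TokenMapper from the word-count sketch above: MRUnit's MapDriver runs the mapper in memory, with no cluster required, and verifies the expected output pairs.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class TokenMapperTest {
    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        // Wrap the mapper under test in an in-memory harness
        mapDriver = MapDriver.newMapDriver(new WordCount.TokenMapper());
    }

    @Test
    public void mapperSplitsLineAndEmitsOnes() throws Exception {
        // Feed one record and assert the exact output pairs, in order
        mapDriver.withInput(new LongWritable(0), new Text("cat cat dog"))
                 .withOutput(new Text("cat"), new IntWritable(1))
                 .withOutput(new Text("cat"), new IntWritable(1))
                 .withOutput(new Text("dog"), new IntWritable(1))
                 .runTest();
    }
}
```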

5. Delving Deeper into the Hadoop API

  • Using the ToolRunner Class
  • Hands-On Exercise: Writing and Implementing a Combiner
  • Setting Up and Tearing Down Mappers and Reducers by Using the Configure and Close Methods
  • Writing Custom Partitioners for Better Load Balancing
  • Optional Hands-On Exercise: Writing a Partitioner
  • Accessing HDFS Programmatically
  • Using the Distributed Cache
  • Using the Hadoop API’s Library of Mappers, Reducers and Partitioners
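
A brief sketch of a driver built on ToolRunner, which parses standard Hadoop command-line options (such as -D properties) before handing control to run(). It reuses the illustrative word-count classes from the earlier sketch and also registers the reducer as a combiner, which is safe here because summing is associative and commutative.

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.printf("Usage: %s <input> <output>%n", getClass().getSimpleName());
            return -1;
        }
        // getConf() already reflects any -D options ToolRunner parsed for us
        Job job = Job.getInstance(getConf(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCount.TokenMapper.class);
        job.setCombinerClass(WordCount.SumReducer.class); // local pre-aggregation
        job.setReducerClass(WordCount.SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCountDriver(), args));
    }
}
```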

6. Practical Development Tips and Techniques

  • Strategies for Debugging MapReduce Code
  • Testing MapReduce Code Locally by Using LocalJobRunner
  • Writing and Viewing Log Files
  • Retrieving Job Information with Counters
  • Determining the Optimal Number of Reducers for a Job
  • Creating Map-Only MapReduce Jobs
  • Hands-On Exercise: Using Counters and a Map-Only Job
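
To illustrate counters and map-only jobs together, here is a hedged sketch of a simple filter job: the mapper tallies good and malformed records in user-defined counters, and setting the reduce count to zero writes mapper output straight to HDFS. The record format (three tab-separated fields) is an assumption made for the example.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BadRecordFilter {

    // User-defined counters are just enum constants
    public enum Records { GOOD, MALFORMED }

    public static class FilterMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumption: a well-formed record has exactly 3 tab-separated fields
            if (value.toString().split("\t").length == 3) {
                context.getCounter(Records.GOOD).increment(1);
                context.write(value, NullWritable.get());
            } else {
                context.getCounter(Records.MALFORMED).increment(1);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "filter bad records");
        job.setJarByClass(BadRecordFilter.class);
        job.setMapperClass(FilterMapper.class);
        job.setNumReduceTasks(0); // map-only: mapper output goes straight to HDFS
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean ok = job.waitForCompletion(true);

        // Counters are aggregated across all tasks and readable from the driver
        long bad = job.getCounters().findCounter(Records.MALFORMED).getValue();
        System.out.println("Malformed records: " + bad);
        System.exit(ok ? 0 : 1);
    }
}
```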

7. Data Input and Output

  • Creating Custom Writable and WritableComparable Implementations
  • Saving Binary Data Using SequenceFile and Avro Data Files
  • Implementing Custom Input Formats and Output Formats
  • Issues to Consider When Using File Compression
  • Hands-On Exercise: Using SequenceFiles and File Compression
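
A minimal sketch of a custom WritableComparable, the kind of composite key this module covers: Hadoop calls write()/readFields() to serialize it between tasks and compareTo() to sort map output. The field names are illustrative.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// A composite key holding (lastName, firstName)
public class NameKey implements WritableComparable<NameKey> {
    private String lastName = "";
    private String firstName = "";

    public NameKey() {}  // Hadoop requires a no-arg constructor

    public void set(String last, String first) {
        this.lastName = last;
        this.firstName = first;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(lastName);
        out.writeUTF(firstName);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        lastName = in.readUTF();
        firstName = in.readUTF();
    }

    @Override
    public int compareTo(NameKey other) {
        // Sort by last name, then first name
        int cmp = lastName.compareTo(other.lastName);
        return cmp != 0 ? cmp : firstName.compareTo(other.firstName);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof NameKey && compareTo((NameKey) o) == 0;
    }

    @Override
    public int hashCode() {
        // Used by the default HashPartitioner to route keys to reducers
        return lastName.hashCode() * 163 + firstName.hashCode();
    }

    @Override
    public String toString() {
        return lastName + "\t" + firstName;
    }
}
```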

8. Common MapReduce Algorithms

  • Sorting and Searching Large Data Sets
  • Performing a Secondary Sort
  • Indexing Data
  • Hands-On Exercise: Creating an Inverted Index
  • Computing Term Frequency-Inverse Document Frequency (TF-IDF)
  • Calculating Word Co-Occurrence
  • Hands-On Exercise: Calculating Word Co-Occurrence (Optional)
  • Hands-On Exercise: Implementing Word Co-Occurrence with a Custom WritableComparable (Optional)
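
As a flavor of the inverted-index exercise, here is a hedged sketch of the mapper and reducer: the mapper emits (word, filename) pairs, using its input split to discover which file it is reading, and the reducer concatenates the file list for each word. A real index would typically deduplicate or count occurrences per file.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class InvertedIndex {

    // Emits (word, filename) for every word in the input
    public static class IndexMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        private final Text word = new Text();
        private final Text location = new Text();

        @Override
        protected void setup(Context context) {
            // The name of the file this map task is processing
            FileSplit split = (FileSplit) context.getInputSplit();
            location.set(split.getPath().getName());
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, location);
                }
            }
        }
    }

    // Collapses the file list for each word into one comma-separated posting;
    // a file appears once per occurrence of the word in this simple version
    public static class IndexReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            StringBuilder postings = new StringBuilder();
            for (Text file : values) {
                if (postings.length() > 0) {
                    postings.append(',');
                }
                postings.append(file.toString());
            }
            context.write(key, new Text(postings.toString()));
        }
    }
}
```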

9. Joining Data Sets in MapReduce Jobs

  • Writing a Map-Side Join
  • Writing a Reduce-Side Join
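
A reduce-side join handles two large data sets but shuffles everything; when one side fits in memory, a map-side join avoids the shuffle entirely. Below is a hedged sketch of a map-side join mapper: the small table travels to every task via the distributed cache (the driver would register it with something like job.addCacheFile()), and each big-table record is joined in the mapper with no reduce phase. The file names, field layout, and tab delimiters are illustrative assumptions, not part of the course material.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Map<String, String> cityByUserId = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException {
        // Assumes the small lookup file was added to the distributed cache
        // and symlinked into the task's working directory as "users.txt"
        try (BufferedReader reader = new BufferedReader(new FileReader("users.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split("\t");   // userId \t city
                cityByUserId.put(fields[0], fields[1]);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t"); // userId \t orderId
        String city = cityByUserId.get(fields[0]);
        if (city != null) {                             // inner-join semantics
            context.write(new Text(fields[1]), new Text(city));
        }
    }
}
```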

10. Integrating Hadoop into the Enterprise Workflow

  • Integrating Hadoop into an Existing Enterprise
  • Loading Data from an RDBMS into HDFS by Using Sqoop
  • Hands-On Exercise: Importing Data with Sqoop
  • Managing Real-Time Data Using Flume
  • Accessing HDFS from Legacy Systems with FuseDFS and HttpFS

11. Machine Learning and Mahout

  • Introduction to Machine Learning
  • Using Mahout
  • Hands-On Exercise: Using a Mahout Recommender
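
A hedged sketch in the spirit of the recommender exercise, using Mahout's Taste collaborative-filtering API (Mahout 0.x): a file-based data model, Pearson correlation between users, a nearest-neighbor neighborhood, and a user-based recommender. The ratings file name and user ID are illustrative assumptions.

```java
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class MovieRecommender {
    public static void main(String[] args) throws Exception {
        // ratings.csv lines: userID,itemID,preference (file name is illustrative)
        DataModel model = new FileDataModel(new File("ratings.csv"));

        // Find users whose ratings correlate with the target user's
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood =
                new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender =
                new GenericUserBasedRecommender(model, neighborhood, similarity);

        // Top 3 recommendations for user 1
        List<RecommendedItem> items = recommender.recommend(1, 3);
        for (RecommendedItem item : items) {
            System.out.println(item.getItemID() + "\t" + item.getValue());
        }
    }
}
```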

12. An Introduction to Hive and Pig

  • The Motivation for Hive and Pig
  • Hive Basics
  • Hands-On Exercise: Manipulating Data with Hive
  • Pig Basics
  • Hands-On Exercise: Using Pig to Retrieve Movie Names from Our Recommender
  • Choosing Between Hive and Pig

13. An Introduction to Oozie

  • Introduction to Oozie
  • Creating Oozie Workflows
  • Hands-On Exercise: Running an Oozie Workflow

Notes:

  • Bring your own laptop (Windows, Linux, or Mac)
  • Lunch and tea/coffee will be served

Click here to view the current schedule for this training.