The course is delivered as live, interactive online sessions led by an industry-expert instructor.


Our expert technical team is available around the clock to resolve queries.


We provide lifetime access to the Learning Management System (LMS), which you can use from anywhere in the world.

Big Data Hadoop Developer

Big Data Hadoop has become a fairly common name in the IT industry, creating plenty of opportunities for IT professionals. Experts have suggested that by 2018 the U.S. alone would require 181,000 Big Data Hadoop professionals. Additionally, the market is estimated to grow at a CAGR of 58%, surpassing $16 billion by 2020.

Collabera TACT’s Big Data Hadoop corporate training is designed to help participants gain in-depth knowledge of managing Big Data using Apache’s open-source platform, Hadoop. It also provides a thorough understanding of the core ideas and their execution across wide-ranging industry use cases. The Big Data Hadoop Developer course also covers advanced modules such as YARN, ZooKeeper, Oozie, Flume, and Sqoop.

Introduction/Installation of VirtualBox and the Big Data VM, Introduction to Linux, Why Linux?, Windows and Linux Equivalents, Different Flavors of Linux, Unity Shell (Ubuntu UI), Basic Linux Commands (enough to get started with Hadoop).

3V (Volume, Variety, Velocity) Characteristics, Structured and Unstructured Data, Application and Use Cases of Big Data, Limitations of Traditional Large Scale Systems, How a Distributed Way of Computing is Superior (cost and scale), Opportunities and Challenges with Big Data.

HDFS Overview and Architecture, Deployment Architecture, Name Node, Data Node and Checkpoint Node (aka Secondary Name Node), Safe Mode, Configuration Files, HDFS Data Flows (Read v/s Write).

CRC Checksum, Data Replication, Rack Awareness and Block Placement Policy, Small Files Problem.

Command Line Interface, File System, Administrative, Web Interface.

Load Balancer, DistCp (Distributed Copy), HDFS Federation, HDFS High Availability, Hadoop Archives.

MapReduce Overview, Functional Programming Paradigms, How to think in a MapReduce Way.
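To give a feel for "thinking in a MapReduce way", the word count problem can be sketched in plain Python as three separate phases. This is only an in-memory illustration, not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and in a real cluster the framework performs the shuffle step for you.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    # Keys are sorted before reducing, mirroring the sort step in MapReduce.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

The point of the split is that the map and reduce functions only ever see one record or one key at a time, which is what allows the work to be distributed.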

Legacy MR v/s Next Generation MapReduce (aka YARN/MRv2), Slots v/s Containers, Schedulers, Shuffling, Sorting, Hadoop Data Types, Input and Output Formats, Input Splits, Partitioning (Hash Partitioner v/s Custom Partitioner), Configuration Files, Distributed Cache.
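The difference between a hash partitioner and a custom partitioner can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: `zlib.crc32` stands in for the key's hash function (Hadoop uses the key object's own hash code), and the routing rule in `custom_partition` (keys starting with a digit go to reducer 0) is purely hypothetical.

```python
import zlib


def hash_partition(key, num_reducers):
    """Hash partitioning: non-negative hash of the key, modulo reducer count.

    Assumption: CRC32 is used here only as a deterministic stand-in hash.
    """
    digest = zlib.crc32(key.encode("utf-8"))
    return (digest & 0x7FFFFFFF) % num_reducers


def custom_partition(key, num_reducers):
    """Hypothetical custom rule: keys starting with a digit go to reducer 0.

    All other keys are hash-partitioned across the remaining reducers.
    """
    if num_reducers == 1 or key[:1].isdigit():
        return 0
    return 1 + hash_partition(key, num_reducers - 1)


if __name__ == "__main__":
    for key in ["hadoop", "hive", "2018-log"]:
        print(key, hash_partition(key, 4), custom_partition(key, 4))
```

Either way, the same key always lands on the same reducer; a custom partitioner only changes how keys are spread, for example to keep related keys together or to relieve a skewed reducer.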

Ad hoc Querying, Graph Computing Engines.

Standalone mode (in Eclipse), Pseudo distributed mode (as in the Big Data VM), Fully distributed mode (as in Production), MR API, Old and the new MR API, Java Client API, Hadoop data types and Custom Writable.

Different input and output formats, Saving Binary Data using Sequence Files and Avro Files, Hadoop Streaming (developing and debugging non-Java MR programs in Ruby and Python).

  • Speculative execution
  • Combiners
  • JVM Reuse
  • Compression

Sorting, Term Frequency, Inverse Document Frequency, Student Database, Max Temperature, Different ways of joining data, Word Co-Occurrence.

PageRank, Inverted Index.

Introduction and Architecture, Different Modes of executing Pig constructs, Data Types, Dynamic Invokers, Pig Streaming, Macros, Pig Latin language Constructs (LOAD, STORE, DUMP, SPLIT, etc.), User Defined Functions, Use Cases.

Introduction and Architecture, Different Modes of executing Hive queries, Metastore Implementations, HiveQL (DDL & DML Operations), External v/s Managed Tables, Views, Partitions & Buckets, User Defined Functions, Transformations using Non-Java, Use Cases.

NoSQL Databases – 1 (Theoretical Concepts), NoSQL Concepts, Review of RDBMS, Need for NoSQL, Brewer's CAP Theorem, ACID v/s BASE, Schema on Read v/s Schema on Write, Different levels of consistency, Bloom filters.

Key Value, Columnar, Document, Graph.

HBase Architecture, Master and the Region Server, Catalog tables (ROOT and META), Major and Minor compaction, Configuration files, HBase v/s Cassandra.

Java API, Client API, Filters, Scan Caching and Batching, Command Line Interface, REST API.

HBase Data Modeling, Bulk loading data in HBase, HBase Coprocessors – Endpoints (similar to Stored Procedures in RDBMS), HBase Coprocessors – Observers (similar to Triggers in RDBMS).

Introduction to RDD, Installation and Configuration of Spark, Spark Architecture, Different interfaces to Spark, Sample Python programs in Spark.

Use case of YARN, YARN Architecture, YARN Demo.

Use case of Oozie, Oozie Architecture, Oozie Demo.

Use case of Flume, Flume Architecture, Flume Demo.

Use case of Sqoop, Sqoop Architecture, Sqoop Demo.

Cloudera Hadoop cluster on the Amazon Cloud (Practice), Using EMR (Elastic MapReduce), Using EC2 (Elastic Compute Cloud).

Standalone mode (Theory), Distributed mode (Theory), Pseudo distributed, Fully distributed.

Hadoop industry solutions, Importing/exporting data across RDBMS and HDFS using Sqoop, Getting real-time events into HDFS using Flume, Creating workflows in Oozie, Introduction to Graph processing, Graph processing with Neo4J, Using the Mongo Document Database, Using the Cassandra Columnar Database, Distributed Coordination with ZooKeeper.

Click Stream Analysis using Pig and Hive, Analyzing the Twitter data with Hive, Further ideas for data analysis.


  • $349.00 $279.00
  • 10 Hours

    Collabera TACT, 25 Airport Road, Morristown, New Jersey 07960 Phone: (973)-598-3969 Email:

    COPYRIGHT© 2018 Collabera, All Rights Reserved.