
Big Data Architect is a 360° training program offered by Collabera TACT to professionals who want to deepen their knowledge in the field of Big Data. The program is customised to current industry standards and comprises several major sub-modules. It is designed by industry experts to provide hands-on training with tools that speed up the learning process.

The program provides full-fledged training, starting from Java and continuing through the Hadoop developer course, MongoDB, Scala programming, and Spark and Scala development training – all essential skills for Big Data Architects. Together, these modules provide a solid foundation and a further edge in the learning process.

This course specifically targets passionate professionals who aim to advance to the role of Big Data Architect and who already have basic expertise in Java development.

Why take the 360° Big Data Architect Master's Program training?

Collabera TACT’s Big Data Architect Program will enable candidates to:

  • Get exposure to real-life projects that will help them create high-quality Java programs by developing and implementing Java frameworks.
  • Conduct full-fledged Hadoop development and implementation with excellence.
  • Load information from disparate data sets and translate complex functional and technical requirements into detailed designs.
  • Implement Hive & Pig, HBase, MapReduce integration, and advanced HBase features.
  • Learn essential NoSQL concepts and get acquainted with the query language, indexes, and MongoDB’s scalability and high-availability features.
  • Gain in-depth knowledge of Big Data processing in Hadoop and Spark environments through the Spark and Scala module.

Big Data Architect Syllabus Overview

Candidates receive a rigorous 126 hours of training in this program, covering the following five major modules, with significant case studies at the end of the program.

  • Java Essentials:

In this module, candidates learn OOP concepts, core and advanced Java, and Servlet and JSP technology.

  • Big Data Hadoop Developer Program

This program is designed to educate candidates about Linux and the Big Data Virtual Machine (VM). Candidates are taught the Hadoop Distributed File System (HDFS): its interfaces, its features, and how it achieves fault tolerance. Other topics covered in this module include an overview of MapReduce (theory and practice), Hadoop Streaming (developing and debugging non-Java MR programs in Ruby and Python), BSP (Bulk Synchronous Parallel) as an alternative to MapReduce, and higher-level abstractions for MapReduce (Pig and Hive). In the case studies, candidates use Pig, HBase, Hive, and MapReduce to perform the Big Data analytics learnt in the course. The first case study, on Twitter analysis, and the second, on clickstream analysis, give a complete understanding of some interesting data-analysis facts and concepts.

  • MongoDB

In this module, candidates gain an understanding of MongoDB: its installation, advantages, syntax, and queries. They learn how NoSQL suits Big Data needs. The module also covers CRUD concepts, MongoDB security, and MongoDB administration activities. A hands-on MongoDB project shows how to work with the MongoDB Java Driver and how to use MongoDB as a Java developer.

  • Scala Programming:

Like Java, Scala is an object-oriented programming language, and this module has been designed to impart in-depth knowledge of programming in Scala. The module follows a 60:40 ratio: 60% practical sessions and 40% theory. Candidates learn functional programming principles, exception handling, and XML manipulation in Scala.
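
To give a flavor of the module, here is a minimal, self-contained Scala sketch combining case classes, pattern matching, and a higher-order function (the shape types are illustrative, not taken from the course material):

```scala
// Case classes model data; pattern matching selects behavior by shape.
sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Rectangle(width: Double, height: Double) extends Shape

object ShapeDemo {
  def area(s: Shape): Double = s match {
    case Circle(r)       => math.Pi * r * r
    case Rectangle(w, h) => w * h
  }

  def main(args: Array[String]): Unit = {
    val shapes = List(Circle(1.0), Rectangle(2.0, 3.0))
    // map is a higher-order function: it takes `area` as an argument.
    println(shapes.map(area).sum) // 9.141592653589793
  }
}
```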

  • Apache Spark and Scala Development Program

The objective of this program is to deliver a clear understanding of Apache Spark and Scala concepts. It provides an overview of the Hadoop ecosystem, Hive, GraphX, and the Spark machine learning library (Spark MLlib). Candidates also learn Spark RDDs and how to write and deploy Spark applications.

Each module is followed by practical assignments to be completed before the next class begins, ensuring candidates consolidate what they have learnt and clear their doubts before moving ahead.

Who can take this training?

This course is designed for tech-savvy individuals who seek in-depth knowledge in the field of Big Data. It offers promising benefits to freshers, experienced developers and architects, corporate IT professionals, engineers, and other professionals.

Prerequisites for Big Data Architect Certification

Our industry experts give candidates all the information required to become a Big Data Architect, so there are no particular prerequisites for taking this program at Collabera TACT. Knowledge of basic programming concepts is beneficial, but certainly not mandatory.



Core Java Contents:

  1. Features of Java
  2. Java Basics
  3. Classes and Objects
  4. Garbage Collection
  5. Java Arrays
  6. Referring Java Documentation
  7. Wrapper classes
  8. Inheritance
  9. Polymorphism
  10. Abstract Classes
  11. Interfaces
  12. Packages
  13. Introduction to Exception Handling
  14. Checked/Unchecked Exceptions
  15. Using try, catch, finally, throw, throws
  16. Exception Propagation
  17. Pre-defined Exceptions
  18. User Defined Exceptions
  19. Overview of Java IO Package
  20. Byte streams
  21. Character streams
  22. Object serialization & Object Externalization
  23. Introduction to GUI Programming (Swing)
  24. Introduction to multithreading
  25. Thread life cycle
  26. Thread priorities
  27. Using wait() & notify()
  28. Deadlocks
  29. JDBC Architecture
  30. Using the JDBC API
  31. Transaction Management
Course Contents – Servlets and JSP
Java Servlet Technology
  • What Is a Servlet?
  • Servlet Life Cycle
  • Initializing a Servlet
  • Writing Service Methods
  • Getting Information from Requests
  • Constructing Responses
  • ServletContext and ServletConfig Parameters
  • Attributes- Context, Request and Session
  • Maintaining Client State – Cookies/URL rewriting/Hidden Form Fields
  • Session Management
  • Servlet Communication – include, forward, redirect
  • WEB-INF and the Deployment Descriptor
Java Server Pages Technology
  • What Is a JSP Page?
  • The Life Cycle of a JSP Page
  • Execution of a JSP page
  • Different types of tags (directives, standard actions, bean tags, expressions, declarations)
  • Creating Static Content
  • Creating Dynamic Content
  • Using Implicit Objects within JSP Pages
  • JSP Scripting Elements
  • Including Content in a JSP Page
  • Transferring Control to Another Web Component – communication with servlet
  • Param Element
  • JavaBeans Component Design Conventions
  • Why Use a JavaBeans Component?
  • Creating and Using a JavaBeans Component
  • Setting JavaBeans Component Properties
  • Retrieving JavaBeans Component Properties
  • Custom tags


  • Introduction to Scala
    • A brief history of the Java platform to date
    • Distinguishing between the Java language and platform
    • Pain points when using Java for software development
    • Possible criteria for an improved version of Java
    • How and why the Scala language was created
  • Key Features of the Scala Language
    • Everything is an object
    • Class declarations
    • Data typing
    • Operators and methods
    • Pattern matching
    • Functions
    • Anonymous and nested functions
    • Traits
  • Basic Programming in Scala
    • Built in types, literals and operators
    • Testing for equality of state and reference
    • Conditionals, simple matching and external iteration
    • Working with lists, arrays, sets and maps
    • Throwing and catching exceptions
    • Adding annotations to your code
    • Using standard Java libraries
    • Using Scala within a Java application and vice versa
  • OO Development in Scala
    • A minimal class declaration
    • Understanding primary constructors
    • Specifying alternative constructors
    • Declaring and overriding methods
    • Creating base classes and class hierarchies
    • Creating traits and mixing them into classes
    • How a Scala inheritance tree is linearized
  • Functional Programming in Scala
    • Advanced uses of for expressions
    • Understanding function values and closures
    • Using closures to create internal iterators
    • Creating and using higher order functions
    • Practical examples of higher order functions
    • Currying and partially applied functions
    • Creating your own Domain-Specific Languages (DSLs)
  • Exception handling in Scala
  • Try/catch with case (sketched below)
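
A minimal sketch of try/catch with case clauses, alongside the functional scala.util.Try alternative:

```scala
import scala.util.{Failure, Success, Try}

object ExceptionDemo extends App {
  // Classic try/catch: the handlers are written as pattern-matching cases.
  def parse(s: String): Int =
    try s.toInt
    catch {
      case _: NumberFormatException => -1
      case e: Exception             => println(s"unexpected: $e"); -1
    }

  println(parse("42"))   // 42
  println(parse("oops")) // -1

  // Functional style: Try wraps success or failure as a value to match on.
  Try("7".toInt) match {
    case Success(n) => println(s"parsed $n")
    case Failure(e) => println(s"failed: ${e.getMessage}")
  }
}
```
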
  • Pattern Matching in Depth
    • Using the match keyword to return a value
    • Using case classes for pattern matching
    • Adding pattern guards to match conditions
    • Partially specifying matches with wildcards
    • Deep matching using case constructors
    • Matching against collections of items
    • Using extractors instead of case classes
  • Test Driven Development in Scala
  • Writing standard JUnit tests in Scala
  • Conventional TDD using the ScalaTest tool
  • Behavior Driven Development using ScalaTest
  • Using functional concepts in TDD
  • XML Manipulation in Scala
    • Using Scala to read and write XML using different parsers (DOM, SAX)
    • Working with XML literals in code
    • Embedding XPath-like expressions
    • Using pattern matching to process XML data
    • Serializing and deserializing to and from XML
    • Scala with database transactions
  • Writing Concurrent Apps
    • Issues with conventional approaches to multi-threading
    • How an actor-based approach helps you write thread-safe code (see the sketch after this list)
    • The Scala architecture for creating actor-based systems
    • Different coding styles supported by the actor model
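
The syllabus does not pin down a specific actor library; purely as an illustration (and assuming the widely used Akka classic API), here is a minimal actor whose state needs no explicit locking because messages are processed one at a time:

```scala
import akka.actor.{Actor, ActorSystem, Props}

// State lives inside the actor and is touched only from its message handler.
class Counter extends Actor {
  private var count = 0
  def receive: Receive = {
    case "inc"  => count += 1
    case "show" => println(s"count = $count")
  }
}

object ActorDemo extends App {
  val system  = ActorSystem("demo")
  val counter = system.actorOf(Props[Counter](), "counter")
  counter ! "inc"   // fire-and-forget message sends
  counter ! "inc"
  counter ! "show"  // asynchronously prints: count = 2
  Thread.sleep(500) // crude wait so the demo output appears before shutdown
  system.terminate()
}
```
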
Scala web
  • Scala with JAXB
  • Scala to call/consume a REST/SOAP service
  • Scala with logging information
  • Using Scala in a web application (JSP, Servlets)
  • Conclusion
• Introduction
  • Introduction
  • Module Outline
  • What We Will Build
  • History of Play!
  • Philosophy
  • Technologies
  • Summary
• Starting Up
  • Introduction
  • Downloading Play!
  • The Play Command
  • Compiling and Hot Deploy
  • Testing
  • IDEs
  • Project Structure
  • Configuration
  • Error Handling
  • Summary
• Routing
  • Introduction
  • The Router
  • Router Mechanics
  • Routing Rules
  • Play! Routes
  • Play! Routes: HTTP Verbs
  • Play! Routes: The Path
  • Play! Routes: The Action Call
  • Routing in Action
  • Summary
• Controllers, Actions, and Results
  • Introduction
  • Controllers
  • Actions
  • Results
  • Session and Flash Scope
  • Request Object
  • Implementing the Contacts Stub Controller
  • Summary
• Views
  • Introduction
  • Play! Views
  • Static Views
  • Passing Arguments
  • Iteration
  • Conditionals
  • Partials and Layouts
  • Accessing the Session Object
  • The Asset Route
  • Summary
• Data Access
  • Introduction
  • Agnostic Data Access
  • The Domain Model
  • Evolutions
  • Finder and Listing Contacts
  • The Form Object and Adding a Contact
  • Editing a Contact
  • Deleting a Contact
  • Review
  • Summary
• The Global Object
  • Introduction
  • The Global Object
  • Global Object Methods
  • onStart
  • onHandlerNotFound
  • Summary

Introduction to Linux and Big Data Virtual Machine (VM)

Introduction and installation of VirtualBox and the Big Data VM. Introduction to Linux – Why Linux? – Windows and the Linux equivalents – different flavors of Linux – Unity Shell (Ubuntu UI) – basic Linux commands (enough to get started with Hadoop).

Understanding Big Data
  • 3V (Volume, Variety, Velocity) characteristics
  • Structured and Unstructured Data
  • Applications and use cases of Big Data
  • Limitations of traditional large-scale systems

How a distributed way of computing is superior (cost and scale). Opportunities and challenges with Big Data.

HDFS (The Hadoop Distributed File System)

HDFS Overview and Architecture

  • Deployment Architecture
  • Name Node, Data Node and Checkpoint Node (aka Secondary Name Node)
  • Safe mode
  • Configuration files
  • HDFS Data Flows (Read vs. Write)

How does HDFS address fault tolerance?

  • CRC Checksum
  • Data replication
  • Rack awareness and Block placement policy
  • Small files problem

HDFS Interfaces

  • Command Line Interface
  • File System
  • Administrative
  • Web Interface

Advanced HDFS features

  • Load Balancer
  • DistCp
  • HDFS Federation
  • HDFS High Availability
  • Hadoop Archives
MapReduce – 1 (Theoretical Concepts)

MapReduce overview

  • Functional Programming paradigms
  • How to think in a MapReduce way?
MapReduce Architecture
  • Legacy MR vs. Next-Generation MapReduce (aka YARN/MRv2)
  • Slots vs. Containers
  • Schedulers
  • Shuffling, Sorting
  • Hadoop Data Types
  • Input and Output Formats
  • Input Splits – Partitioning (Hash Partitioner vs. Custom Partitioner)
  • Configuration files
  • Distributed Cache
MR Algorithm and Data Flow
  • Word Count (sketched below)
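
Hadoop's native MapReduce API is Java-based but usable from any JVM language; here is a condensed word-count sketch written in Scala against that API (class names are illustrative; input and output paths come from the command line):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.jdk.CollectionConverters._

// Map: emit (word, 1) for every token of every input line.
class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty)
      .foreach(w => ctx.write(new Text(w), one))
}

// Reduce: sum the counts shuffled and sorted to each word.
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit =
    ctx.write(key, new IntWritable(values.asScala.map(_.get).sum))
}

object WordCount {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "word count")
    job.setJarByClass(getClass)
    job.setMapperClass(classOf[TokenMapper])
    job.setCombinerClass(classOf[SumReducer]) // local pre-aggregation
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```
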
Alternatives to MR – BSP (Bulk Synchronous Parallel)
  • Ad hoc querying
  • Graph Computing Engines
MapReduce – 2 (Practice)

Developing, debugging and deploying MR programs

  • Standalone mode (in Eclipse)
  • Pseudo-distributed mode (as in the Big Data VM)
  • Fully distributed mode (as in production)
MR API
  • Old and the new MR API
  • Java Client API
  • Hadoop data types and custom Writables/WritableComparables
  • Different input and output formats
  • Saving Binary Data using Sequence Files and Avro Files
Hadoop Streaming (developing and debugging non-Java MR programs – Ruby and Python)

Optimization techniques

  • Speculative execution
  • Combiners
  • JVM Reuse
  • Compression
MR algorithms (non-graph)
  • Sorting
  • Term Frequency
  • Inverse Document Frequency
  • Student Database
  • Max Temperature
  • Different ways of joining data
  • Word Co-occurrence
MR algorithms (graph)
  • PageRank
  • Inverted Index
Higher Level Abstractions for MR (Pig)
  • Introduction and Architecture
  • Different modes of executing Pig constructs
  • Data types
  • Dynamic invokers
  • Pig streaming
  • Macros
  • Pig Latin language constructs (LOAD, STORE, DUMP, SPLIT, etc.)
  • User Defined Functions
  • Use Cases
Higher Level Abstractions for MR (Hive)
  • Introduction and Architecture
  • Different modes of executing Hive queries
  • Metastore implementations
  • HiveQL (DDL & DML operations)
  • External vs. managed tables
  • Views
  • Partitions & buckets
  • User Defined Functions
  • Transformations using non-Java
  • Use cases
Comparison of Pig and Hive

NoSQL Databases – 1 (Theoretical Concepts)

NoSQL Concepts

  • Review of RDBMS
  • Need for NoSQL
  • Brewer's CAP Theorem
  • ACID vs. BASE
  • Schema-on-Read vs. Schema-on-Write
  • Different levels of consistency
  • Bloom filters
Different types of NoSQL databases
  • Key Value
  • Columnar
  • Document
  • Graph

Columnar database concepts

NoSQL Databases – 2 (Practice)

HBase Architecture

  • Master and the Region Server
  • Catalog tables (ROOT and META)
  • Major and Minor compaction
  • Configuration files
  • HBase vs. Cassandra
Interfaces to HBase (for DDL and DML operations; see the sketch after this list)
  • Java API
  • Client API
  • Filters
  • Scan Caching and Batching
  • Command Line Interface
  • REST API
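
A minimal sketch of the Java client API (called from Scala here), assuming a reachable cluster and an existing table 'users' with column family 'info' – both illustrative:

```scala
import scala.jdk.CollectionConverters._
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put, Scan}
import org.apache.hadoop.hbase.util.Bytes

object HBaseDemo extends App {
  val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
  val table = conn.getTable(TableName.valueOf("users"))

  // DML: write one cell, then read it back.
  val put = new Put(Bytes.toBytes("row1"))
  put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Asha"))
  table.put(put)
  val res = table.get(new Get(Bytes.toBytes("row1")))
  println(Bytes.toString(res.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))))

  // Scan with a caching hint: how many rows each RPC fetches.
  val scan = new Scan()
  scan.setCaching(100)
  val scanner = table.getScanner(scan)
  scanner.asScala.take(5).foreach(r => println(Bytes.toString(r.getRow)))

  scanner.close(); table.close(); conn.close()
}
```
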
Advanced HBase Features
  • HBase Data Modeling
  • Bulk loading data in HBase
  • HBase Coprocessors – Endpoints (similar to Stored Procedures in RDBMS)
  • HBase Coprocessors – Observers (similar to Triggers in RDBMS)
Spark
  • Introduction to RDD
  • Installation and Configuration of Spark
  • Spark Architecture
  • Different interfaces to Spark
  • Sample Python programs in Spark
Setting up a Hadoop Cluster using Apache Hadoop
  • Cloudera Hadoop cluster on the Amazon Cloud (Practice)
  • Using EMR (Elastic MapReduce)
  • Using EC2 (Elastic Compute Cloud)
  • SSH Configuration
  • Standalone mode (Theory)
  • Distributed mode (Theory)
  • Pseudo-distributed
  • Fully distributed
Hadoop Ecosystem and Use Cases
  • Hadoop industry solutions
  • Importing/exporting data across RDBMS and HDFS using Sqoop
  • Getting real-time events into HDFS using Flume
  • Creating workflows in Oozie
  • Introduction to graph processing
  • Graph processing with Neo4J
  • Processing data in real time using Storm
  • Interactive ad hoc querying with Impala
Proof of concepts and use cases
  • Clickstream analysis using Pig and Hive
  • Analyzing Twitter data with Hive
  • Further ideas for data analysis

Scala Basics

  • What is Scala?
  • Why Scala for Spark?
  • Intro to the Scala REPL: journey from Java to Scala
  • Installing Scala IDE
  • Basic Operations
  • Defining Functions

Scala Essentials

  • Control Structures in Scala
  • Loops – foreach, while, do-while
  • Collections – Array, ArrayBuffer, Map, Tuples, Lists
  • If Statements
  • Conditional Operators
  • Enumerations

OOP and FP

  • Class and Object Basics
  • Scala Constructors
  • Nested Classes
  • Visibility Rules
  • Overriding Methods
  • Functional Programming
  • Higher Order Functions
  • Traits
  • Interfaces
  • Layered Traits

Prerequisite: Big Data and the Hadoop Framework

  • Introduction to Big Data
  • Challenges with Big Data
  • Batch vs. real-time processing
  • Overview- Hadoop Ecosystem
  • HDFS
  • Review of MapReduce
  • Hive
  • Sqoop
  • Flume

APACHE SPARK

Introduction to Spark

  • What is Spark?
  • Spark Overview
  • Setting up environment
  • Using Spark Shell
  • Spark Web UI

Spark Basics

  • RDDs
  • Spark Context
  • Spark Ecosystem
  • In-Memory data – Spark

Working with RDDs (see the sketch after this list)

  • Creating, Loading and Saving RDD
  • Transformations in RDD
  • Actions in RDD
  • Key-Value Pair RDD
  • MapReduce and Pair RDD operations
  • RDD Partitions
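
A small sketch, as you might type it into the Spark shell (where the SparkContext `sc` is predefined); the sample data is illustrative:

```scala
// Creating: parallelize a local collection (sc.textFile("hdfs://...") also works).
val lines = sc.parallelize(Seq("spark makes rdds", "rdds are immutable"))

// Transformations are lazy; nothing executes yet.
val counts = lines
  .flatMap(_.split(" "))   // RDD[String] of words
  .map(word => (word, 1))  // key-value pair RDD
  .reduceByKey(_ + _)      // MapReduce-style aggregation per key

// Actions trigger the actual computation.
counts.collect().foreach(println)  // e.g. (rdds,2), (spark,1), ...
println(counts.getNumPartitions)   // inspect how the data is partitioned

// Saving: counts.saveAsTextFile("out/")  -- path is illustrative
```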

Writing and Deploying Spark Applications (skeleton sketched after this list)

  • Spark Applications vs. Spark Shell
  • Creating Spark Context
  • Building a Spark Application
  • Running a Spark Application
  • Spark and Hadoop Integration – HDFS
  • Handling Sequence Files
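
By contrast with the shell, a standalone application builds its own SparkContext, is packaged as a jar, and is launched with spark-submit. A minimal skeleton (names and paths are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LineCount {
  def main(args: Array[String]): Unit = {
    // In an application we create the context ourselves; the shell provides it.
    val conf = new SparkConf().setAppName("line-count")
    val sc   = new SparkContext(conf)

    val lines = sc.textFile("hdfs:///data/input") // Hadoop/HDFS integration
    println(s"lines: ${lines.count()}")

    sc.stop()
  }
}
// Package as a jar, then launch on a cluster, e.g.:
//   spark-submit --class LineCount --master yarn line-count.jar
```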

Spark RDD

  • RDD Lineage
  • RDD Persistence Overview
  • Distributed Persistence

Spark SQL

  • Overview of Hive
  • Spark SQL Architecture
  • SQLContext in Spark SQL
  • Working with DataFrames
  • Example for Spark SQL (sketched after this list)
  • Integrating Hive and Spark SQL
  • DataFrames, Datasets and RDDs
  • Caching DataFrames
  • Knowing JSON and Parquet File Formats
  • Loading of data
  • Comparing Spark SQL, Impala and Hive-on-Spark
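
A minimal sketch of the DataFrame API and SQL over a temporary view (the sample data is illustrative; calling enableHiveSupport() on the builder adds Hive integration when Hive is on the classpath):

```scala
import org.apache.spark.sql.SparkSession

object SqlDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-demo").master("local[*]").getOrCreate()
    import spark.implicits._

    // DataFrames from a collection; spark.read.json/.parquet load files instead.
    val people = Seq(("Asha", 34), ("Ravi", 28)).toDF("name", "age")
    people.cache() // caching a DataFrame for reuse

    // The same query two ways: DataFrame API, then SQL over a temp view.
    people.filter($"age" > 30).show()
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 30").show()

    spark.stop()
  }
}
```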

Spark Job Execution

  • Jobs, Stages and Tasks
  • Partitions and Shuffles
  • Data Locality

Spark Streaming

  • Spark Streaming Architecture
  • First Spark Streaming program (sketched below)
  • Transformations in Spark Streaming
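
A minimal DStream sketch: word counts over 5-second micro-batches read from a TCP socket (feed it with e.g. `nc -lk 9999` on the same host):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingDemo {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread receives, one processes.
    val conf = new SparkConf().setAppName("streaming-demo").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5)) // batch interval

    val lines = ssc.socketTextStream("localhost", 9999)

    // The familiar RDD transformations, applied to each micro-batch.
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()            // start receiving
    ssc.awaitTermination() // run until stopped
  }
}
```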

Spark MLlib

  • What is Machine Learning?
  • ML library for Spark
  • ML Algorithms
  • ML using Pipelines and DataFrames

GraphX

  • Overview of GraphX
  • Components of GraphX
  • Hands on – PageRank, TriangleCount (see the sketch after this list)
  • Common Spark use-cases
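
A short GraphX sketch of the hands-on exercise: load an edge list and run PageRank (the file path is illustrative; triangleCount() works analogously):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader

object PageRankDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("pagerank").setMaster("local[*]"))

    // Edge list file: one "srcId dstId" pair per line.
    val graph = GraphLoader.edgeListFile(sc, "data/followers.txt")

    // Iterate until the ranks converge within the given tolerance.
    val ranks = graph.pageRank(0.0001).vertices
    ranks.sortBy(-_._2).take(5).foreach(println) // top 5 vertices by rank

    sc.stop()
  }
}
```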

Performance Tuning

  • Shared Variables: Broadcast Variables
  • Shared Variables: Accumulators (both sketched after this list)
  • Common Performance Issues
  • Performance tuning tips
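
A sketch showing both kinds of shared variables: a broadcast lookup table shipped once per executor, and an accumulator counting records that miss the lookup (the data is illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SharedVarsDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("shared-vars").setMaster("local[*]"))

    // Broadcast: read-only data sent to each executor once, not per task.
    val countryNames = sc.broadcast(Map("IN" -> "India", "US" -> "United States"))

    // Accumulator: a write-only counter aggregated across all tasks.
    val unknown = sc.longAccumulator("unknown-codes")

    val codes = sc.parallelize(Seq("IN", "US", "XX", "IN"))
    val named = codes.map { c =>
      countryNames.value.getOrElse(c, { unknown.add(1); "unknown" })
    }

    named.collect().foreach(println)                 // the action runs the job
    println(s"unknown codes seen: ${unknown.value}") // 1
    sc.stop()
  }
}
```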

Course Deliverables

  • Workshop style coaching
  • Interactive approach
  • Course material
  • POC Implementation
  • Hands on practice exercises for each topic
  • Quiz at the end of each major topic
  • Tips and techniques on the Cloudera Certification Examination
  • Linux concepts and basic commands

Introduction to NoSQL and MongoDB

RDBMS and types of relational databases, challenges of RDBMS, NoSQL databases and their significance, how NoSQL suits Big Data needs, introduction to MongoDB and its advantages, MongoDB installation, JSON features, data types, and examples.

MongoDB Installation

Installing MongoDB, basic MongoDB commands and operations, MongoChef (MongoGUI) Installation, MongoDB Data types.

Importance of NoSQL

The need for NoSQL, types of NoSQL databases, OLTP, OLAP, limitations of RDBMS, ACID properties, the CAP Theorem, the BASE properties, learning about JSON/BSON, database collections & documents, MongoDB uses, MongoDB Write Concern – Acknowledged, Replica Acknowledged, Unacknowledged, Journaled, Fsync.

CRUD Operations

Understanding CRUD and its functionality, CRUD concepts, MongoDB query syntax, read and write queries, and query optimization.
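
As a sketch of the same CRUD cycle driven from code, using the synchronous MongoDB Java driver from Scala (database, collection, and field names are illustrative):

```scala
import com.mongodb.client.MongoClients
import com.mongodb.client.model.{Filters, Indexes, Updates}
import org.bson.Document

object MongoCrudDemo extends App {
  val client = MongoClients.create("mongodb://localhost:27017")
  val coll   = client.getDatabase("training").getCollection("students")

  // Create
  coll.insertOne(new Document("name", "Asha").append("score", Int.box(91)))

  // Read
  println(coll.find(Filters.eq("name", "Asha")).first().toJson)

  // Update
  coll.updateOne(Filters.eq("name", "Asha"), Updates.set("score", Int.box(95)))

  // Delete
  coll.deleteOne(Filters.eq("name", "Asha"))

  // An index on a frequently queried field is the first step of query optimization.
  coll.createIndex(Indexes.ascending("name"))

  client.close()
}
```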

Data Modeling & Schema Design

Concepts of data modeling, differences between MongoDB and RDBMS modeling, model tree structures, operational strategies, monitoring and backup.

Data Management & Administration

In this module you will learn MongoDB administration activities such as health checks, backup, recovery, database sharding and profiling, data import/export, and performance tuning.

Data Indexing and Aggregation

Concepts of data aggregation and types, data indexing concepts, properties and variations.

MongoDB Security

Understanding database security risks, MongoDB's security concepts and approach, and MongoDB integration with Java and Robomongo.

Working with Unstructured Data

Implementing techniques to work with a variety of unstructured data – images, videos, log data, and others – and understanding GridFS, the MongoDB file system for storing data.

MongoDB Project

Java is one of the most popular programming languages for working with MongoDB. This project shows you how to work with the MongoDB Java Driver and how to use MongoDB as a Java developer. Become proficient in creating a collection for inserting video using Java programming. Some of the tasks and steps involved are as below:

  • Installation of Java
  • Setting up the MongoDB Java Driver
  • Connecting to the database
  • Understanding collections and documents
  • Reading and writing basics from the database
  • Learning about the Java Virtual Machine libraries
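
A condensed sketch of that flow, storing the binary video via GridFS and its metadata in a regular collection (paths and names are illustrative):

```scala
import java.nio.file.{Files, Paths}
import com.mongodb.client.MongoClients
import com.mongodb.client.gridfs.GridFSBuckets
import org.bson.Document

object VideoStoreDemo extends App {
  // Connect to the database.
  val client = MongoClients.create("mongodb://localhost:27017")
  val db     = client.getDatabase("media")

  // Store the binary video through GridFS, MongoDB's file-storage layer.
  val bucket = GridFSBuckets.create(db)
  val in     = Files.newInputStream(Paths.get("clips/intro.mp4"))
  val fileId = bucket.uploadFromStream("intro.mp4", in)
  in.close()

  // Record its metadata as a document in an ordinary collection.
  db.getCollection("videos").insertOne(
    new Document("title", "Intro").append("gridfsId", fileId))

  // Read it back.
  println(db.getCollection("videos").find().first().toJson)
  client.close()
}
```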


Our industry experts are Cloudera and Hortonworks professionals with more than 12 years of experience in the field.

An internet connection with a minimum speed of 2 Mbps is required to benefit from the live online training.

Yes, Collabera TACT's virtual machine can be installed on any local system, and our training team will help you with the installation.

To install the Hadoop environment, one needs 8 GB RAM, a 64-bit OS, 50 GB of free hard-disk space, and a processor with Virtualization Technology enabled.

We provide 126 hours of live online training including live POC & assignments.

Candidates need not worry about missing a training session. They will be able to view the recorded sessions available on the LMS. We also have a technical support team to assist candidates with any queries about a missed session.

You get lifetime access to the Learning Management System (LMS), which includes class recordings, presentations, sample code, and projects.

Yes, the certification will be provided after completing the training program. You will be evaluated on a few parameters, such as attendance in sessions and an objective examination. Based on your overall performance, you will be certified by Collabera TACT.


Course Reviews

4.8 out of 5, from 10 ratings (5 stars: 8, 4 stars: 2, 3 stars: 0, 2 stars: 0, 1 star: 0)
  1. Mathew Berning, USA

    This course certainly helped me in my job and fostered my career to the next level. I was truly happy with the syllabus and the practical sessions. I will recommend it to anyone who wants to master the Big Data Architect program.

  2. Jimmy Philips, USA

    Overall it was a fruitful experience. Completing all the 5 modules has drastically transformed my skills for the better. This program is very well structured and helps those who want to master the skills of Big Data.

  3. Michael Bowman, Canada

    I thank the entire TACT team, including the trainers, the technical team, and the people who encouraged me to enroll for this course. This course is feasible for beginners as well as for people who have expertise in Big Data Hadoop development.

  4. Eliane Eichmann, Singapore

    I am well-satisfied with the training program and the structure. I was taught from the basic level to the most advanced level in Big Data Architect program. It has transformed my job position to a senior big data architect.

  5. Beverly Pouros, Singapore

    The trainers were on par with the course: they possessed deep knowledge and understanding of the field and made the sessions very interesting. The sessions were excellent and interactive and have enhanced my skills to meet industry standards.

  6. Laveena Kari, India

    The training was fruitful as it covered new topics meeting the industry standards. The sessions were interactive with practical industrial examples which made me understand the nitty-gritty of the course applications.

  7. Rupesh Pandit, India

    The training modules that TACT possesses are excellent! The trainer's name was enough for me to enroll for the Big Data Architect program training. He has excellent knowledge and trained me well in this field. Overall it was a superb experience.

  8. Alex Holyman, Australia

    The course is designed according to the industry standards. It has helped me a lot in my current profile. There was excellent dedication shown by the technical team. The required material was uploaded immediately after the training session.

  9. Grace Miller, UK

    The trainer has excellent training skills and immense knowledge of the Big Data field, with crisp course content. I am grateful to have pursued this training; it has definitely benefited me.

  10. Nicholas Ryan, UK

    The course curriculum is dynamic and changes according to the industry standards. They train you to be job-ready. The certification has changed my work profile to Big Data Expert Analyst.


Course Features

  • 126 hours of live online training, including live POCs & assignments.
  • Live, interactive online sessions with an industry-expert instructor.
  • An expert technical team available for query resolution.
  • Lifetime Learning Management System (LMS) access, available from across the globe.
  • The best price, with a guarantee of quality service levels.
  • After completing the course, you will appear for an assessment from Collabera TACT; once you get through, you will be awarded a course completion certificate.
