Certificate in Advanced Big Data and Data Analytics (CABDDA)

  • $3,500.00



Course Methodology

This is a highly technical course built around group discussions, hands-on practical exercises, and group activities.

Course Objectives

By the end of the course, participants will be able to:

  • Understand key big data technologies, including a deep dive into Apache Spark
  • Describe the main challenges and advantages of Hadoop MapReduce
  • Demonstrate and discuss key technologies for big data storage and compute, such as PostgreSQL and object storage
  • Discuss popular machine learning algorithms, deep learning techniques and the importance of ethics in data analytics and artificial intelligence
  • Deliver a presentation demonstrating the analytics lifecycle and Spark

Target Audience

This is an advanced-level course. Participants are expected either to have several years of experience working with big data or to have previously attended the Certified Big Data and Data Analytics Practitioner (CBDDAP) course. This course is ideal for data engineers, AI engineers and data scientists. Recommended prior knowledge includes some Python programming experience and data visualization practice.

Target Competencies

  • Big data utilization
  • Big data analytics structures and technologies
  • Ethics and integrity for big data and AI development
  • Big data storage
  • Apache Spark best practices

Big Data Analytics Use Cases

  • How big data projects can meet organizational needs
  • Big data examples:
      • Netflix
      • LinkedIn
      • Facebook
      • Google
      • Orbitz
      • Dell
      • Others
  • Best practices in project design
  • Assessing the current state of your organization
  • Choosing datasets for course projects

Storing Big Data

  • Big data architectures and paradigms
  • The Hadoop Ecosystem
      • Overview of Hadoop
      • Hadoop Distributed File System (HDFS)
  • Massively parallel processing (MPP) versus distributed in-memory applications
  • RDBMSs vs NoSQL DBs
      • PostgreSQL
      • MongoDB
      • Cassandra
  • Streaming data
  • Data warehouse versus data mart
  • Intro to Apache Spark
  • Big data SQL hands-on labs

Computing Big Data

  • How to access big data
  • Role of cloud computing
  • Data movement risk
  • Networking and co-location
  • Apache Spark lab
  • Big data extract, transform, load (ETL)
  • Big data compute technologies
  • Distributed compute
  • High-performance clusters vs Apache Spark
  • Streaming: Storm, Spark Structured Streaming
  • Apache Spark ETL labs
  • Apache Spark data engineering

Big Data Advanced Analytics and AI

  • Analytics Lifecycle
  • Apache Spark vs Pandas
  • Big data machine learning & deep learning in Spark
  • Importance of ethics in AI
  • AutoML and hyperparameter tuning

Course Big Data Projects

  • Identify analytical opportunities in an organization
  • Define and assess the problem
  • Describe the impact and use of data to address the problem
  • Identify potential data sources
  • Design a data analytics project
  • Access, explore, analyze and visualize the chosen dataset for the project
  • Present project insights during the course