Course Methodology
This course is highly technical, with group discussions, hands-on
practical exercises, and group activities as its core focus.
Course Objectives
By the end of the course, participants will be able to:
- Understand key big data technologies, including a deep dive into Apache Spark
- Describe the main challenges and advantages of Hadoop MapReduce
- Demonstrate and discuss key technologies for big data storage and compute, such as PostgreSQL and object storage
- Discuss popular machine learning algorithms, deep learning techniques and the importance of ethics in data analytics and artificial intelligence
- Deliver a presentation demonstrating the analytics lifecycle and Spark
Target Audience
This is an advanced-level course. Participants are expected either to have
a number of years of experience working with big data or to have previously
attended the Certified Big Data and Data Analytics Practitioner (CBDDAP)
course. The course is ideal for data engineers, AI engineers and data
scientists. Recommended prior knowledge includes some Python programming
experience and data visualization practice.
Target Competencies
- Big data utilization
- Big data analytics structures and technologies
- Ethics and integrity for big data and AI development
- Big data storage
- Apache Spark best practices
Big Data Analytics Use Cases
- How big data projects can meet organizational needs
- Big data examples:
  - Netflix
  - LinkedIn
  - Facebook
  - Google
  - Orbitz
  - Dell
  - Others
- Best practices in project design
- Assessing the current state of your organization
- Choosing datasets for course projects
Storing Big Data
- Big data architectures and paradigms
- The Hadoop Ecosystem
- Overview of Hadoop
- Hadoop Distributed File System (HDFS)
- Massively parallel processing (MPP) versus distributed in-memory applications
- RDBMSs vs NoSQL DBs
  - PostgreSQL
  - MongoDB
  - Cassandra
- Streaming data
- Data warehouse versus data mart
- Intro to Apache Spark
- Big data SQL hands-on labs
Computing Big Data
- How to access big data
- Role of cloud computing
- Data movement risk
- Networking and co-location
- Apache Spark lab
- Big data extract, transform, load (ETL) and compute technologies
- Distributed compute
- High performance clusters vs Apache Spark
- Streaming: Storm, Spark Structured Streaming
- Apache Spark ETL labs
- Apache Spark data engineering
Big Data Advanced Analytics and AI
- Analytics lifecycle
- Apache Spark vs Pandas
- Big data machine learning & deep learning in Spark
- Importance of ethics in AI
- AutoML & hyperparameter tuning
Course Big Data Projects
- Identify analytical opportunities in an organization
- Define and assess the problem
- Describe the impact and use of data to address the problem
- Identify potential data sources
- Design a data analytics project
- Access, explore, analyze and visualize the dataset chosen for the project
- Present project insights to the class