Building ETL and Data Pipelines with Bash, Airflow and Kafka
About this Course
Well-designed, automated data pipelines and ETL processes are the foundation of a successful Business Intelligence platform. Defining your data workflows, pipelines and processes early in the platform design ensures that the right raw data is collected, transformed and loaded into the desired storage layers, and is available for processing and analysis whenever it is required.
This course is designed to provide you with the critical knowledge and skills that Data Engineers and Data Warehousing specialists need to create and manage ETL, ELT, and data pipeline processes. Upon completing this course, you’ll gain a solid understanding of Extract, Transform, Load (ETL) and Extract, Load, and Transform (ELT) processes. You’ll also practice extracting data, transforming data, and loading transformed data into a staging area; create an ETL data pipeline using Bash shell scripting; build a batch ETL workflow using Apache Airflow; and build a streaming data pipeline using Apache Kafka. You’ll gain hands-on experience through practice labs throughout the course, and you’ll work on a real-world inspired project that uses several of these technologies to build data pipelines. You can add this project to your portfolio to demonstrate your ability to perform as a Data Engineer.
This course assumes that you already have the skills to work with datasets, SQL, relational databases, and Bash shell scripts.
Created by: IBM
Level: Introductory
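As a minimal illustration of the ETL and ELT processes named in the course overview, the Python sketch below uses only the standard library (`csv`, `sqlite3`), with an in-memory SQLite database standing in for the staging area. It is not taken from the course materials, which use Bash, Airflow and Kafka. The sample data, table names (`staging_temps`, `raw_temps`, `staging_temps_elt`) and function names are hypothetical, chosen only to show where the transform step happens in each approach.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: temperature readings exported as CSV text.
# Station B has a missing reading and should be filtered out.
RAW_CSV = """station,temp_f
A,68.0
B,
C,77.0
"""


def extract(raw_text):
    """Extract: parse the raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop incomplete rows and convert Fahrenheit to Celsius."""
    cleaned = []
    for row in rows:
        if not row["temp_f"]:
            continue  # skip records with a missing reading
        temp_c = round((float(row["temp_f"]) - 32) * 5 / 9, 1)
        cleaned.append((row["station"], temp_c))
    return cleaned


def load(conn, records):
    """Load: write already-transformed records into a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_temps (station TEXT, temp_c REAL)"
    )
    conn.executemany("INSERT INTO staging_temps VALUES (?, ?)", records)
    conn.commit()


def run_etl(conn, raw_text):
    """ETL: transform in the pipeline code, then load the result."""
    load(conn, transform(extract(raw_text)))


def run_elt(conn, raw_text):
    """ELT: load the raw rows first, then transform inside the database with SQL."""
    rows = extract(raw_text)
    conn.execute("CREATE TABLE IF NOT EXISTS raw_temps (station TEXT, temp_f TEXT)")
    # Named placeholders are filled from each row dict's keys.
    conn.executemany("INSERT INTO raw_temps VALUES (:station, :temp_f)", rows)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_temps_elt AS "
        "SELECT station, ROUND((CAST(temp_f AS REAL) - 32) * 5.0 / 9, 1) AS temp_c "
        "FROM raw_temps WHERE temp_f <> ''"
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn, RAW_CSV)
    run_elt(conn, RAW_CSV)
    # Both approaches end with the same staged data:
    # prints [('A', 20.0), ('C', 25.0)] twice
    print(conn.execute("SELECT * FROM staging_temps ORDER BY station").fetchall())
    print(conn.execute("SELECT * FROM staging_temps_elt ORDER BY station").fetchall())
```

The design choice this highlights is where the transformation runs: in ETL the pipeline code cleans the data before loading it, while in ELT the raw data lands in the target system first and the transformation is expressed in that system's own query language.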
Related Online Courses
Are you or your team starting to use Jenkins as a CI/CD tool? Are you looking to automate your software delivery process? Do you need guidelines on how to set up your CI/CD workflow using Jenkins...
Please Note: Learners who successfully complete this IBM course can earn a skill badge: a detailed, verifiable and digital credential that profiles the knowledge and skills you’ve acquired in this...
For over 25 years, SOLIDWORKS has been the trusted industry standard in mechanical design and engineering. Intuitive 3D modeling and product development solutions from SOLIDWORKS help you...
Machine learning is a skill that is becoming increasingly relevant because of the large volume of data (big data) that must be analyzed to make decisions. In this online course...
The modern data analysis pipeline involves collection, preprocessing, storage, analysis, and interactive visualization of data. The goal of this course, part of the Analytics: Essential Tools and...