Mastering Apache Airflow for Data Engineers: A Comprehensive Guide to Key Features and Functionalities
You can find the link to the tutorial here.
This project uses Apache Airflow to manage and schedule data pipelines. The project is containerized with Docker and orchestrated with Docker Compose.

To run it locally, you will need:

- Docker
- Docker Compose
To get the project running:

- Clone the repository to your local machine.
- Navigate to the project directory.
- Build the Docker images: `docker-compose build`
- Start the Airflow services: `docker-compose up`
The `docker-compose.yaml` file contains the configuration for the Airflow services. The following environment variables are used:

- `AIRFLOW__CORE__EXECUTOR`: The executor to use for Airflow. In this project, we use the `CeleryExecutor`.
- `AIRFLOW__DATABASE__SQL_ALCHEMY_CONN`: The connection string for the Airflow metadata database.
- `AIRFLOW__CELERY__RESULT_BACKEND`: The connection string for the backend that Celery uses for storing results.
- `AIRFLOW__CELERY__BROKER_URL`: The connection string for the message broker that Celery uses for sending tasks.
- `AIRFLOW__CORE__FERNET_KEY`: The Fernet key used for encrypting passwords in the connection configuration (a way to generate one is sketched below this list).
- `AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION`: Whether DAGs are paused when they are created.
- `AIRFLOW__CORE__LOAD_EXAMPLES`: Whether to load the example DAGs that come with Airflow.
- `AIRFLOW__API__AUTH_BACKENDS`: The authentication backends to use for the Airflow API.
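The Fernet key must be a valid URL-safe base64-encoded 32-byte value. If you need to generate one, a quick way is the snippet below, assuming the `cryptography` package is available (Airflow itself depends on it):

```python
# Generate a value suitable for AIRFLOW__CORE__FERNET_KEY.
# Requires the "cryptography" package, which Airflow depends on.
from cryptography.fernet import Fernet

print(Fernet.generate_key().decode())
```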
Once the services are up and running, you can access the Airflow webserver at http://localhost:8080.
The DAGs are defined in Python files in the `dags` directory.
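As an illustration, a minimal DAG placed in that directory might look like the following sketch. The DAG id, schedule, and task are hypothetical (not taken from this project), and the code assumes Airflow 2.x:

```python
# dags/example_pipeline.py -- a minimal, hypothetical DAG for illustration.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def say_hello():
    print("Hello from Airflow!")


with DAG(
    dag_id="example_pipeline",        # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    hello = PythonOperator(
        task_id="say_hello",
        python_callable=say_hello,
    )
```

Files in this directory are picked up automatically by the Airflow scheduler, so the DAG should appear in the web UI shortly after the file is saved.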
The data for the DAGs is stored in CSV files in the `datasets` directory.
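As a sketch of how a task might consume one of those files, assuming the `datasets` directory is mounted into the containers at `/opt/airflow/datasets` (the mount path and file name here are assumptions, not taken from this project):

```python
# Hypothetical helper for a PythonOperator task: count the rows in a CSV
# file from the datasets directory. Path and file name are assumptions.
import csv

DATASET_PATH = "/opt/airflow/datasets/example.csv"  # assumed mount point and file


def count_rows():
    with open(DATASET_PATH, newline="") as f:
        rows = list(csv.DictReader(f))
    print(f"{DATASET_PATH} contains {len(rows)} data rows")
    return len(rows)
```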
The logs for the Airflow tasks are stored in the `logs` directory.
Any Airflow plugins can be added to the `plugins` directory.
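A plugin is a Python module in that directory exposing a subclass of `AirflowPlugin`. A minimal, hypothetical skeleton (the module and plugin names are placeholders):

```python
# plugins/my_plugin.py -- a minimal, hypothetical Airflow plugin skeleton.
from airflow.plugins_manager import AirflowPlugin


class MyPlugin(AirflowPlugin):
    # The name Airflow uses to register this plugin.
    name = "my_plugin"
    # Operators, hooks, macros, etc. would be declared here as needed.
```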
To stop the Airflow services, run `docker-compose down`.
For more information on Apache Airflow, see the official documentation. For more information on Docker and Docker Compose, see the Docker documentation and the Docker Compose documentation.