- Docker
- Python 3.9 or later
- Airflow
- HDFS
- Spark
- Hive
- Metabase
- MySQL
- Postgres
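Before starting, you can sanity-check the core prerequisites (assuming `docker` and `python3` are on your PATH):

```bash
docker --version    # confirm Docker is installed and on the PATH
python3 --version   # should report Python 3.9 or later
```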
- Clone the project repository:

```bash
git clone <link.com>
```

- Navigate to the project directory:

```bash
cd RetailChainDatawarehouse
```

- Build the hadoopbase Docker image:

```bash
make build-hadoopbase
```

- Start up the infrastructure:

```bash
make up && make setup
```
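Once `make up` completes, the service containers should be running. One quick way to verify (assuming the project runs on Docker Compose, as the Makefile targets suggest):

```bash
# List running containers with their status and published ports
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
```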
Go to http://localhost:8081 to access the Airflow web UI and log in with:
- username: airflow
- password: airflow
Go to Admin -> Connections and create a new Spark connection.
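The exact connection values depend on your compose setup and are not reproduced here. As a sketch, a typical Spark connection can also be created from the Airflow CLI, assuming the Spark master service is named `spark-master` and listens on the default port 7077 (adjust both to match your environment):

```bash
# Create a Spark connection named spark_default (host/port values are assumptions)
airflow connections add spark_default \
    --conn-type spark \
    --conn-host spark://spark-master \
    --conn-port 7077
```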
Go to the DAGs tab and click the trigger button on the daily_pipeline DAG to run the pipeline.
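Alternatively, the DAG can be triggered from the command line (run these inside the Airflow scheduler or webserver container if the CLI is not installed on the host):

```bash
# Unpause the DAG so scheduled runs are allowed, then trigger a run now
airflow dags unpause daily_pipeline
airflow dags trigger daily_pipeline
```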
Start the Spark Thrift Server:

```bash
make start-thift
```
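To verify the Thrift server is accepting connections, you can open a session with beeline, assuming the default Spark Thrift Server port 10000 is exposed on localhost:

```bash
# Connect over JDBC and run a quick smoke test
beeline -u jdbc:hive2://localhost:10000 -e "SHOW DATABASES;"
```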
Go to http://localhost:4000 to access the Metabase web UI and register a new account.
Set up the Spark Thrift Server connection in Metabase.
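The exact values depend on your setup; as a sketch, using Metabase's Spark SQL database type with the default Thrift port, the connection form would look roughly like:

```text
Database type:  Spark SQL
Host:           localhost   (or the Thrift server's container/service name)
Port:           10000       (default Spark Thrift Server port; assumed)
Database name:  default
```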
Access the dashboards:
- Browse HDFS files: http://localhost:9870
- Spark History Server: http://localhost:18080
- Hadoop YARN Web UI: http://localhost:8088
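To quickly confirm these UIs are reachable, a minimal check using the ports listed above:

```bash
# Print the HTTP status code returned by each web UI
for port in 9870 18080 8088; do
  curl -s -o /dev/null -w "http://localhost:${port} -> %{http_code}\n" "http://localhost:${port}"
done
```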