ETL Data Pipeline

Table of contents

  1. About the project
  2. Tech used
  3. Getting started
  4. Testing
  5. ETL Data Pipeline Whiteboard
  6. Appendix

About the project

This project succeeds a CLI app built for a small, independent cafe to track its stock, couriers and customers. Following the cafe's unprecedented growth, the business has expanded to hundreds of outlets across the country. With this growth comes the need to use its sales data to better target new and returning customers, and to understand which products sell well. The company is experiencing issues with collating and analysing the data produced at each branch, as its technical setup is limited.

This project addresses that problem by advising on, and building, the technical capabilities the company needs to continue accelerating its growth.

After a thorough analysis of the company's needs, it was decided that the best course of action was to create an ETL pipeline to handle the large volumes of transactional data produced by the business. The data is stored centrally in a cloud environment so that all stakeholders can access it quickly.

By being able to query the company's data as a whole, the client will be far better placed to identify company-wide trends and insights.

Tech used
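
The project is built with:

  • Python
  • AWS (Lambda)
  • boto3
  • pandas
  • psycopg2
  • Pytest
  • GitHub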

Getting started

Prerequisites

  • Any IDE for Python development
  • An AWS account
  • A GitHub account with a repository

Requirements

boto3==1.24.13
pandas==1.4.2
psycopg2==2.9.3

Installation

  1. Clone the repo

git clone https://github.com/YuliaTom/Cafe-network-ETL-pipeline.git

  2. Create a virtual environment called env
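
For step 2, a minimal sketch on macOS/Linux (this assumes the dependencies listed above live in a requirements.txt at the repository root, which is an assumption rather than something stated in this README):

python3 -m venv env
source env/bin/activate
pip install -r requirements.txt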

Testing

Unit testing is to be implemented with Pytest.
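
As an illustration of what such a test might look like, the example below uses a hypothetical transform function (not part of the repository) purely to show the structure:

```python
import pandas as pd


# Hypothetical transform step, shown only to illustrate test structure.
def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Normalise column names and drop rows with no product recorded.
    df = df.rename(columns=str.lower)
    return df.dropna(subset=["product"])


def test_transform_drops_rows_with_missing_product():
    raw = pd.DataFrame({"Product": ["Latte", None], "Price": [2.50, 3.00]})
    result = transform(raw)
    assert list(result.columns) == ["product", "price"]
    assert len(result) == 1
```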

ETL Data Pipeline Whiteboard

The schematic representation of the pipeline can be found via the link below.

Appendix

Lambda Function
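
The original content of this section is not reproduced here. Purely as an illustrative sketch (the S3 event handling, the column names branch, product and price, the transactions table and the environment variable names are assumptions, not the project's actual configuration), a Lambda handler for this kind of pipeline might look like the following:

```python
import io
import os

import boto3
import pandas as pd
import psycopg2


def handler(event, context):
    """Illustrative ETL handler: extract a branch sales CSV from S3,
    transform it with pandas, and load it into a central PostgreSQL
    database. All names below are assumptions for illustration only."""
    s3 = boto3.client("s3")

    # Extract: the triggering S3 event identifies the uploaded file.
    record = event["Records"][0]["s3"]
    obj = s3.get_object(Bucket=record["bucket"]["name"],
                        Key=record["object"]["key"])
    df = pd.read_csv(io.BytesIO(obj["Body"].read()))

    # Transform: basic cleaning as an example.
    df = df.rename(columns=str.lower).dropna()

    # Load: insert the cleaned rows into a central database.
    conn = psycopg2.connect(
        host=os.environ["DB_HOST"],
        dbname=os.environ["DB_NAME"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        for row in df.itertuples(index=False):
            cur.execute(
                "INSERT INTO transactions (branch, product, price) "
                "VALUES (%s, %s, %s)",
                (row.branch, row.product, row.price),
            )
    conn.close()
    return {"rows_loaded": len(df)}
```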


DB Schema


Lambda Function Testing