The goal of this challenge is to separate the valid rows of a CSV file from the invalid ones, producing two separate files in Parquet format.
Use case: the business team only wants the products that have an image, but wants to archive the products without an image.
The full problem statement can be found in this GitHub repository.
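To make the task concrete, here is a minimal pandas sketch of the intended split. It assumes the image URLs live in a column named image and that a Parquet engine such as pyarrow is available; the real column name and validity rules may differ.

import pandas as pd

# Assumption: the image URL lives in a column named "image"; adapt this if
# the real catalog uses a different header.
catalog = pd.read_csv("./resources/product_catalog.csv")

# A row is treated as valid when its image value is present and non-empty.
has_image = catalog["image"].notna() & (catalog["image"].astype(str).str.strip() != "")

# Write each subset to its own Parquet file.
catalog[has_image].to_parquet("./valid_product_catalog.parquet", index=False)
catalog[~has_image].to_parquet("./invalid_product_catalog.parquet", index=False)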
I decided to handle this problem in two parts:
- a first, exploratory part in an interactive Jupyter notebook, which is handier for plotting and quick experimentation
- a second part focusing on the production code, where I wrote a proper Python script
Now, let's start!
First, clone this GitHub repository on your machine. Navigate to the cloned repository and create the project environment from the provided environment.yml file. For example, with the conda package manager:
conda env create -f environment.yml
Then, activate the virtual environment. For example, with conda:
conda activate back-market-case-study-lin
Warning: the conda environment was created on a Linux machine; the previous commands were not tested on Windows or macOS.
Note: once you're done with the project, don't forget to remove the environment by running:
conda env remove -n back-market-case-study-lin
First, run the following command in your terminal to register the conda environment as a Jupyter kernel:
python -m ipykernel install --user --name=back-market-case-study-lin
Open the notebook in your IDE and select the kernel associated with the back-market-case-study-lin environment.
You can now run the notebook. It contains an embedded report explaining my initial approach to the problem.
To run the transformer Python program from a terminal, navigate to the root of the GitHub repository and type this CLI command:
python transformer/transform.py ./resources/product_catalog.csv ./valid_product_catalog.parquet ./invalid_product_catalog.parquet
Note: the execution will fail if either of the output file paths already exists. To re-run the script, delete the old Parquet files first.
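As an illustration, such a guard could look like the sketch below; this is one possible implementation, not necessarily the one used in the script.

from pathlib import Path

def ensure_outputs_do_not_exist(*paths: str) -> None:
    # Refuse to run rather than silently overwrite previous results.
    for path in paths:
        if Path(path).exists():
            raise FileExistsError(f"Output file already exists: {path}")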
For more details about the CLI arguments, type:
python transformer/transform.py -h
In particular, you can choose the library used to process the CSV through the optional "--library" argument: the value can be either pandas or dask.
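For illustration, the parser could be declared along these lines with argparse. The positional argument names below are assumptions based on the command shown above; only the --library choices (pandas or dask) come from the actual CLI.

import argparse

parser = argparse.ArgumentParser(
    description="Split a product catalog into valid and invalid Parquet files."
)
# Positional argument names are hypothetical; their order matches the command above.
parser.add_argument("input_csv", help="path to the source CSV file")
parser.add_argument("valid_output", help="Parquet file for products with an image")
parser.add_argument("invalid_output", help="Parquet file for products without an image")
# Optional choice of the processing library, as described above.
parser.add_argument("--library", choices=["pandas", "dask"], default="pandas",
                    help="library used to read and transform the CSV")
args = parser.parse_args()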
Enjoy the read!