Skip to content

tlatrace/Back-Market-technical-assessment

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

24 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Back Market technical assessment : collect-prepare challenge πŸ”Ž

Context

This challenge aims at separating the valid rows from the invalid ones in a CSV file, resulting in two separate files in Parquet format.
Use case : the business team wants only the products with an image, but wants to archive the products without image.
The full wording can be found in this GitHub repository.

I decided to handle this problem in two parts :

  • a first exploratory part where I was working with an interactive Jupyter notebook, which is more handy for plotting and fast experimenting
  • a second part focusing on the production code, where I wrote a proper Python script

Now let's start πŸš€

Getting started

Firstly, clone this GitHub repository on your machine. Navigate to the cloned repository and create the project environment from the given environment.yml file. For example, using the Anaconda package manager :

conda env create -f environment.yml

Then, activate the virtual environment. For example, with conda :

conda activate back-market-case-study-lin

Warning : The conda environment was created on a Linux machine, the previous commands weren't tested on Windows nor MacOS.


Note : Once you're done with the project, don't forget to remove the environment by running :

conda env remove -n back-market-case-study-lin

Usage

Notebook

First, run the following command in your terminal to add the conda environment in the Jupyter notebook :

 python -m ipykernel install --user --name=back-market-case-study-lin

Open the notebook in your IDE and choose the kernel associated to the environment named back-market-case-study-lin.

You can now execute the notebook. You will find in it an incorporated report where I explain everything about my first approach of the problem.


CLI interface

In order to execute the transformer Python program in a terminal, navigate to the root of the GitHub repository and type this CLI command :

python transformer/transform.py ./resources/product_catalog.csv ./valid_product_catalog.parquet ./invalid_product_catalog.parquet

Note : the execution will fail if one of the output file paths already exists. To re-run the script, delete the old parquet files first.

To get more help about the parser, type :

python transformer/transform.py -h

In particular, you can specify the library with which you'd like to handle the CSV, thanks to the "--library" optional argument : the value can be either pandas or dask.


I wish you a good reading πŸ™‚

About

Back Market collect/prepare case study πŸ”Ž

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published