TODO

General

Build workarounds for issues faced with data pulling and management due to missing persistent filesystem in CPDaaS

Data

Make more sophisticated training data (concatenate hourly to daily for all months etc)
Make extensive feature selection, feature engineering
Remote
- Test Copernicus (cdsapi) and get some data (ERA5/GloFAS) related to precipitation/river discharge/floods
- Investigate DVC and check whether or not viable candidate for OSS part of demo
- Implement data version control for originally retrieved data from Copernicus
- Implement DVC with COS as remote (S3-protocol)
  - not that COS credentials must be created with HMAC option enabled

Model

(Write notebook for model development?)
Write notebook for model training
Write notebook for model deployment
Write notebook for getting "newest data" that is supposed to be run weekly.
Write notebook for merging old data with newer data (data_until_last week + data_from_last_week)

MLOps / WS

Put together pipeline
Consider and realize pipeline scheduling
(Think about pipeline extension where model trained on data_until_last_week is benchmarked against data_with_last_week)

To think about:

Maybe track model in c0_train_model instead of c1_deploy_model to avoid - once again - storing the model to cos and then downloading it again in the next notebook before finally tracking it.

Provide feedback