TODO

General

  • Build workarounds for the data pulling and management issues caused by the missing persistent filesystem in CPDaaS

Data

  • Build more sophisticated training data (e.g. aggregate hourly data into daily values for all months)
  • Do extensive feature selection and feature engineering
  • Remote
    • Test Copernicus (cdsapi) and get some data (ERA5/GloFAS) related to precipitation/river discharge/floods
    • Investigate DVC and check whether it is a viable candidate for the OSS part of the demo
    • Implement data version control for the data originally retrieved from Copernicus
    • Implement DVC with COS as remote (S3 protocol)
      • Note that COS credentials must be created with the HMAC option enabled
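
A minimal sketch of the hourly-to-daily step mentioned above, using only the standard library. The sample readings and the function name `hourly_to_daily` are hypothetical, and summing is an assumption that fits accumulated quantities such as precipitation; river discharge would more likely need a daily mean.

```python
from collections import defaultdict
from datetime import datetime


def hourly_to_daily(hourly):
    """Sum hourly (timestamp, value) pairs into one total per calendar day."""
    totals = defaultdict(float)
    for timestamp, value in hourly:
        totals[timestamp.date()] += value
    # Return the days in chronological order.
    return dict(sorted(totals.items()))


# Hypothetical hourly precipitation readings (mm), not real ERA5 data.
hourly = [
    (datetime(2021, 1, 1, 0), 0.5),
    (datetime(2021, 1, 1, 1), 1.5),
    (datetime(2021, 1, 2, 0), 2.0),
]
daily = hourly_to_daily(hourly)
```

In a notebook, pandas `DataFrame.resample("D").sum()` on a datetime-indexed frame would do the same job with less code.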

Model

  • (Write notebook for model development?)
  • Write notebook for model training
  • Write notebook for model deployment
  • Write notebook that fetches the newest data and is intended to run weekly
  • Write notebook for merging old data with newer data (data_until_last_week + data_from_last_week)
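
The merge notebook could start from something like the sketch below. It assumes both datasets are keyed by an ISO date string and that, where the two overlap, the newer value should win; `merge_by_date` and the sample values are hypothetical.

```python
def merge_by_date(data_until_last_week, data_from_last_week):
    """Combine two {date: value} mappings; newer values win on overlap."""
    merged = dict(data_until_last_week)
    merged.update(data_from_last_week)
    # ISO date strings sort chronologically, so a plain sort is enough.
    return dict(sorted(merged.items()))


# Hypothetical daily values keyed by ISO date.
old = {"2021-01-01": 2.0, "2021-01-02": 2.0}
new = {"2021-01-02": 2.5, "2021-01-03": 0.0}
merged = merge_by_date(old, new)
```

Letting the newer value win keeps the weekly run idempotent: rerunning it with the same week of data leaves the merged set unchanged.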

MLOps / WS

  • Put together pipeline
  • Decide on and implement pipeline scheduling
  • (Think about a pipeline extension in which the model trained on data_until_last_week is benchmarked against data_with_last_week)
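
One possible reading of the benchmarking idea, sketched under assumptions: two models, one trained on data_until_last_week and one trained on data_with_last_week, are scored on the same held-out targets with RMSE, and the retrained one is kept only if it is strictly better. The metric, the function names and the numbers are all hypothetical.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equally long sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


def pick_better(current_predictions, candidate_predictions, targets):
    """Return the winner's label plus both RMSE scores; ties keep the current model."""
    current_score = rmse(current_predictions, targets)
    candidate_score = rmse(candidate_predictions, targets)
    if candidate_score < current_score:
        winner = "data_with_last_week"
    else:
        winner = "data_until_last_week"
    return winner, current_score, candidate_score


# Hypothetical held-out targets and predictions from both models.
targets = [1.0, 2.0, 3.0]
winner, current_score, candidate_score = pick_better(
    [1.0, 2.0, 5.0],  # model trained on data_until_last_week
    [1.0, 2.0, 4.0],  # model trained on data_with_last_week
    targets,
)
```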

To think about:

  • Maybe track the model in c0_train_model instead of c1_deploy_model, to avoid (once again) storing the model in COS and then downloading it again in the next notebook before finally tracking it.