2024-12-05 - Green Software Playbooks agenda #19

Open
3 of 6 tasks
bryaki02 opened this issue Dec 4, 2024 · 6 comments

@bryaki02
Contributor

bryaki02 commented Dec 4, 2024

Date

2024-12-05 - 15:00 UTC - See the time in your timezone https://everytimezone.com

Roll Call

Please add a comment to this issue during the meeting to denote attendance.
Any untracked attendees will be added by the GSF team below:

  • Full Name, Affiliation, (optional) GitHub username

Previous Meeting

Notes from the previous meeting:

  • Discussed the process for capturing the playbook instructions for the Data Engineer Directory

Agenda

  • Convene & Roll Call
  • Review submissions since last meeting
  • Plan for new year to attract new contributors
  • Review the agenda and suggest new agenda points
  • [Agenda Item]
  • AOB, Q&A & Adjourn

Any Other Business

@moin-oss
Contributor

moin-oss commented Dec 5, 2024

I'm going to be a few minutes late to the meeting.

@f-mellinghoff

Create a suggestion for: "Move archived data to appropriate storage (cold storage may be enough for some archived data)"

@moin-oss
Contributor

moin-oss commented Dec 5, 2024

I will add a writeup on "minimizing the frequency of batch jobs" and determine how much it overlaps with "only load data where changes occurred, maybe think about event based triggers (only load delta, but also only start followup ETL processes, if some source data changed)".

@f-mellinghoff

f-mellinghoff commented Dec 19, 2024

Green Software Playbooks – Data Engineering

Improvements to existing projects

Move data to appropriate storage (hot vs. cold storage)

Analyse your existing data and move all rarely accessed data that still needs to be stored (e.g. for compliance or legal reasons) to cold storage.
By default, most data lands in hot storage, which is designed for frequently used data that needs fast, reliable access. Projects often also hold data that only needs to be archived and is no longer accessed regularly. For this data, the slower access of cold storage is sufficient, so it should be moved there.
Establish a recurring process to decide whether data can be moved to cold storage or must stay in hot storage.
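
The recurring check described above could be sketched roughly as follows. This is only an illustration: the `Dataset` fields, the 90-day threshold, and the extra "review" outcome for data that is rarely accessed but has no retention requirement are all assumptions made for the example, not part of the playbook text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Dataset:
    name: str
    last_accessed: datetime
    must_retain: bool  # e.g. kept for compliance or legal reasons


def choose_tier(
    ds: Dataset,
    now: datetime,
    cold_after: timedelta = timedelta(days=90),  # assumed threshold, tune per project
) -> str:
    """Return 'hot', 'cold', or 'review' for one dataset."""
    if now - ds.last_accessed < cold_after:
        return "hot"  # still accessed regularly
    # Rarely accessed: archive it if it must be kept, otherwise flag it for review.
    return "cold" if ds.must_retain else "review"
```

Running a check like this on a schedule, and feeding the result into whatever tiering mechanism the storage platform offers, is one way to keep the hot/cold decision from being a one-off exercise.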

Green IT Advantages: Moving data to cold storage reduces storage costs (though access costs increase) and also saves energy, because the servers holding the data do not need to be constantly available.

Considerations during setup of a new project

Move data to appropriate storage (hot vs. cold storage)

Define a data strategy at the start of your project that specifies which storage option (hot or cold) to use for which data.
Hot storage suits data that is accessed frequently, whereas cold storage suits data that is not accessed regularly.
Establish a recurring process to decide whether data can be moved to cold storage or must stay in hot storage.

Green IT Advantages: Moving data to cold storage reduces storage costs (though access costs increase) and also saves energy, because the servers holding the data do not need to be constantly available.

@moin-oss
Contributor

moin-oss commented Feb 5, 2025

Green Software Playbooks – Data Engineering

Optimize the frequency of batch jobs

Batch processing handles large volumes of data at once, which makes efficient use of resources. Where possible, batch jobs can be run at times when the electricity is cleanest. Additionally, minimizing the number of batch jobs that need to run reduces the overall amount of resources spent running them.

Improvements to existing projects

Start by examining how often these jobs run and at what times. Based on when your clients need the data, see if it's possible to run the jobs when the energy is cleanest. Depending on what the batch jobs collect, see if you can reduce the number of times they need to run. In particular, determine whether they can be set to run only when there are changes to pick up. More generally, look for any other ways to optimize the batch processing.
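
As a rough illustration of picking a cleaner time slot, the sketch below chooses the start hour with the lowest total forecast carbon intensity that still finishes before a deadline. The function name, the hourly forecast list, and the deadline parameter are hypothetical; a real setup would take the forecast from whatever grid-intensity data source is available to the project.

```python
def cleanest_start_hour(forecast: list[float], duration_hours: int, deadline_hour: int) -> int:
    """Return the start hour whose run window has the lowest total carbon intensity.

    forecast[i] is the forecast grid carbon intensity (e.g. gCO2e/kWh) for hour i.
    The job must finish by deadline_hour (treated as an exclusive end index).
    """
    last_start = min(deadline_hour, len(forecast)) - duration_hours
    if duration_hours <= 0 or last_start < 0:
        raise ValueError("job does not fit before the deadline")
    # Compare every feasible window; ties resolve to the earliest start.
    return min(
        range(last_start + 1),
        key=lambda start: sum(forecast[start:start + duration_hours]),
    )
```

The deadline is what encodes the client's need for the data: the later it can be, the more windows there are to choose from.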

Considerations during setup of a new project

When setting up a new project, make sure you clearly understand when and how often your stakeholders need the data from the batch process. You can then determine how much flexibility you have in when the jobs run and how often. Running the jobs when the energy is cleanest, minimizing their frequency, and running them only when there are changes to pick up can all reduce electricity consumption and the associated emissions.
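
Running only when there are changes to pick up could look roughly like the sketch below, which loads just the delta since the last run and skips the job entirely when nothing changed. The `updated_at` field, the watermark approach, and the `job` callback are assumptions chosen for the example; an event-based trigger from the source system would be another way to get the same effect.

```python
from datetime import datetime
from typing import Callable


def run_incremental(
    rows: list[dict],
    watermark: datetime,
    job: Callable[[list[dict]], None],
) -> tuple[datetime, bool]:
    """Process only rows changed since `watermark`; skip the job if there are none."""
    # Assumes each row carries an 'updated_at' timestamp (hypothetical field name).
    delta = [row for row in rows if row["updated_at"] > watermark]
    if not delta:
        return watermark, False  # nothing changed: skip the job and follow-up steps
    job(delta)
    new_watermark = max(row["updated_at"] for row in delta)
    return new_watermark, True
```

The returned flag lets a scheduler decide whether any follow-up ETL steps need to start at all.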

Green IT Advantages: Optimizing your batch jobs has several advantages. At the very least, you minimize the electricity and hardware the jobs need. You can also minimize the greenhouse gas emissions associated with the jobs by running them when the grid's electricity mix is cleanest.

@moin-oss
Contributor

moin-oss commented Feb 5, 2025

I've added the entry above for "minimizing the frequency of batch jobs", although I've modified it slightly and rolled in "only load data where changes occurred, maybe think about event based triggers (only load delta, but also only start followup ETL processes, if some source data changed)". Let me know if this captures the topic well.
