2024-11-07 - Green Software Playbooks agenda #18

Open
5 tasks
bryaki02 opened this issue Nov 7, 2024 · 5 comments

@bryaki02

bryaki02 commented Nov 7, 2024

Date

2024-11-07 - HH:MM UTC - See the time in your timezone https://everytimezone.com

Roll Call

Please add a comment to this issue during the meeting to denote attendance.
Any untracked attendees will be added by the GSF team below:

  • Full Name, Affiliation, (optional) GitHub username

Previous Meeting

Notes from the previous meeting: ...

Agenda

OKR & KPI updates

Any Other Business

  • Co-Chair seats open
@moin-oss

moin-oss commented Nov 7, 2024

We want to differentiate between improvements to existing projects and creating a new project or adding features.

@f-mellinghoff

Write a first draft for the following best practice idea: identify and classify all existing data (including data in sandboxes, old archives, test data, ...), then delete all identified ROT (redundant, outdated, trivial) and otherwise unnecessary data (Franziska)

@moin-oss

moin-oss commented Nov 7, 2024

I will write instructions for keeping data storage co-located with processing and retrieval (in other words, use the data centers nearest to where the data is evaluated) @moin-oss

@f-mellinghoff

f-mellinghoff commented Nov 8, 2024

First idea for instructions for the best practice "Identification and classification of all existing data -> deletion of all identified unnecessary data":

Green Software Playbooks – Data Engineering

Improvements to existing projects

Identify and classify all existing data and later delete all identified unnecessary data

Go over all available data storage in your project, then document and classify the existing data.
Sort the data into the following categories:

  • Important data for business use case (includes data needed for operations, legal purposes, legitimate backups, etc.)
  • ROT data – redundant, outdated or obsolete, and trivial data that is not needed in the project context. This includes old test data, personal data, unnecessary backups, historical data, and any other data that serves no business use case.

Permanently delete all identified ROT and unneeded data from your data storage. If necessary, adjust the corresponding data loads so that no new ROT data is created.
Establish a continuous process to monitor your data storage, e.g. an analysis task that runs regularly and goes through all your stored data.
Keep the documentation and classification of your data up to date.
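
A regular analysis task of the kind described above could start out as something like the following. This is only a minimal sketch under stated assumptions: it treats "data storage" as a plain directory tree, and the thresholds `MAX_AGE_DAYS` and `MIN_SIZE_BYTES` as well as the function names are made up for illustration. It only reports candidates for review; the decision about what really is ROT and the deletion itself stay with the team.

```python
from __future__ import annotations

import hashlib
import time
from collections import defaultdict
from pathlib import Path

# Assumed thresholds; tune them to your own retention policy.
MAX_AGE_DAYS = 365     # files not modified for longer are flagged as possibly outdated
MIN_SIZE_BYTES = 1     # smaller files (i.e. empty ones) are flagged as trivial


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_rot_candidates(root: Path, now: float | None = None) -> dict[str, list[Path]]:
    """Walk `root` and group files into ROT candidate categories.

    The result is a report for human review; nothing is deleted here.
    """
    now = time.time() if now is None else now
    by_digest: dict[str, list[Path]] = defaultdict(list)
    report: dict[str, list[Path]] = {"redundant": [], "outdated": [], "trivial": []}

    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        stat = path.stat()
        if stat.st_size < MIN_SIZE_BYTES:
            report["trivial"].append(path)
            continue
        if now - stat.st_mtime > MAX_AGE_DAYS * 86400:
            report["outdated"].append(path)
        by_digest[file_digest(path)].append(path)

    # Every copy after the first with identical content counts as redundant.
    for paths in by_digest.values():
        report["redundant"].extend(paths[1:])
    return report
```

Calling `find_rot_candidates(Path("/some/storage/root"))` from a scheduled job would give a recurring report. Databases or object stores would need their own equivalent of the directory walk.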

Green IT Advantages: This reduces the amount of data in your project and helps you avoid dark data. About 10% to 1/3 of the energy in data centres is used for data storage, so storing less data reduces your CO2 footprint, energy consumption, and cost.

Considerations during setup of a new project

Identify and classify all existing data

Establish a process from the start to keep track of all the data in every data storage of your new project, e.g. an analysis task that runs regularly and goes through all your stored data.
Classify and document all data that becomes part of your data storage to avoid creating ROT (redundant, outdated or obsolete, and trivial) data.
Take care to keep the documentation and classification of your data up to date and to permanently delete all unneeded data.
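
For a new project, the tracking could begin as a small catalog that every dataset has to be registered in. The sketch below is one possible shape, not a prescribed format: the class names, the fields (`owner`, `purpose`, `review_by`) and the two classification labels are assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Assumed labels, mirroring the two categories used in this draft.
ALLOWED_CLASSIFICATIONS = {"business", "rot"}


@dataclass(frozen=True)
class DatasetEntry:
    name: str
    owner: str
    classification: str   # one of ALLOWED_CLASSIFICATIONS
    purpose: str          # why the data is kept
    review_by: date       # when the classification must be re-checked


class DataCatalog:
    """In-memory catalog; a real project would persist this somewhere."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        """Add or update an entry, rejecting undocumented data."""
        if entry.classification not in ALLOWED_CLASSIFICATIONS:
            raise ValueError(f"unknown classification: {entry.classification!r}")
        if not entry.purpose.strip():
            raise ValueError(f"dataset {entry.name!r} needs a documented purpose")
        self._entries[entry.name] = entry

    def undocumented(self, stored_names: list[str]) -> list[str]:
        """Names found in storage that nobody has registered."""
        return sorted(set(stored_names) - set(self._entries))

    def due_for_review(self, today: date) -> list[str]:
        """Registered datasets whose review date has been reached."""
        return sorted(
            name for name, entry in self._entries.items() if entry.review_by <= today
        )
```

The regular analysis task can then compare what is actually stored against the catalog: anything returned by `undocumented` is a candidate for classification or deletion, and `due_for_review` keeps the documentation from going stale.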

Green IT Advantages: This reduces the amount of data in your project and helps you avoid dark data. About 10% to 1/3 of the energy in data centres is used for data storage, so storing less data reduces your CO2 footprint, energy consumption, and cost.

@moin-oss

moin-oss commented Dec 3, 2024

Draft set of directions for developers to look at the locations of their databases and processing servers:

Green Software Playbooks – Data Engineering

Improvements to existing projects

Using servers for processing in the same data center as the databases

Examine the existing setup of your project to determine whether the databases are in the same data center as the servers that process the data.

  • If you use the same cloud provider for both the databases and the data processing servers, it should be straightforward to move the servers into the same data center with a blue/green redeploy.
  • If the processing servers use a different cloud provider than the databases, this will be more difficult. You will need to determine whether the server provider has a data center close to where the databases are hosted. Alternatively, it may make sense to switch one of them to the other's cloud provider.

Once the servers and the databases are set up to run in the same data center, no further changes should be needed going forward.
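
The examination step could be sketched as a check over a hand-maintained inventory. Everything here is illustrative: the `Resource` fields, the `reads_from` link between servers and databases, and the provider and region names are assumptions, and a shared region is only a proxy, since one region can span several data centers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str                          # "database" or "server"
    provider: str                      # hypothetical provider label
    region: str                        # hypothetical region label
    reads_from: tuple[str, ...] = ()   # databases a server processes data from


def find_split_pairs(resources: list[Resource]) -> list[tuple[str, str, str]]:
    """Return (server, database, reason) for every pair that is not co-located.

    Raises KeyError if a server refers to a database missing from the inventory.
    """
    databases = {r.name: r for r in resources if r.kind == "database"}
    findings = []
    for server in (r for r in resources if r.kind == "server"):
        for db_name in server.reads_from:
            db = databases[db_name]
            if server.provider != db.provider:
                # The harder case: providers differ.
                findings.append((server.name, db.name, "different provider"))
            elif server.region != db.region:
                # The easier case: same provider, so a redeploy can fix it.
                findings.append((server.name, db.name, "different region"))
    return findings
```

An empty result means every server already sits with the databases it reads from; each finding maps to one of the two cases in the list above.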

Green IT Advantages: As of 2017, transferring data over the internet was estimated to consume 1.8 kWh/GB (https://www.wholegraindigital.com/blog/website-energy-consumption/). Given the often large quantities of data processed by event-driven or data-driven applications, transferring data from databases to servers for processing can require a significant amount of energy. Keeping databases and servers within the same data center can significantly reduce that energy demand.
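
To get a feel for the order of magnitude, the quoted rate can be turned into a quick estimate. This is a rough sketch only: the 1.8 kWh/GB figure is the 2017 estimate cited above, the daily volume is a made-up example, and the function name is hypothetical.

```python
# Rate quoted above (2017 estimate for transfer over the internet).
KWH_PER_GB_INTERNET = 1.8


def transfer_energy_kwh(gigabytes: float, kwh_per_gb: float = KWH_PER_GB_INTERNET) -> float:
    """Estimate the energy needed to move `gigabytes` at the given rate."""
    if gigabytes < 0:
        raise ValueError("gigabytes must not be negative")
    return gigabytes * kwh_per_gb


# Hypothetical workload: 500 GB moved from databases to servers every day.
daily_kwh = transfer_energy_kwh(500)
yearly_kwh = daily_kwh * 365
```

At that rate, 500 GB per day works out to about 900 kWh per day. The estimate scales linearly, so it is only as good as the rate that goes into it.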

Considerations during setup of a new project

Using servers for processing in the same data center as the databases

When creating a new project, the easiest solution is to use the same cloud provider for both databases and servers so that both can run in the same data center. Otherwise, it is important to discuss with prospective providers where their data centers are located so that the databases and servers are not too far apart.
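
If two providers are involved, candidate locations could be compared with a great-circle distance. The sketch below uses the haversine formula; the site names and coordinates are approximate city locations used purely as placeholders, not actual data center addresses, and geographic distance is only a rough stand-in for the real network path.

```python
from __future__ import annotations

import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_site(
    database_site: tuple[float, float],
    server_sites: dict[str, tuple[float, float]],
) -> str:
    """Return the name of the candidate server site closest to the database."""
    return min(server_sites, key=lambda name: haversine_km(database_site, server_sites[name]))


# Placeholder coordinates (approximate city centres, for illustration only).
database_site = (50.11, 8.68)                 # Frankfurt
candidate_sites = {
    "amsterdam": (52.37, 4.90),
    "dublin": (53.35, -6.26),
}
closest = nearest_site(database_site, candidate_sites)
```

The providers' own documentation on regions and interconnects should take precedence over a distance estimate like this.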

Green IT Advantages: As of 2017, transferring data over the internet was estimated to consume 1.8 kWh/GB (https://www.wholegraindigital.com/blog/website-energy-consumption/). Given the often large quantities of data processed by event-driven or data-driven applications, transferring data from databases to servers for processing can require a significant amount of energy. Keeping databases and servers within the same data center can significantly reduce that energy demand.
