2024-11-07 - Green Software Playbooks agenda #18

bryaki02 · 2024-11-07T07:32:29Z

Date

2024-11-07 - HH:MM UTC - See the time in your timezone https://everytimezone.com

Roll Call

Please add a comment to this issue during the meeting to denote attendance.
Any untracked attendees will be added by the GSF team below:

Full Name, Affiliation, (optional) GitHub username

Previous Meeting

Notes from the previous meeting: ...

Agenda

Convene & Roll Call
Review the agenda and suggest new agenda points
What are some common Data Engineering projects and ideas for practices? #15 (comment)
AOB, Q&A & Adjourn

OKR & KPI updates

Any Other Business

Co-Chair seats open

moin-oss · 2024-11-07T15:23:20Z

We want to differentiate between improvements to existing projects and creating a new project or adding features.

Let's create a data engineer directory and add some example playbook instructions based on one (or more) of the practices listed in this discussion: What are some common Data Engineering projects and ideas for practices? #15 (comment)

f-mellinghoff · 2024-11-07T15:36:01Z

Write a first draft for the following best-practice idea: identification and classification of all existing data (e.g. including data in sandboxes, old archives, test data, ...) -> deletion of all identified ROT (redundant, outdated, trivial) and generally unnecessary data (Franziska)

moin-oss · 2024-11-07T15:37:22Z

I will write some instructions on keeping data storage co-located with processing and retrieval (in other words, using the data centers nearest to where the data is evaluated). @moin-oss

f-mellinghoff · 2024-11-08T07:45:19Z

First idea for instructions for the best practice "Identification and classification of all existing data -> deletion of all identified unnecessary data":

Green Software Playbooks – Data Engineering

Improvements to existing projects

Identify and classify all existing data and later delete all identified unnecessary data

Review all data storage in your project, and document and classify the existing data.
Sort the data into the following categories:

Important data for business use case (includes data needed for operations, legal purposes, legitimate backups, etc.)
ROT data – redundant, outdated or obsolete, and trivial data that is not needed in the project context. This includes old test data, personal data, unnecessary backups, historical data, and any other data that is not needed for a business use case.

Permanently delete all identified ROT and otherwise unneeded data from your data storage. If necessary, adjust the corresponding data loads to avoid creating new ROT data.
Establish a continuous process to keep monitoring your data storage, e.g. a scheduled analysis task that regularly scans all your stored data.
Keep the documentation and classification of your data up to date.
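The review-and-cleanup steps above could be partly automated. This is a minimal Python sketch that only reports and never deletes, and it rests on hypothetical rules: a file is "redundant" if its content hash matches a file being kept, "trivial" if it sits under a directory named test, sandbox or tmp, and "outdated" if it has not been modified for more than 365 days. The thresholds, directory names and the function name classify_storage are illustrative assumptions; a real project would also need to cover databases and object stores, not just files.

```python
import hashlib
import time
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical rules; adjust them to your project's own definition of ROT data.
OUTDATED_AFTER_DAYS = 365
TRIVIAL_DIR_NAMES = {"test", "sandbox", "tmp"}


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, read in chunks so large files are fine."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def classify_storage(root: Path, now: Optional[float] = None) -> Dict[str, List[Path]]:
    """Sort every file under root into 'keep', 'redundant', 'trivial' or 'outdated'.

    Report only: nothing is deleted here.
    """
    now = time.time() if now is None else now
    result: Dict[str, List[Path]] = {"keep": [], "redundant": [], "trivial": [], "outdated": []}
    kept_digests = set()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Directory names relative to root, so the root's own location is ignored.
        dir_names = {part.lower() for part in path.relative_to(root).parts[:-1]}
        digest = file_digest(path)
        age_days = (now - path.stat().st_mtime) / 86400
        if digest in kept_digests:
            result["redundant"].append(path)  # identical content is already kept
        elif dir_names & TRIVIAL_DIR_NAMES:
            result["trivial"].append(path)  # test/sandbox/tmp leftovers
        elif age_days > OUTDATED_AFTER_DAYS:
            result["outdated"].append(path)  # not modified for over a year
        else:
            result["keep"].append(path)
            kept_digests.add(digest)
    return result
```

Run it as a regularly scheduled job and have a person review the ROT lists before anything is permanently deleted.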

Green IT Advantages: This reduces the amount of data in your project and helps you avoid accumulating dark data (data that is stored but never used). Roughly 10% to a third of the energy used in data centres goes to data storage, so reducing the amount of stored data also reduces your CO2 footprint, energy consumption and cost.

Considerations during setup of a new project

Identify and classify all existing data

Establish a process from the start to keep track of all data in every data storage of your new project, e.g. a scheduled analysis task that regularly scans all your stored data.
Classify and document all data as it enters your data storage to avoid creating ROT (redundant, outdated or obsolete, and trivial) data.
Take care to keep the documentation and classification of your data up to date and to permanently delete all unneeded data.
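For a new project, one way to classify and document data as it enters storage is to require every dataset to be registered in a small catalog with an owner, a business category and a retention period. The Python sketch below is hypothetical throughout: DataCatalog, DatasetRecord, the category names and the retention values are illustrative assumptions, not part of any specific tool, and a dedicated data catalog or metadata service could fill the same role.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical business categories; anything else is rejected at registration.
ALLOWED_CATEGORIES = {"operations", "legal", "backup"}


@dataclass
class DatasetRecord:
    name: str
    owner: str
    category: str
    created: date
    retention_days: int

    def expires_on(self) -> date:
        return self.created + timedelta(days=self.retention_days)


@dataclass
class DataCatalog:
    records: Dict[str, DatasetRecord] = field(default_factory=dict)

    def register(self, record: DatasetRecord) -> None:
        # Refuse unclassified or ownerless data at the point it enters storage.
        if record.category not in ALLOWED_CATEGORIES:
            raise ValueError(f"{record.name}: unknown category {record.category!r}")
        if not record.owner:
            raise ValueError(f"{record.name}: every dataset needs an owner")
        self.records[record.name] = record

    def due_for_deletion(self, today: date) -> List[str]:
        # Datasets whose retention period has passed: candidates for permanent deletion.
        return sorted(r.name for r in self.records.values() if r.expires_on() <= today)
```

Running due_for_deletion on the same schedule as the storage scan keeps the classification and the cleanup in step.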

Green IT Advantages: This reduces the amount of data in your project and helps you avoid accumulating dark data (data that is stored but never used). Roughly 10% to a third of the energy used in data centres goes to data storage, so reducing the amount of stored data also reduces your CO2 footprint, energy consumption and cost.

moin-oss · 2024-12-03T23:47:57Z

Draft set of directions for developers to check the locations of their databases and processing servers:

Green Software Playbooks – Data Engineering

Improvements to existing projects

Using servers for processing in the same data center as the databases

Examine your project's existing setup to determine whether the databases are in the same data center as the servers that process the data.

If you use the same cloud provider for both the databases and the data processing servers, it should be straightforward to move the servers into the same data center as the databases with a blue/green redeployment of the servers.
If the processing servers use a different cloud provider than the databases, this will be more difficult. You will need to determine whether the servers' provider has a data center close to where the databases are hosted. Alternatively, it may make sense to move one of them to the other's cloud provider.
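The check described above can be sketched as a small script over an inventory of your deployments. This Python sketch assumes you can list which processing server reads from which database, together with each one's provider and region; the Resource type, the provider labels and the region names are hypothetical, and in a real project the inventory would come from your infrastructure-as-code or your providers' consoles.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Resource:
    name: str
    provider: str  # hypothetical provider label, e.g. "provider-a"
    region: str    # provider-specific data center / region identifier


def colocation_report(links: List[Tuple[Resource, Resource]]) -> List[str]:
    """Return one finding per (server, database) link that is not co-located."""
    findings = []
    for server, db in links:
        if (server.provider, server.region) == (db.provider, db.region):
            continue  # already in the same data center: nothing to do
        if server.provider == db.provider:
            # Same provider: redeploy the server next to the database (e.g. blue/green).
            findings.append(
                f"{server.name}: blue/green redeploy into {db.region} next to {db.name}"
            )
        else:
            # Different providers: harder; look for a nearby site or consolidate providers.
            findings.append(
                f"{server.name} ({server.provider}) and {db.name} ({db.provider}): "
                f"find a {server.provider} data center near {db.region} or consolidate providers"
            )
    return findings
```

Region identifiers are only compared within the same provider, because different providers name their data centers differently.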

Once the servers and the databases are set up to run in the same data center, no further modifications should be needed.

Green IT Advantages: As of 2017, the energy needed to transfer data over the internet was estimated at 1.8 kWh/GB https://www.wholegraindigital.com/blog/website-energy-consumption/. Because event- and data-driven applications often process large quantities of data, transferring that data from databases to servers for processing can require a significant amount of energy. Keeping databases and servers within the same data center can therefore significantly reduce the energy demand of data processing.
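To get a feel for the magnitude, the cited 2017 figure can simply be multiplied by a data volume. The Python sketch below assumes a hypothetical pipeline that moves 500 GB per day over the internet and applies 1.8 kWh/GB uniformly, which gives 900 kWh per day. Treat it as a rough illustration: the per-GB figure is an average estimate, and the result scales linearly with whatever rate you plug in.

```python
# Estimated energy per GB for internet data transfer as of 2017, from the
# wholegraindigital.com article cited above; newer estimates may differ.
KWH_PER_GB_2017 = 1.8


def transfer_energy_kwh(gigabytes: float, kwh_per_gb: float = KWH_PER_GB_2017) -> float:
    """Rough energy estimate for moving `gigabytes` of data over the internet."""
    if gigabytes < 0:
        raise ValueError("data volume cannot be negative")
    return gigabytes * kwh_per_gb


# Hypothetical pipeline that pulls 500 GB per day from a remote database.
daily_kwh = transfer_energy_kwh(500)
yearly_kwh = daily_kwh * 365
print(f"{daily_kwh:.0f} kWh/day, {yearly_kwh:.0f} kWh/year")
```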

Considerations during setup of a new project

Using servers for processing in the same data center as the databases

When creating a new project, the easiest solution is to use the same cloud provider for both databases and servers so that both can run in the same data center. Otherwise, it will be important to discuss the locations of their data centers with prospective providers so that the databases and servers are not too far apart.
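When comparing prospective providers, a first, rough filter is the straight-line distance between their candidate data centers. The Python sketch below uses the haversine (great-circle) formula on hypothetical site names with approximate city coordinates. Distance is only a proxy: actual network routes, latency and each provider's regional options also matter.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))  # clamp float error


def closest_pair(db_sites: Dict[str, LatLon], server_sites: Dict[str, LatLon]) -> Tuple[str, str, float]:
    """Return the (database site, server site, distance_km) combination with the smallest distance."""
    return min(
        ((d, s, haversine_km(dc, sc)) for d, dc in db_sites.items() for s, sc in server_sites.items()),
        key=lambda t: t[2],
    )


# Hypothetical candidate sites with approximate city coordinates.
DB_SITES = {"db-frankfurt": (50.11, 8.68), "db-virginia": (38.90, -77.00)}
SERVER_SITES = {"srv-amsterdam": (52.37, 4.90), "srv-oregon": (45.50, -122.70)}

db, srv, km = closest_pair(DB_SITES, SERVER_SITES)
print(f"closest: {db} <-> {srv}, about {km:.0f} km")
```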

Green IT Advantages: As of 2017, the energy needed to transfer data over the internet was estimated at 1.8 kWh/GB https://www.wholegraindigital.com/blog/website-energy-consumption/. Because event- and data-driven applications often process large quantities of data, transferring that data from databases to servers for processing can require a significant amount of energy. Keeping databases and servers within the same data center can therefore significantly reduce the energy demand of data processing.

bryaki02 added the agenda label Nov 7, 2024

github-project-automation bot added this to Green Software Playbooks Nov 7, 2024

github-project-automation bot moved this to Todo in Green Software Playbooks Nov 7, 2024
