
Carbon Free Energy estimates as an Impact Framework dataset #14


Closed
Tracked by #13 ...
adrianco opened this issue Oct 23, 2023 · 27 comments

@adrianco
Contributor

adrianco commented Oct 23, 2023

Google publishes a table of CFE%, which is a key piece of the information needed for this project. However, the data is only disclosed for 2021 [update: there is GCP data for 2019-2022 on GitHub] and, as far as I can tell, has not been disclosed by AWS or Azure.

For the RTC project, we need to define the data schema we would use to obtain data from any cloud provider, and find a mechanism that can manage the uncertainty in a current estimate, based on data from previous years.

https://cloud.google.com/sustainability/region-carbon - current content of this page is pasted below

Carbon free energy for Google Cloud regions


In choosing which Google Cloud region to host your application, there are multiple considerations:

  • Latency to your end users can be different from one region to the next.
  • The price of services differs from region to region.
  • The electricity used to power your application might have a different carbon intensity.

This document explains how to include carbon emissions characteristics into the location choice for your Google Cloud services.

A carbon-free cloud for our customers

To power each Google Cloud region, we use electricity from the grid where the region is located. This electricity generates more or less carbon emissions (gCO2eq), depending on the type of power plants generating electricity for that grid and when we consume it. We recently set a goal to match our energy consumption with carbon-free energy (CFE), every hour and in every region by 2030. 

As we work towards our 2030 goal, we want to empower our customers to leverage our 24/7 carbon free energy efforts and consider the carbon impact of where they locate their applications. To characterize each region we use a metric: "CFE%". This metric is calculated for every hour and tells us what percentage of the energy we consumed during that hour was carbon-free, based on two elements:

  1. The generation feeding the grid at that time (which power plants are running)
  2. Google-attributed clean energy produced onto that grid during that time. 

We aggregate the available average hourly CFE percentage for each Google Cloud region for the year and have provided 2021 data below.

Understanding the data

Google CFE%: This is the average percentage of carbon free energy consumed in a particular location on an hourly basis, while taking into account the investments we have made in carbon-free energy in that location. This means that in addition to the carbon free energy that's already supplied by the grid, we have added carbon-free energy generation in that location to reach our 24/7 carbon free energy objective. As a customer, this represents the average percentage of time your application will be running on carbon-free energy.

Grid carbon intensity (gCO2eq/kWh): This metric indicates the average operational gross emissions per unit of energy from the grid. This metric should be used to compare the regions in terms of carbon intensity of their electricity from the local grid. For regions that are similar in CFE%, this will indicate the relative emissions for when your workload is not running on carbon free energy.

Google Cloud net operational greenhouse gas (GHG) emissions: After calculating our Scope 2 market-based emissions per the GHG Protocol including our renewable energy contracts, Google ensures any remaining Scope 2 emissions are neutralized by investments in carbon offsets; this brings our global net operational emissions to zero.

Carbon data across GCP regions

| Google Cloud region | Location | Google CFE% | Grid carbon intensity (gCO2eq/kWh) | Google Cloud net operational GHG emissions |   |
| --- | --- | --- | --- | --- | --- |
| asia-east1 | Taiwan | 18% | 453 | 0 |   |
| asia-east2 | Hong Kong | 28% | 360 | 0 |   |
| asia-northeast1 | Tokyo | 16% | 463 | 0 |   |
| asia-northeast2 | Osaka | 32% | 383 | 0 |   |
| asia-northeast3 | Seoul | 31% | 425 | 0 |   |
| asia-south1 | Mumbai | 24% | 555 | 0 |   |
| asia-south2 | Delhi | 23% | 632 | 0 |   |
| asia-southeast1 | Singapore | 4% | 372 | 0 |   |
| asia-southeast2 | Jakarta | 13% | 580 | 0 |   |
| australia-southeast1 | Sydney | 27% | 538 | 0 |   |
| australia-southeast2 | Melbourne | 34% | 490 | 0 |   |
| europe-central2 | Warsaw | 24% | 738 | 0 |   |
| europe-north1 | Finland | 97% | 112 | 0 | Low CO2 |
| europe-southwest1 | Madrid | 67% | 160 | 0 |   |
| europe-west1 | Belgium | 80% | 123 | 0 | Low CO2 |
| europe-west2 | London | 85% | 166 | 0 | Low CO2 |
| europe-west3 | Frankfurt | 96% | 413 | 0 | Low CO2 |
| europe-west4 | Netherlands | 57% | 317 | 0 |   |
| europe-west6 | Zurich | 85% | 118 | 0 | Low CO2 |
| europe-west8 | Milan | 42% | 323 | 0 |   |
| europe-west9 | Paris | 87% | 71 | 0 | Low CO2 |
| europe-west12 | Turin | 42% | 323 | 0 |   |
| me-west1 | Tel Aviv | 2% | 476 | 0 |   |
| northamerica-northeast1 | Montréal | 100% | 0 | 0 | Low CO2 |
| northamerica-northeast2 | Toronto | 90% | 36 | 0 | Low CO2 |
| southamerica-east1 | São Paulo | 89% | 65 | 0 | Low CO2 |
| southamerica-west1 | Santiago | 90% | 165 | 0 | Low CO2 |
| us-central1 | Iowa | 92% | 445 | 0 | Low CO2 |
| us-east1 | South Carolina | 26% | 532 | 0 |   |
| us-east4 | Northern Virginia | 60% | 354 | 0 |   |
| us-east5 | Columbus | 60% | 354 | 0 |   |
| us-south1 | Dallas | 41% | 342 | 0 |   |
| us-west1 | Oregon | 89% | 67 | 0 | Low CO2 |
| us-west2 | Los Angeles | 56% | 202 | 0 |   |
| us-west3 | Salt Lake City | 31% | 606 | 0 |   |
| us-west4 | Las Vegas | 27% | 396 | 0 |   |

* indicates that we do not currently have the hourly energy information available for calculating the metrics. For these regions, we will roll out the metrics once the hourly data becomes available.

Find the same data in a machine readable format on GitHub or as a BigQuery public dataset.

The hourly grid mix and carbon intensity data used to calculate these metrics is from Electricity Maps. This data has not been assured.

How to incorporate carbon free energy in your location strategy

Be sure to consider the other best practices for choosing resource locations like data residency requirements, latency to your end users, redundancy of the application, and price of the services available.

To use the CFE data above, here are some good ideas to get you started:

  1. Pick a cleaner region for your new applications. If you are going to run an application over time, running in the region with the highest CFE% will emit the lowest carbon emissions. 
  2. Run batch jobs on the cleanest option. Batch workloads often have the benefit of planning. You should pick the region with the highest CFE% available to you. 
  3. Set an organizational policy for low carbon locations. You can restrict the location of your resources to a particular Google Cloud region or subset of regions using the "Resource Location Restriction" organization policy. Dedicated "low carbon" value groups have been created to enable you to restrict locations with low carbon impact. For example, if you want to use only US-based regions, use the "Low carbon United States" (in:us-low-carbon-locations) value group.

Low carbon indicators

Some location pages on the Google Cloud website and location selectors in the Google Cloud console display a "Low CO2" leaf icon next to locations that have the lowest carbon impact. The "Resource Location Restriction" organization policy offers "low carbon" value groups.

For a location to be considered "low carbon", it must belong to a region with a Google CFE% of at least 75%, or, if CFE% information is not available, a grid carbon intensity of maximum 200 gCO2eq/kWh.
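The "low carbon" rule above can be expressed as a simple predicate. This is an illustrative sketch, not official Google code; the function name and the `None` convention for missing CFE% are assumptions.

```python
# Sketch of the "low carbon" qualification rule described above.
# cfe_percent may be None for regions where hourly data is unavailable.
def is_low_carbon(cfe_percent, grid_intensity_gco2_per_kwh):
    """Return True if a region qualifies as "low carbon"."""
    if cfe_percent is not None:
        # Qualify on CFE% when it is available.
        return cfe_percent >= 75
    # Otherwise fall back to grid carbon intensity.
    return grid_intensity_gco2_per_kwh <= 200

print(is_low_carbon(97, 112))    # europe-north1 -> True
print(is_low_carbon(42, 323))    # europe-west8 -> False
print(is_low_carbon(None, 160))  # no CFE% data, clean grid -> True
```

Note that europe-west3 (Frankfurt) qualifies despite its 413 gCO2eq/kWh grid, because the CFE% criterion takes precedence when CFE% is available.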



@adrianco
Contributor Author

Given data from a previous year, the data could be improved by updating the value of grid carbon intensity for that specific region with a current value; this sets a maximum carbon level. The CFE% from private power purchases is available for a previous year. To estimate forward to today: the CFE% will improve whenever new private generation capacity comes online, will change up or down whenever the grid changes its carbon intensity, and could get worse as growth of the region increases energy demand. We could estimate a wide CFE% range based on old data to bracket the possible outcomes, or the cloud provider could publish a narrower CFE% range based on their internal knowledge of growth in consumption vs. PPA projects and REC purchases.

The Google data is based on a 24x7 hourly algorithm. Carbon data is currently published by all the cloud providers on a monthly basis, so CFE% or a range could also be published monthly. Final CFE% would be published annually a few months after the year ends, and the range would converge to a single value at this point.

So the request to the cloud providers would be to publish interim CFE% estimates on a monthly basis during the year, with past months closing the range to zero when data is final, and with a two month forward estimate.
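One way the wide bracketing of CFE% from old data could look, as an illustrative sketch only (the function name and the `drift` margin are made-up parameters; a cloud provider could publish a narrower range from internal knowledge):

```python
# Sketch: bracket a current-year CFE% estimate from prior annual values,
# widening the range by the largest observed year-on-year movement or a
# minimum assumed drift, whichever is larger.
def estimate_cfe_range(history, drift=5.0):
    """history: list of past annual CFE% values, oldest first.
    Returns (low, mid, high), each clamped to 0..100."""
    latest = history[-1]
    # Observed year-on-year movement in the historical data.
    moves = [abs(b - a) for a, b in zip(history, history[1:])] or [0.0]
    spread = max(max(moves), drift)
    clamp = lambda v: max(0.0, min(100.0, v))
    return clamp(latest - spread), clamp(latest), clamp(latest + spread)

print(estimate_cfe_range([85, 89, 92]))  # (87.0, 92, 97.0)
```

As final data arrives for past months, the same interface could return a collapsed range where low, mid, and high are equal.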

@adrianco
Contributor Author

Clicking through the link to live data shows that this web page is out of date; annual data for 2019-2022 is available, updated a few months ago. https://github.com/GoogleCloudPlatform/region-carbon-info

This shows a year-on-year range, with some regions slipping by a few percent and some improving a lot.

@adrianco
Contributor Author

adrianco commented Oct 24, 2023

The BigQuery view of the data has more details, but seems to omit the 2022 dataset.

Screenshot 2023-10-23 at 5 09 15 PM

It does however provide an initial schema definition that looks like a useful basis to start with.

Screenshot 2023-10-23 at 5 12 56 PM

@adrianco
Contributor Author

adrianco commented Oct 24, 2023

Proposed schema for GSF RTC CFE
year, month, hour (optional), resolution, cfe_region, zone_id, grid_carbon_intensity, cloud_region, location, cloud_provider, cfe_low, cfe, cfe_high

This adds monthly and optional hourly resolution, with a flag indicating the resolution of the underlying data; records the grid carbon intensity used as a basis by the cloud provider; abstracts cloud_provider into its own metric; and replaces google_cfe with low, most probable, and high values. The low and high values are intended to be a 95% confidence interval. Current GCP data would be yearly with 24x7_hourly resolution. Current AWS data would be yearly with yearly resolution. If hour-by-hour data were shared via a real-time API, the hour metric would be provided.
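To make the proposed schema concrete, here is an illustrative sketch of how a record could be serialized as CSV. The field names follow the comment above; the row values (zone_id, intensities, CFE figures) are hypothetical examples, not published data.

```python
# Sketch: emit one example record of the proposed GSF RTC CFE schema as CSV.
import csv
import io

FIELDS = ["year", "month", "hour", "resolution", "cfe_region", "zone_id",
          "grid_carbon_intensity", "cloud_region", "location",
          "cloud_provider", "cfe_low", "cfe", "cfe_high"]

rows = [
    # A GCP-style yearly record with 24x7_hourly underlying resolution;
    # for final annual data, cfe_low == cfe == cfe_high.
    {"year": 2021, "month": "", "hour": "", "resolution": "24x7_hourly",
     "cfe_region": "Netherlands", "zone_id": "NL",
     "grid_carbon_intensity": 317, "cloud_region": "europe-west4",
     "location": "Netherlands", "cloud_provider": "Google Cloud",
     "cfe_low": 57, "cfe": 57, "cfe_high": 57},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

An interim monthly estimate would instead fill in `month`, use a wider `cfe_low`/`cfe_high` bracket, and converge to a single value once the year's data is final.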

@adrianco
Contributor Author

AWS published a list of regions that are 95% renewable for 2021 and a larger list that are 100% renewable for 2022. The differences between AWS and GCP (as far as I can tell) are that GCP uses carbon offsets to zero out the remainder of its carbon emission on an hourly basis after adding in its PPAs and RECs to the grid mix. AWS uses local grid PPAs and RECs to buy 100% renewable electricity on an annual basis, but doesn't mention carbon offsets. AWS also has a lot more renewable energy generation projects in Asia than GCP.

@adrianco
Contributor Author

I haven't been able to figure out a source for CFE for Azure. Can someone from Microsoft comment?

@seanmcilroy29 seanmcilroy29 mentioned this issue Oct 24, 2023
@seanmcilroy29
Contributor

@tmcclell - are you able to give some insight on this?

@adrianco
Contributor Author

Given the above schema, data would be shared via public BigQuery, S3, or Azure Blob objects, and could be updated when the annual report is published, and also whenever a new renewable energy production facility comes online. There's often a PR story around the commitment to build, and the final power-on of each facility, that could be tied to a data update.

@adrianco
Contributor Author

I haven't been able to figure out a source for CFE for Azure. Can someone from Microsoft comment?

On our discussion call today, Ritesh confirmed that Azure does not publish CFE%

@adrianco
Contributor Author

An initial step for the RTC project could be to publish a CFE% table on GitHub based on the data we have available now, but making a best estimate of what the end result would look like for all the cloud providers combined. Later, if the cloud providers supply more or new data, it could be blended in. This data isn't expected to update rapidly.

@seanmcilroy29 seanmcilroy29 mentioned this issue Nov 6, 2023
@seanmcilroy29 seanmcilroy29 mentioned this issue Dec 5, 2023
@adrianco
Contributor Author

Azure CFE% is published for each region (along with PUE) in their Datacenter Facts info https://datacenters.microsoft.com/globe/fact-sheets. The CFE data for Google, Azure and AWS has been extracted into a Google Sheet and lined up as much as possible, and it is proposed that this be exported via the Impact Framework https://github.com/Green-Software-Foundation/if

@adrianco
Contributor Author

The current sheet of raw data is here; it contains guesses for current ranges and is a work in progress at this point.
https://docs.google.com/spreadsheets/d/1RKjD4CuI5bd7JTj-9Mi1-ZhTIc6OW7TH9hUvLPbLsPA/edit?usp=sharing

@seanmcilroy29 seanmcilroy29 mentioned this issue Dec 19, 2023
@seanmcilroy29
Contributor

#8 - Purchased Renewable Energy is not settled for a year - overlaps with this issue

@adrianco adrianco changed the title Define the data sources needed to generate up to date Carbon Free Energy estimates Carbon Free Energy estimates as an Impact Framework dataset Dec 20, 2023
@seanmcilroy29 seanmcilroy29 mentioned this issue Jan 2, 2024
@adrianco
Contributor Author

adrianco commented Jan 9, 2024

I have restructured the spreadsheet and simplified it a bit. I removed the 2023 estimates that I had made. I moved hourly and annual data to individual columns and removed the column that was tagging the data. Now it contains only the actual published data coming from cloud providers.

There is a column for marginal carbon. Should we try to populate it, add our own interpolations, or publish just the pure data from the cloud providers?

There is a lot of missing data. Should we leave it blank, or populate it with NA so that people don't mistake blanks for zero values?

We need to review this and decide if it's ready to publish as an Impact Framework model.
https://docs.google.com/spreadsheets/d/1RKjD4CuI5bd7JTj-9Mi1-ZhTIc6OW7TH9hUvLPbLsPA/edit#gid=0

@adrianco
Contributor Author

adrianco commented Jan 9, 2024

The Impact Framework uses a hyphenated lower case naming strategy, so I changed all the column headers to match that.

The SCI-o (operational) model requires grid-carbon-intensity as its input, so we need to define the required inputs (color coded red), the required output (color coded green), and the informational outputs (color coded blue).

inputs:
year ("2019", "2020", "2021", "2022" are currently valid)
cloud-provider ("Google Cloud", "Amazon Web Services", "Microsoft Azure" are currently valid)
cloud-region ("us-east1", "us-east-1", "eastus" format unique to the cloud provider)

output:
grid-carbon-intensity (numeric, grams of CO2e/kWh)
various other informational metrics

I've calculated an effective grid-carbon-intensity for Google, since that data is location-based, which SCI-o requires.
We need to decide what, if anything, to output for AWS and Azure.
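A minimal sketch of the input contract described above. The function name is hypothetical and the valid-value sets are taken directly from the comment; a real IF plugin would also look the region up in the dataset.

```python
# Sketch: validate the three inputs listed above before a dataset lookup.
VALID_YEARS = {"2019", "2020", "2021", "2022"}
VALID_PROVIDERS = {"Google Cloud", "Amazon Web Services", "Microsoft Azure"}

def validate_inputs(year, cloud_provider, cloud_region):
    """Raise ValueError for inputs outside the currently published data."""
    if year not in VALID_YEARS:
        raise ValueError(f"unsupported year: {year}")
    if cloud_provider not in VALID_PROVIDERS:
        raise ValueError(f"unsupported provider: {cloud_provider}")
    # Region naming formats are unique to each provider
    # ("us-east1" vs "us-east-1" vs "eastus"), so only check presence here.
    if not cloud_region:
        raise ValueError("cloud_region is required")
    return True

print(validate_inputs("2021", "Google Cloud", "us-east1"))  # True
```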

@adrianco
Contributor Author

adrianco commented Jan 9, 2024

For Azure, we could use the annual data and reduce the value given by Electricity Maps (or WattTime, but the Google data is sourced from EM) by the CFE ratio. For Amazon, if we did the same, we would return zero for most of the regions. In both cases, this isn't really the data that SCI-o wants, and it isn't comparable to the Google data.
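The adjustment being discussed can be sketched in one line. To be clear, this is the market-style approximation described above, not a compliant location-based figure; the function name is hypothetical and the 400 gCO2eq/kWh input is an arbitrary example.

```python
# Sketch: scale an annual grid carbon intensity down by the annual CFE ratio.
def cfe_adjusted_intensity(grid_intensity_gco2_per_kwh, cfe_percent):
    """Return intensity reduced in proportion to the CFE percentage."""
    return grid_intensity_gco2_per_kwh * (1 - cfe_percent / 100.0)

# Azure-style annual CFE applied to an annual grid figure.
print(cfe_adjusted_intensity(400, 60))   # 160.0
# AWS regions reported as 100% renewable would come out as zero.
print(cfe_adjusted_intensity(400, 100))  # 0.0
```

The 100% case shows the problem noted above: most AWS regions would report zero, which is not what SCI-o expects from a location-based input.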

@adrianco
Contributor Author

Maybe we have an optional input value "market" which returns the market method numbers for Azure and AWS, and NA for Google.

@seanmcilroy29 seanmcilroy29 mentioned this issue Jan 12, 2024
@jawache

jawache commented Jan 16, 2024

Hi, @adrianco reviewed the above. Are the following assumptions correct?

  • You are looking for a way to adjust carbon emissions to consider market-based measures like renewable purchases.
  • The approach you are proposing is to adjust the grid-carbon-intensity value by the per-region coefficient of CFE% (Something is ringing in my head that we are missing something and CFE% can't be used in this way, I might just need to think it through, and work on an example)
  • The XL includes some inputs of cloud region, vendor, and outputs of cfe and an adjusted grid-carbon intensity (yearly values I assume)
  • We are going to maintain these values in a CSV format manually.
  • I can see in the CSV that some data is quite old. Can we assume that if you request a CFE value, we just return the latest if we don't have data for that year?

Given the above, I believe the goal is to create a model that adjusts (reduces) a grid-carbon intensity value to take into account the CFE% of a region. Downstream models will then adjust the final carbon emissions value with this new grid-carbon-intensity so it represents the investment into 24/7 renewable purchases by that cloud provider.

I propose this is split into several impact framework plugins, we have a philosophy that each plugin does one thing so we can mix and match plugins in different pipelines.

carbon-free-energy plugin (or real-time-cloud plugin)

  • this is where we maintain the CSV data above.
  • the inputs it needs are cloud vendor and cloud region, and default inputs are also timestamp and duration, and optionally grid-carbon-intensity.
  • it outputs the latest carbon-free-energy figure used by that cloud vendor in that region.
  • if the timestamp is for a year not present in the CSV: if it's in the past, assume 0% CFE; if it's in the future, use the latest CFE that is in the CSV.
  • If the input already contains grid-carbon-intensity, then it uses that value. Otherwise, it takes the grid-carbon-intensity value from the CSV. (In most IF use cases the grid-carbon-intensity would already be provided; we have a WattTime plugin and plans for an EM plugin, etc.)
  • Adjusts the grid-carbon-intensity value so it reflects the cfe value from the CSV.
  • If there was a previous grid-carbon-intensity value then it would also copy that to another field so we at least have a record of the non-cfe-grid-carbon-intensity.

If the above is roughly correct, I will spec out a proper plugin spec in the format we use in the impact framework to ensure that nothing is left to assumptions, with clear inputs and expected outputs.

NOTE: I suspect we should do a name change in IF, not use grid-carbon-intensity and instead use electricity-carbon-intensity instead. If you are adjusting a grid (location-based) carbon intensity with some market-based measures, then the term grid isn't accurate anymore.

Potential pipeline

pipeline:
  - teads-curve # compute energy from utilization
  - watttime # to get grid-carbon-intensity
  - carbon-free-energy # to adjust grid-carbon-intensity w.r.t. the cfe for that cloud region
  - sci-o # to compute carbon from energy + grid-carbon-intensity
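The year-fallback rule in the bullet list above (past years missing from the CSV yield 0% CFE; future years reuse the most recent year on record) can be sketched as follows. The function and the nested-dict table shape are assumptions for illustration, not the plugin's actual data structure.

```python
# Sketch: CFE lookup with the past/future fallback behavior described above.
def lookup_cfe(table, vendor, region, year):
    """table: {(vendor, region): {year: cfe_percent}}"""
    years = table.get((vendor, region), {})
    if year in years:
        return years[year]
    if not years:
        return 0.0  # no data at all for this vendor/region
    latest = max(years)
    # Future year: reuse the latest known value. Past year: assume 0% CFE.
    return years[latest] if year > latest else 0.0

data = {("Google Cloud", "europe-west4"): {2020: 55, 2021: 57}}
print(lookup_cfe(data, "Google Cloud", "europe-west4", 2021))  # 57
print(lookup_cfe(data, "Google Cloud", "europe-west4", 2023))  # 57 (future)
print(lookup_cfe(data, "Google Cloud", "europe-west4", 2018))  # 0.0 (past, missing)
```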

@seanmcilroy29
Contributor

Project members agreed to create a document guideline to explain the headers for GSF Real-Time Cloud Renewable Energy Percentage

Collaboration Google doc to be used for drafting prior to adding to GitHub

@adrianco
Contributor Author

We discussed the document today and after the meeting I created a first draft of the documentation document. I re-ordered and re-named the columns in the sheet to rationalize them and had a first pass at explaining what each metric means and where it comes from. The sheet still needs to have more data filled in, then have missing data marked with NA.

@adrianco
Contributor Author

Hi, @adrianco reviewed the above. Are the following assumptions correct?

I think this is close, but we are thinking a bit differently about how it would fit in

  • You are looking for a way to adjust carbon emissions to consider market-based measures like renewable purchases.
  • The approach you are proposing is to adjust the grid-carbon-intensity value by the per-region coefficient of CFE% (Something is ringing in my head that we are missing something and CFE% can't be used in this way, I might just need to think it through, and work on an example)

Correct, it can't be used in this way, and still be a compliant location based carbon estimate.

  • The XL includes some inputs of cloud region, vendor, and outputs of cfe and an adjusted grid-carbon intensity (yearly values I assume)
  • We are going to maintain these values in a CSV format manually.
  • I can see in the CSV that some data is quite old. Can we assume that if you request a CFE value, we just return the latest if we don't have data for that year?

Data will always be between 6 and 18 months in the past. A separate IF model step should be used to estimate current data.

Given the above, I believe the goal is to create a model that adjusts (reduces) a grid-carbon intensity value to take into account the CFE% of a region. Downstream models will then adjust the final carbon emissions value with this new grid-carbon-intensity so it represents the investment into 24/7 renewable purchases by that cloud provider.

The goal is to get all the information about a cloud provider region in a consistent format that can be used for various purposes. We aren't going to invent a new methodology that isn't a valid model.

I propose this is split into several impact framework plugins, we have a philosophy that each plugin does one thing so we can mix and match plugins in different pipelines.

This plugin gets all the cloud region data, that's all. It should be at the front of the pipeline for workloads running in the cloud.

carbon-free-energy plugin (or real-time-cloud plugin)

  • this is where we maintain the CSV data above.
  • the inputs it needs are cloud vendor and cloud region, and default inputs are also timestamp
    yes, just these three

and duration, and optionally grid-carbon-intensity.
It won't use duration, and it should come earlier in the pipeline, before grid-carbon-intensity is obtained.

  • it outputs the latest carbon-free-energy figure used by that cloud vendor in that region.
    Yes, along with other data about that region
  • if the timestamp is for a year not present in the CSV: if it's in the past, assume 0% CFE; if it's in the future, use the latest CFE that is in the CSV.
    Yes, that works.
  • If the input already contains grid-carbon-intensity, then it uses that value. Otherwise, it takes the grid-carbon-intensity value from the CSV. (In most IF use cases the grid-carbon-intensity would already be provided; we have a WattTime plugin and plans for an EM plugin, etc.)

No, I think it goes before the WattTime or EM plugin. It outputs the EM or WT key that can be used to pull current data for that grid region, given only the cloud provider and region.

  • Adjusts the grid-carbon-intensity value so it reflects the cfe value from the CSV.
    no
  • If there was a previous grid-carbon-intensity value then it would also copy that to another field so we at least have a record of the non-cfe-grid-carbon-intensity.
    no need. It would output an annual average grid-carbon-intensity that could be used directly, or refined to a more accurate grid-carbon-intensity for a more specific time period by calling a WT or EM plugin.

If the above is roughly correct, I will spec out a proper plugin spec in the format we use in the impact framework to ensure that nothing is left to assumptions, with clear inputs and expected outputs.

NOTE: I suspect we should do a name change in IF, not use grid-carbon-intensity and instead use electricity-carbon-intensity instead. If you are adjusting a grid (location-based) carbon intensity with some market-based measures, then the term grid isn't accurate anymore.

we aren't doing that.

Potential pipeline

pipeline:
  - teads-curve # compute energy from utilization
  - watttime # to get grid-carbon-intensity
  - carbon-free-energy # to adjust grid-carbon-intensity w.r.t. the cfe for that cloud region
  - sci-o # to compute carbon from energy + grid-carbon-intensity

I think it looks like this

pipeline:
  - teads-curve # compute energy from utilization
  - cloud-region # look up the cloud region info
  - watttime # optional to get grid-carbon-intensity for now rather than annual data on google (other clouds are NA)
  - sci-o # to compute carbon from energy + grid-carbon-intensity (should also use PUE in the calculation)

There are other uses for the CFE data, perhaps in a tool that picks an optimal region for you.

@adrianco
Contributor Author

Given a tool like https://gcping.com to find the regions that are closest to someone, the cloud-region data could be used to pick the best CFE that is nearby.

@jawache

jawache commented Jan 31, 2024

Gotcha thanks @adrianco, that's clear and simple.

The IF team are doing a re-architecture sprint for the next two weeks, so I will hold off on writing a spec until that is complete.

The IF team themselves will be flat out until April. What would you say about speccing this out in some detail and then sharing it with hackathon participants, to see if any of them are interested in taking it up?

@jawache

jawache commented Feb 29, 2024

@adrianco and @seanmcilroy29, part of the work above has been slated for the next sprint in IF, just the location style fields for now, see IF (view)

@adrianco to be more general purpose I've added a geolocation field. If we can support a lat,lon, that would make this data more useful when using services outside of EM or WT. The IF team will make the effort to compute this using the central lat,lon of the location field. Let me know if there is a better alternative?

@adrianco
Contributor Author

Summary document header names rationalized and copied to spreadsheet.
Spreadsheet tidied up, year color coding filled out.
NA added to grid-carbon-intensity output that will be consumed by SCI for AWS and Azure
Still need to fill out some columns with geolocation data, EM and WT and IEA reference information.

@adrianco
Contributor Author

adrianco commented Mar 26, 2024

New detailed issues to complete this data source
#31 - add geolocation data
#32 - add WattTime data
#33 - add Electricity Maps data
#34 - figure out IEA data and add it
#35 - cfe region and any other issues

@adrianco
Contributor Author

adrianco commented Jul 2, 2024

Data finalized in cloud region metadata proposal

@adrianco adrianco closed this as completed Jul 2, 2024