Carbon Free Energy estimates as an Impact Framework dataset #14
Given data from a previous year, the estimate could be improved by updating the grid carbon intensity for that specific region with a current value; this sets a maximum carbon level. The CFE% from private power purchases is only available for a previous year. Estimating forward to today: the CFE% improves whenever new private generation capacity comes online, moves up or down as the grid changes its carbon intensity, and can get worse as growing energy demand in the region outpaces new capacity. We could estimate a wide CFE% range based on old data to bracket the possible outcomes, or the cloud provider could publish a narrower CFE% range based on their internal knowledge of growth in consumption vs. PPA projects and REC purchases.

The Google data is based on a 24x7 hourly algorithm. Carbon data is currently published by all the cloud providers on a monthly basis, so CFE% (or a range) could also be published monthly. Final CFE% would be published annually a few months after the year ends, at which point the range would converge to a single value. So the request to the cloud providers would be to publish interim CFE% estimates on a monthly basis during the year, with past months closing the range to zero when data is final, and with a two-month forward estimate.
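One way to bracket the possible outcomes, as described above, is to widen a band around the last published CFE% in proportion to how stale the data is. This is a minimal sketch; the drift rate per month is an assumed illustrative parameter, not a figure published by any provider:

```python
def estimate_cfe_range(last_cfe_pct, months_since_data, drift_per_month=0.5):
    """Bracket today's CFE% around the last published value.

    drift_per_month is an assumed uncertainty growth rate (percentage
    points per month); the range converges to a single point when final
    data is published (months_since_data = 0).
    """
    spread = drift_per_month * months_since_data
    low = max(0.0, last_cfe_pct - spread)
    high = min(100.0, last_cfe_pct + spread)
    return low, high

# e.g. a region last published at 90% CFE, 12 months ago
print(estimate_cfe_range(90.0, 12))  # (84.0, 96.0)
```

A provider with internal knowledge of PPA projects would simply use a smaller drift parameter to publish a narrower range.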
Clicking through the link to live data shows that this web page is out of date; annual data for 2019-2022 is available, updated a few months ago: https://github.com/GoogleCloudPlatform/region-carbon-info. Year on year, some regions slip by a few percent and some improve a lot.
Proposed schema for GSF RTC CFE. This adds monthly and optional hourly resolution, with a flag indicating the resolution of the underlying data; records the grid carbon intensity used as a basis by the cloud provider; abstracts cloud_provider into its own metric; and replaces google_cfe with a low value, a most probable value, and a high value. The low and high values are intended to be a 95% confidence interval. Current GCP data would be yearly with 24x7_hourly resolution. Current AWS data would be yearly with yearly resolution. If hour-by-hour data were shared via a real-time API, the hour metric would be provided.
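A record under this proposed schema might look like the following sketch. The field names and all values are illustrative guesses at the proposal, not actual published data:

```python
# One row of the proposed GSF RTC CFE dataset (illustrative values only)
record = {
    "cloud-provider": "Google Cloud",
    "cloud-region": "europe-west1",
    "year": 2022,
    "month": None,                 # populated for monthly resolution
    "hour": None,                  # populated if a real-time API shares hourly data
    "resolution": "24x7_hourly",   # resolution of the underlying data
    "grid-carbon-intensity": 130,  # gCO2eq/kWh basis used by the provider
    "cfe-low": 75,                 # low end of the 95% confidence interval
    "cfe": 79,                     # most probable value (replaces google_cfe)
    "cfe-high": 83,                # high end of the 95% confidence interval
}
```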
AWS published a list of regions that are 95% renewable for 2021 and a larger list that are 100% renewable for 2022. The differences between AWS and GCP (as far as I can tell) are that GCP uses carbon offsets to zero out the remainder of its carbon emissions on an hourly basis after adding its PPAs and RECs to the grid mix, while AWS uses local grid PPAs and RECs to buy 100% renewable electricity on an annual basis but doesn't mention carbon offsets. AWS also has a lot more renewable energy generation projects in Asia than GCP.
I haven't been able to figure out a source for CFE for Azure. Can someone from Microsoft comment? |
@tmcclell - are you able to give some insight on this? |
Given the above schema, data would be shared via public BigQuery, S3 or Azure Blob objects, and could be updated when the annual report is published, and also whenever a new renewable energy production facility comes online. There's often a PR story around the commitment to build, and the final power-on of each facility, that could be tied to a data update.
On our discussion call today, Ritesh confirmed that Azure does not publish CFE% |
An initial step for the RTC project could be to publish a CFE% table on GitHub based on the data we have available now, but making a best estimate of what the end result would look like for all the cloud providers combined. Later, if the cloud providers supply more or new data, it could be blended in. This data isn't expected to update rapidly. |
Azure CFE% is published for each region (along with PUE) in their Datacenter Facts info: https://datacenters.microsoft.com/globe/fact-sheets. The CFE data for Google, Azure and AWS has been extracted into a Google Sheet and lined up as much as possible, and it is proposed that this be exported via the Impact Framework: https://github.com/Green-Software-Foundation/if
The current sheet of raw data is here - this contains guesses for current ranges and is a work in progress at this point.
#8 - Purchased Renewable Energy is not settled for a year - overlaps with this issue
I have restructured the spreadsheet and simplified it a bit. I removed the 2023 estimates that I had made, moved the hourly and annual data into individual columns, and removed the column that was tagging the data. Now it contains only the actual published data coming from the cloud providers.

Open questions: there is a column for marginal carbon - should we try to populate it, add our own interpolations, or publish just the pure data from the cloud providers? There is a lot of missing data - should we leave it blank, or populate it with NA so that people don't take blanks for zero values? We need to review this and decide if it's ready to publish as an Impact Framework model.
The Impact Framework uses a hyphenated lower-case naming strategy, so I changed all the column headers to match. The SCI-o (operational) model requires grid-carbon-intensity as its input, so we need to define required inputs (color coded red), required outputs (color coded green), and informational outputs (color coded blue). I've calculated an effective grid-carbon-intensity for Google, given that the data is location based, which SCI-o requires.
For Azure, we could use the annual data and reduce the value given by Electricity Maps (or WattTime, but the Google data is sourced from EM) by the CFE ratio. For Amazon, if we did the same we would return zero for most of the regions. In both cases, this isn't really the data that SCI-o wants and isn't comparable to the Google data.
Maybe we have an optional input value "market" which returns the market method numbers for Azure and AWS, and NA for Google.
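The CFE-ratio reduction described above can be sketched as follows; the numbers are invented for illustration, and the second call shows why applying it to a "100% renewable" AWS region returns zero:

```python
def adjusted_intensity(grid_ci, cfe_pct):
    """Reduce a location-based grid carbon intensity (gCO2eq/kWh)
    by the provider's market-based CFE ratio."""
    return grid_ci * (1.0 - cfe_pct / 100.0)

print(adjusted_intensity(300.0, 60.0))   # 120.0 gCO2eq/kWh for a 60% CFE region
print(adjusted_intensity(300.0, 100.0))  # 0.0 - the AWS annual-market case
```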
Hi @adrianco, I reviewed the above. Are the following assumptions correct?
Given the above, I believe the goal is to create a model that adjusts (reduces) a grid-carbon-intensity value to take into account the CFE% of a region. Downstream models will then adjust the final carbon emissions value with this new grid-carbon-intensity so it represents the investment into 24/7 renewable purchases by that cloud provider. I propose this is split into several Impact Framework plugins; we have a philosophy that each plugin does one thing, so we can mix and match plugins in different pipelines: a carbon-free-energy plugin (or real-time-cloud plugin).
If the above is roughly correct, I will write up a proper plugin spec in the format we use in the Impact Framework to ensure that nothing is left to assumptions, with clear inputs and expected outputs. NOTE: I suspect we should do a name change in IF - not use grid-carbon-intensity and instead use electricity-carbon-intensity. If you are adjusting a grid (location-based) carbon intensity with some market-based measures, then the term grid isn't accurate anymore.

Potential pipeline:
pipeline:
- teads-curve # compute energy from utilization
- watttime # to get grid-carbon-intensity
- carbon-free-energy # to adjust grid-carbon-intensity w.r.t. the cfe for that cloud region
- sci-o # to compute carbon from energy + grid-carbon-intensity
Project members agreed to create a guideline document to explain the headers for the GSF Real-Time Cloud Renewable Energy Percentage Collaboration. A Google doc will be used for drafting prior to adding to GitHub.
We discussed the document today and after the meeting I created a first draft of the documentation document. I re-ordered and re-named the columns in the sheet to rationalize them and had a first pass at explaining what each metric means and where it comes from. The sheet still needs to have more data filled in, then have missing data marked with NA. |
I think this is close, but we are thinking a bit differently about how it would fit in
Correct, it can't be used in this way, and still be a compliant location based carbon estimate.
Data will always be between 6 and 18 months in the past. A separate IF model step should be used to estimate current data.
The goal is to get all the information about a cloud provider region in a consistent format that can be used for various purposes. We aren't going to invent a new methodology that isn't a valid model.
This plugin gets all the cloud region data, that's all. It should be at the front of the pipeline for workloads running in the cloud.
No, I think it goes before the WattTime or EM plugin. It outputs the EM or WT key that can be used to pull current data for that grid region, given only the cloud provider and region.
we aren't doing that.
I think it looks like this pipeline:
- teads-curve # compute energy from utilization
- cloud-region # look up the cloud region info
- watttime # optional, to get current grid-carbon-intensity rather than annual data for Google (other clouds are NA)
- sci-o # to compute carbon from energy + grid-carbon-intensity (should also use PUE in the calculation)

There are other uses for the CFE data, perhaps in a tool that picks an optimal region for you.
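Numerically, the pipeline above combines its stages roughly like this sketch. All figures are invented for illustration, and the actual sci-o inputs may differ:

```python
# Hypothetical walk through the pipeline stages for one hour of a workload
energy_kwh = 1.2   # teads-curve: energy estimated from CPU utilization
grid_ci = 400.0    # watttime: grid carbon intensity, gCO2eq/kWh
pue = 1.1          # cloud-region: facility overhead factor

# sci-o style final step, including PUE as suggested above
carbon_g = energy_kwh * pue * grid_ci
print(carbon_g)  # 528.0 gCO2eq
```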
Given a tool like https://gcping.com to find the regions that are closest to someone, the cloud-region data could be used to pick the best CFE that is nearby.
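As a sketch of that region-picking idea (all latency and CFE numbers invented): filter regions by a latency budget, then take the highest CFE% among those that remain:

```python
# Hypothetical per-region data: measured ping (e.g. via gcping.com) plus CFE%
regions = [
    {"region": "us-east1",     "ping_ms": 25,  "cfe": 62},
    {"region": "us-central1",  "ping_ms": 40,  "cfe": 93},
    {"region": "europe-west1", "ping_ms": 120, "cfe": 79},
]

nearby = [r for r in regions if r["ping_ms"] <= 50]  # latency budget
best = max(nearby, key=lambda r: r["cfe"])           # greenest nearby region
print(best["region"])  # us-central1
```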
Gotcha, thanks @adrianco, that's clear and simple. The IF team are doing a re-architecture sprint for the next two weeks, so I will hold off on writing a spec till that is complete. The IF team themselves will be flat out till April. What would you say about speccing this out in some detail and then sharing it with hackathon participants, to see if any of them are interested in taking it up?
@adrianco and @seanmcilroy29, part of the work above has been slated for the next sprint in IF - just the location-style fields for now, see IF (view). @adrianco, to be more general purpose I've added a geolocation field; if we can support a lat,lon that would make this data more useful when using services outside of EM or WT. The IF team will make the effort to compute this using whatever is the central lat,lon of the location field. Let me know if there is a better alternative.
Summary document header names rationalized and copied to spreadsheet. |
Data finalized in cloud region metadata proposal |
Google publishes a table of CFE% which is a key piece of the information needed for this project; however, the data is only disclosed for 2021 [update: there is GCP data for 2019-2022 on GitHub] and, as far as I can tell, has not been disclosed by AWS or Azure.
For the RTC project, we need to define the data schema we would use to obtain data from any cloud provider, and find a mechanism that can manage the uncertainty in a current estimate, based on data from previous years.
https://cloud.google.com/sustainability/region-carbon - current content of this page is pasted below
Carbon free energy for Google Cloud regions
In choosing which Google Cloud region to host your application, there are multiple considerations:
This document explains how to include carbon emissions characteristics into the location choice for your Google Cloud services.
A carbon-free cloud for our customers
To power each Google Cloud region, we use electricity from the grid where the region is located. This electricity generates more or less carbon emissions (gCO2eq), depending on the type of power plants generating electricity for that grid and when we consume it. We recently set a goal to match our energy consumption with carbon-free energy (CFE), every hour and in every region by 2030.
As we work towards our 2030 goal, we want to empower our customers to leverage our 24/7 carbon free energy efforts and consider the carbon impact of where they locate their applications. To characterize each region we use a metric: "CFE%". This metric is calculated for every hour and tells us what percentage of the energy we consumed during an hour is carbon-free, based on two elements: the carbon-free energy already supplied by the grid mix, and the carbon-free energy generation that Google has added in that location.
We aggregate the available average hourly CFE percentage for each Google Cloud region for the year and have provided 2021 data below.
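The aggregation described above - an annual CFE% built from hourly values - can be sketched as follows, assuming a simple unweighted mean; the published methodology may weight hours by energy consumed:

```python
# Illustrative hourly CFE fractions for a region (real data covers 8760 hours)
hourly_cfe = [0.9, 0.7, 1.0, 0.8]

# Annual CFE% as an unweighted mean of hourly values (an assumption;
# hours could instead be weighted by consumption)
annual_cfe_pct = 100.0 * sum(hourly_cfe) / len(hourly_cfe)
print(annual_cfe_pct)  # 85.0
```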
Understanding the data
Google CFE%: This is the average percentage of carbon free energy consumed in a particular location on an hourly basis, while taking into account the investments we have made in carbon-free energy in that location. This means that in addition to the carbon free energy that's already supplied by the grid, we have added carbon-free energy generation in that location to reach our 24/7 carbon free energy objective. As a customer, this represents the average percentage of time your application will be running on carbon-free energy.
Grid carbon intensity (gCO2eq/kWh): This metric indicates the average operational gross emissions per unit of energy from the grid. This metric should be used to compare the regions in terms of carbon intensity of their electricity from the local grid. For regions that are similar in CFE%, this will indicate the relative emissions for when your workload is not running on carbon free energy.
Google Cloud net operational greenhouse gas (GHG) emissions: After calculating our Scope 2 market-based emissions per the GHG Protocol including our renewable energy contracts, Google ensures any remaining Scope 2 emissions are neutralized by investments in carbon offsets; this brings our global net operational emissions to zero.
Carbon data across GCP regions
* indicates that we do not currently have the hourly energy information available for calculating the metrics. For these regions, we will roll out the metrics once the hourly data becomes available.
Find the same data in a machine readable format on GitHub or as a BigQuery public dataset.
The hourly grid mix and carbon intensity data used to calculate these metrics is from Electricity Maps. This data has not been assured.
How to incorporate carbon free energy in your location strategy
Be sure to consider the other best practices for choosing resource locations like data residency requirements, latency to your end users, redundancy of the application, and price of the services available.
To use the CFE data above, here are some good ideas to get you started:
Low carbon indicators

Some location pages on the Google Cloud website and location selectors in the Google Cloud console display "Low CO2" next to locations that have the lowest carbon impact. The "Resource Location Restriction" organization policy offers "low carbon" (e.g. in:us-low-carbon-locations) value groups.
For a location to be considered "low carbon", it must belong to a region with a Google CFE% of at least 75%, or, if CFE% information is not available, a grid carbon intensity of maximum 200 gCO2eq/kWh.
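That rule can be written directly as a check; this is a sketch of the stated criteria, not Google's implementation:

```python
def is_low_carbon(cfe_pct=None, grid_ci=None):
    """Google's stated "low carbon" rule: CFE% of at least 75, or, when
    CFE% is unavailable, grid carbon intensity of at most 200 gCO2eq/kWh."""
    if cfe_pct is not None:
        return cfe_pct >= 75
    return grid_ci is not None and grid_ci <= 200

print(is_low_carbon(cfe_pct=80))   # True
print(is_low_carbon(grid_ci=150))  # True  (no CFE% available)
print(is_low_carbon(grid_ci=250))  # False
```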