SplitwiseSimCPUCarbon is a discrete event simulator that helps evaluate CPU embodied carbon optimization in LLM inference clusters through management of CPU aging. It was built by extending SplitwiseSim, an LLM serving cluster simulator.
SplitwiseSimCPUCarbon also implements our research on an efficient embodied carbon amortization technique for LLM inference clusters. It includes the online algorithms of the proposed technique, the comparison baselines, and the evaluation.
We designed the splitwise-sim-cpu-carbon repository to reflect our extension on top of SplitwiseSim. You can follow the steps below to set up SplitwiseSimCPUCarbon accordingly.
- Check out the base SplitwiseSim code. To do so, download the tag from https://github.com/tharindu-b-hewage/splitwise-sim/releases/tag/base-cpu-aging-aware, or clone the forked repository https://github.com/tharindu-b-hewage/splitwise-sim and check out the tag `base-cpu-aging-aware`.
- Follow the SplitwiseSim instructions to download inference trace data and run example simulation scenarios.
- Apply the patch `extension/splitwise-sim.patch`, which extends the base to SplitwiseSimCPUCarbon.
Below is a high-level summary of the new or modified functionality introduced with the patch. These changes primarily focus on adding CPU modeling (including per-core allocation, C-State power/temperature modeling, and new scheduling algorithms) to manage silicon aging in the CPU during LLM model serving.
- CPU and Core Modeling
  - Core Power & Residency Tracking
    - New modules (`core_power.py` and `core_residency.py`) provide a detailed CPU model, including:
      - Per-core frequency, temperature, and aging effects.
      - Idle-state (C-State) transitions and associated wake-up latencies.
  - Core Objects and Sleep/Wake Mechanisms
    - Each `CPU` processor is divided into `Core` objects tracking frequency, temperature, current task, forced sleep, etc.
    - A “sleep manager” adjusts which cores are forced to sleep or woken up based on cluster load and a configurable reaction function.
- Extended Configuration and Hardware Definitions
  - CPU-Based SKUs
    - YAML definitions added for servers that include both CPU(s) and GPU(s), e.g. `dgx-a100-with-cpu.yaml`, `dgx-h100-with-cpu.yaml`, etc.
    - Parallel “variants” (e.g., `dgx-h100-with-cpu-vm40`, `dgx-h100-with-cpu-vm80`) model different CPU core counts.
  - New CPU Processor YAMLs
    - Added for Intel (dual-xeon-platinum) and AMD (dual-amd-rome) CPUs with multiple core-count configurations.
- Instance and Executor Adjustments
  - CPU Overheads for Common Operations
    - Memory allocation, task dispatch, and general “executor” overhead now model CPU usage.
    - Instances track a dedicated CPU object and factor in CPU overhead when scheduling tasks.
  - CPU Task Scheduling
    - Task management techniques in LLM inference clusters, including the `Task-to-core Mapping` algorithm in our proposed technique.
      - `task_schedule_linux`, `task_schedule_least_aged`, and `task_schedule_proposed` illustrate distinct policies for core selection.
      - A new `cpu_configs.properties` file lets you switch among scheduling algorithms (`linux`, `least-aged`, or `proposed`).
- Simulator Hooks
  - Periodic Sleep-Management
    - When simulating with `proposed` CPU scheduling, the simulator periodically calls `cpu.adjust_sleeping_cores()` to execute the `Selective Core Idling` algorithm in our proposed technique.
  - CPU Core Usage Logs
    - End-of-simulation triggers a final “state update” to log each core’s status (frequency, temperature, etc.).
- Plotting & Analysis Scripts
  - Several new Python files (`llm-ca_misc_plots.py`, `llm-ca_perf_metric_plots.py`, `llm-ca_plots_tasks-vs-time.py`) to:
    - Parse and plot CPU usage, tasks-per-core distributions, frequency aging, etc.
    - Generate visualizations for reaction functions, core availability, and frequency drop over time.
- New Experiment Scripts
  - `run_cpu_experiments.sh`
    - Automates the simulation runs across multiple VM configurations (varying CPU core counts) and scheduling techniques.
  - Refined “splitwise” Scripts
    - Extended scripts (`run_splitwise_ha_cpu.sh`, etc.) configure cluster servers to include CPUs alongside GPUs, enabling CPU-based overhead.
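To make the idea of a core-selection policy more concrete, here is a minimal, self-contained sketch. It is not the code from the patch: the `Core` fields, the function names, and the use of the current frequency as a proxy for aging are all assumptions made for illustration, and `pick_first_free` is only a naive baseline, not a model of the Linux scheduler.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Core:
    """Hypothetical stand-in for the simulator's per-core state."""
    core_id: int
    freq_ghz: float             # current frequency; assumed to drop as the core ages
    busy: bool = False          # True while a task occupies the core
    forced_sleep: bool = False  # True when a sleep manager has idled the core


def available(cores: List[Core]) -> List[Core]:
    """Cores that can accept a task: neither busy nor forced to sleep."""
    return [c for c in cores if not c.busy and not c.forced_sleep]


def pick_first_free(cores: List[Core]) -> Optional[Core]:
    """Naive baseline (assumption): take the first available core in list order."""
    free = available(cores)
    return free[0] if free else None


def pick_least_aged(cores: List[Core]) -> Optional[Core]:
    """Prefer the available core with the highest remaining frequency,
    treating a higher frequency as 'less aged' (assumption)."""
    free = available(cores)
    return max(free, key=lambda c: c.freq_ghz) if free else None


# Hypothetical policy table keyed by a policy name.
POLICIES = {"first-free": pick_first_free, "least-aged": pick_least_aged}

cores = [Core(0, 2.4), Core(1, 2.6, busy=True), Core(2, 2.5)]
chosen = POLICIES["least-aged"](cores)
```

In this example, core 1 has the highest frequency but is busy, so the least-aged policy falls back to core 2, while the naive baseline would pick core 0.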
Overall, these modifications integrate an aging-aware CPU model into the simulator, allowing fine-grained exploration of CPU aging through CPU overhead, power states, core-level scheduling, and the interplay between CPU and GPU resources in LLM inference clusters.
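The periodic sleep management mentioned in the summary can be pictured as a loop that compares the number of awake cores against a target derived from cluster load. The sketch below is an illustration under stated assumptions, not the implementation behind `cpu.adjust_sleeping_cores()`: the linear reaction function, the `min_awake` parameter, and the list-based core state are all hypothetical.

```python
import math
from typing import Callable, List


def linear_reaction(load: float, total_cores: int, min_awake: int = 1) -> int:
    """Hypothetical reaction function: the target number of awake cores
    grows linearly with load, where load is clamped to [0, 1]."""
    load = min(max(load, 0.0), 1.0)
    return max(min_awake, math.ceil(load * total_cores))


def adjust_sleeping_cores_sketch(
    forced_sleep: List[bool],
    busy: List[bool],
    load: float,
    reaction: Callable[[float, int], int] = linear_reaction,
) -> List[bool]:
    """Return a new forced-sleep mask that moves toward the reaction target.

    Busy cores are never put to sleep; sleeping cores are woken when the
    number of awake cores falls below the target.
    """
    state = list(forced_sleep)          # copy so the caller's list is untouched
    target = reaction(load, len(state))
    awake = state.count(False)
    for i in range(len(state)):
        if awake > target and not state[i] and not busy[i]:
            state[i] = True             # idle an awake core that has no task
            awake -= 1
        elif awake < target and state[i]:
            state[i] = False            # wake a sleeping core
            awake += 1
    return state


# Four awake cores, one busy, load at 50%: two idle cores are put to sleep.
mask = adjust_sleeping_cores_sketch([False] * 4, [True, False, False, False], 0.5)
```

Swapping in a different `reaction` callable is how a configurable reaction function could be modeled in this sketch.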
The `run_cpu_experiments.sh` script executes various configurations of inference traces, CPU management techniques, and instance core counts to conduct multiple LLM service experiments. Make sure to change the experiment data output folder accordingly. After the runs finish, use the plotting and analysis scripts mentioned in the section above, and modify them to point to your experiment data. These scripts generate CPU aging management and carbon optimization plots in the `results_cpu` folder.
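The experiment sweep described above is a cross product of traces, CPU management techniques, and core-count variants. The following sketch shows how such a grid could be enumerated. The technique names and SKU variants are taken from this README, while the trace names and the `experiment_grid` helper are hypothetical placeholders, not what `run_cpu_experiments.sh` actually contains.

```python
from itertools import product
from typing import Dict, List

TECHNIQUES = ["linux", "least-aged", "proposed"]                      # named in this README
SKU_VARIANTS = ["dgx-h100-with-cpu-vm40", "dgx-h100-with-cpu-vm80"]  # named in this README
TRACES = ["trace_a", "trace_b"]                                       # hypothetical placeholders


def experiment_grid(
    traces: List[str], techniques: List[str], variants: List[str]
) -> List[Dict[str, str]]:
    """Enumerate every (trace, technique, SKU variant) combination."""
    return [
        {"trace": t, "technique": k, "sku": v}
        for t, k, v in product(traces, techniques, variants)
    ]


grid = experiment_grid(TRACES, TECHNIQUES, SKU_VARIANTS)
for run in grid:
    # A real driver would launch one simulation per entry; here we only print it.
    print(run["trace"], run["technique"], run["sku"])
```

With two traces, three techniques, and two variants, the grid contains twelve runs.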
If you use SplitwiseSimCPUCarbon in your work, please cite the accompanying paper:
@misc{hewage2025agingawarecpucoremanagement,
title={Aging-aware CPU Core Management for Embodied Carbon Amortization in Cloud LLM Inference},
author={Tharindu B. Hewage and Shashikant Ilager and Maria Rodriguez Read and Rajkumar Buyya},
year={2025},
eprint={2501.15829},
archivePrefix={arXiv},
primaryClass={cs.DC},
url={https://arxiv.org/abs/2501.15829},
}