Summary
This issue proposes adjustments to ensure accurate GPU power/efficiency measurements, especially when multiple GPUs are in use. The main goals are to:
Correctly aggregate power across all GPUs to avoid under-reporting total power usage.
Use the mean (instead of median) to calculate power draw over each run for better energy (Watt-min) accuracy.
Refine token counting for inference when outputs are variable length, and improve throughput logging.
1. Sum Power Across All GPUs
Currently, only the first GPU’s power draw is recorded. This underestimates total energy if multiple GPUs are active. Proposed fix: In the logging step, iterate over all GPU metrics and sum their power_draw:
```python
# gpu_metrics_utils.py
def collect_power_draw_all_gpus():
    metrics_list = get_gpu_metrics()
    # Sum instantaneous power draw across all GPUs
    total_power = sum(m.power_draw for m in metrics_list)
    return total_power
```
Then, use collect_power_draw_all_gpus() in training/inference loops instead of get_gpu_metrics()[0].power_draw.
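As a standalone illustration (with a mocked `GpuMetric` record standing in for whatever `get_gpu_metrics()` actually returns in this codebase), the aggregation can be sketched as:

```python
from dataclasses import dataclass

@dataclass
class GpuMetric:
    index: int
    power_draw: float  # watts

def collect_power_draw_all_gpus(metrics_list):
    # Sum instantaneous power draw across every visible GPU
    return sum(m.power_draw for m in metrics_list)

# Mocked readings for a 4-GPU node (values invented for illustration)
metrics = [GpuMetric(i, p) for i, p in enumerate([210.5, 198.0, 205.2, 201.3])]
total_power = collect_power_draw_all_gpus(metrics)  # 815.0 W vs. 210.5 W from GPU 0 alone
```

Recording only `metrics[0].power_draw` here would report roughly a quarter of the true draw.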
2. Use Mean Instead of Median for Power Draw
The code uses a groupby.median() approach to calculate power usage over a run. Although median reduces the influence of outliers, energy is fundamentally “(average power) × (time)”.
Proposed fix: Switch from median to mean in post-processing:
```python
# process_experiment_data.py
# Instead of groupby(...).median(), do:
summaries = df.groupby('max_watt', as_index=False).mean()
# This ensures energy calculations (power * time) align with average power draw.
```
If you need outlier handling, consider filtering data points before the mean rather than switching to median.
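A minimal stdlib sketch of this filter-then-mean approach (sample readings and the 2x-median cutoff are invented for illustration), showing that energy follows from the mean power, not the median:

```python
import statistics

# Hypothetical per-sample power readings (watts) over a run, with one transient spike
samples = [250.0, 252.0, 248.0, 251.0, 600.0]

# Optional outlier filtering before averaging (e.g., drop readings above 2x the median)
med = statistics.median(samples)
filtered = [p for p in samples if p <= 2 * med]

mean_power = statistics.mean(filtered)      # average power in watts
elapsed_min = 10.0                          # run duration in minutes (assumed)
energy_watt_min = mean_power * elapsed_min  # energy = average power x time
```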
3. Refine Variable-Length Inference Token Counting
When using model.generate(), not all sequences may reach the max length. Counting tokens as batch_size * seq_length can overestimate throughput.
Proposed fix: After generation, count actual output lengths:
```python
# run_inference.py
for outputs in model.generate(...):
    # Number of tokens in each generated sequence
    actual_len = outputs.shape[-1]
    total_tokens += actual_len
```
Then compute tokens_per_second = total_tokens / elapsed_time. This yields more precise throughput numbers.
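If generated batches are padded to a common length, the same idea can be sketched in pure Python by counting only non-pad tokens (`PAD_ID` and the sample batch are illustrative, not from the repo):

```python
PAD_ID = 0  # assumed pad token id

# Hypothetical generated batch: sequences padded to the batch max length
batch = [
    [5, 7, 9, PAD_ID, PAD_ID],  # finished early: 3 real tokens
    [4, 4, 4, 4, 4],            # ran to max length: 5 real tokens
]

# Count only actually generated (non-pad) tokens
total_tokens = sum(sum(1 for t in seq if t != PAD_ID) for seq in batch)

elapsed_time = 2.0  # seconds (assumed)
tokens_per_second = total_tokens / elapsed_time
```

The naive `batch_size * seq_length` count here would report 10 tokens instead of 8, inflating tokens/sec by 25%.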
4. Optional Improvements
DistributedDataParallel: For large-scale training, consider migrating from DataParallel to DDP for better scalability and performance.
Logging: Consider per-batch or per-iteration throughput logging if you want a more granular view of performance changes over time.
Energy Per Token: For clarity, you might also log energy / total_tokens (Joules per token or Watt-min per token) to emphasize efficiency.
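For the energy-per-token logging suggested above, the arithmetic is straightforward (the totals below are made up for illustration; 1 watt-minute = 60 joules):

```python
# Illustrative totals from a hypothetical run
energy_watt_min = 2502.5  # total energy over the run (watt-minutes)
total_tokens = 8000       # tokens processed during the run

watt_min_per_token = energy_watt_min / total_tokens
joules_per_token = watt_min_per_token * 60  # 1 W-min = 60 J
```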
Expected Benefits
Accurate Energy Calculations: Summing power from all GPUs prevents under-reporting total usage.
Robust Averages: Using mean power correlates directly with time-based energy consumption.
Better Throughput Metrics: Accounting for actual tokens avoids skewing tokens/sec if generations finish early.
Scalability: Enhanced logging and multi-GPU handling support more accurate comparisons for different power limits.
via: o1-pro Deep Research