
investigate allocations #683

Closed
Tracked by #390
juliasloan25 opened this issue Mar 8, 2024 · 1 comment
Assignees
Labels
GPU 🍃 leaf Issue coupled to a PR

Comments

@juliasloan25
Member

When we try to run the DYAMOND configuration on central's P100 GPUs, it fails because there isn't enough memory available during the `atmos_init` call. The same run works fine on clima's A100 GPUs, but in `atmos_init` we see "Effective GPU memory usage: 87.32% (69.114 GiB/79.150 GiB)". Roughly 70 GiB of memory usage is a lot, so we need to look into where these allocations come from.

We can do this by placing `CUDA.memory_status` calls throughout the code to see where the allocations jump.
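A minimal sketch of this instrumentation, assuming CUDA.jl is loaded (`CUDA.memory_status` is CUDA.jl's built-in memory reporter; the `report_gpu_memory` helper and the surrounding call sites are hypothetical):

```julia
using CUDA

# Hypothetical helper: label each memory report so jumps between
# checkpoints are easy to spot in the log.
function report_gpu_memory(label::AbstractString)
    @info "GPU memory at: $label"
    CUDA.memory_status()  # prints "Effective GPU memory usage: ...% (... GiB/... GiB)"
end

# Sketch of bracketing a suspect call (atmos_init is named in the issue;
# `config` and the return value are placeholders):
# report_gpu_memory("before atmos_init")
# atmos_sim = atmos_init(config)
# report_gpu_memory("after atmos_init")
```

Comparing consecutive reports narrows down which call is responsible for the large allocation; the checkpoints can then be moved inside that call to bisect further.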

@juliasloan25
Member Author

Coupler output table shows very similar allocations between atmos-only and coupled simulations, as of 5/1.

On GPU:
- coupled simulation allocations: 3.361 GiB
- atmos-only simulation allocations: 3.255 GiB

On CPU:
- coupled `CoupledSimulation` object allocations: 0.196 GiB
- atmos-only `CoupledSimulation` object allocations: 0.195 GiB
