evalv3: OOM and signal killed (64GB) #3334
I wrote this quick script to see if I could find anything obvious, but it doesn't appear that any individual directory causes this; it seems to be the cumulative effect of all of them. There are some pretty large spikes in memory usage:

❯ for dir in */; do echo "$dir:" && CUE_EXPERIMENT=evalv3 go run cuelang.org/go/cmd/cue@v0.10.0-alpha.2.0.20240726151100-e68bd632c29e export "./$dir"; done

Some of those spikes are around 10 gigabytes.
If it helps, the latest stable release without evalv3 takes at most 2GB for everything all at once. For reference:

❯ cue version
cue version 0.9.2
go version go1.22.5
-buildmode pie
-compiler gc
-trimpath true
DefaultGODEBUG httplaxcontentlength=1,httpmuxgo121=1,tls10server=1,tlsrsakex=1,tlsunsafeekm=1
CGO_ENABLED 1
GOARCH amd64
GOOS linux
GOAMD64 v1
cue.lang.version v0.9.2
I don't know if this pprof info is helpful.
Some updated numbers, as of CUE 655236e:
We see a similar issue with a few other medium to large projects when using the new evaluator. The current work in #2854 and particularly #2853 should help significantly here. I will update this thread once more progress has been made, as we are actively merging improvements into master.
I have finally been able to reduce a bit of CUE which seems to OOM my machine under evalv3 where evalv2 definitely did not: #3509
This still seems to happen as of 0005c22.
I again reduced part of this project into another performance issue to be fixed: #3633
Here is another reduction, similar to #3633, yet not the same:
As of 0b36c4e, this takes 80ms to
Significant progress; I am now able to finish evaluating with evalv3, even if it's still quite a bit slower than evalv2:
With @mpvl, we believe the remaining work here is memory management, which evalv2 did to some extent but evalv3 does not do yet. This is also why evalv3 above takes more than 20GiB of memory to finish.
I ran this again, as of the latest version of the automata and cue repos, and got the following stats:
I think this issue overlaps with #3881 too; the pprof shows me:
So, even though we waste at least 25% on GC overhead from allocations, we waste at least 46% on the |
Benchmarking the reduction at https://cuelang.org/issue/3334 via benchcmd -n 12 CueVetIssue3334 cue vet -c f.cue:

                  │     old      │                  new                  │
                  │    sec/op    │    sec/op     vs base                 │
CueVetIssue3334     175.3m ± 1%    154.9m ± 1%   -11.67% (p=0.000 n=12)

                  │     old      │                  new                  │
                  │ user-sec/op  │ user-sec/op   vs base                 │
CueVetIssue3334     332.1m ± 4%    270.8m ± 5%   -18.46% (p=0.000 n=12)

                  │     old      │                  new                  │
                  │  sys-sec/op  │ sys-sec/op    vs base                 │
CueVetIssue3334     27.61m ± 16%   21.92m ± 30%  ~ (p=0.089 n=12)

                  │      old       │                   new                   │
                  │ peak-RSS-bytes │ peak-RSS-bytes  vs base                 │
CueVetIssue3334      182.2Mi ± 4%     177.4Mi ± 1%  -2.61% (p=0.000 n=12)

Updates #3334.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I1f066de341e8fd2e9d508304e3de1b86234f60a6
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1213055
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Marcel van Lohuizen <mpvl@gmail.com>
The getReplacements call was creating an entirely new slice each time, as can be seen by the inline perf TODO. We can however avoid most of that by reusing a local buffer, not needing to alter nodeContext or OpContext at all just yet.

As measured by benchcmd -n 6 VetAutomataList cue vet -c k8s/magiclove/list.cue on a simplified version of https://github.com/uhthomas/automata, we see a significant win:

                  │     old      │                new                 │
                  │    sec/op    │    sec/op    vs base               │
VetAutomataList      3.001 ± 1%     2.877 ± 2%  -4.12% (p=0.002 n=6)

                  │     old      │                 new                 │
                  │ user-sec/op  │ user-sec/op   vs base               │
VetAutomataList       3.979 ± 6%     3.769 ± 6%  -5.29% (p=0.026 n=6)

                  │     old      │                new                │
                  │  sys-sec/op  │ sys-sec/op   vs base              │
VetAutomataList     99.79m ± 11%   93.96m ± 8%  ~ (p=0.132 n=6)

                  │      old       │                 new                  │
                  │ peak-RSS-bytes │ peak-RSS-bytes  vs base              │
VetAutomataList      789.5Mi ± 4%     743.2Mi ± 6%  -5.87% (p=0.026 n=6)

Updates #3334.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Ia598d49e9ddd13c687315fc1726160c7f6af8800
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1213493
Reviewed-by: Marcel van Lohuizen <mpvl@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
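The pattern described in that change, appending into a caller-owned buffer instead of allocating a fresh slice on every call, is generic Go. Below is a minimal sketch of the technique; the names (conjunct, collectInto) are made up for illustration and are not CUE's internals.

```go
package main

import "fmt"

type conjunct struct{ id int }

// collectInto appends its results to buf and returns the extended slice,
// so the caller can keep one backing array alive across many calls
// instead of allocating a new slice on every invocation.
func collectInto(buf []conjunct, n int) []conjunct {
	buf = buf[:0] // reuse the existing capacity
	for i := 0; i < n; i++ {
		buf = append(buf, conjunct{id: i})
	}
	return buf
}

func main() {
	var buf []conjunct // grows once, then is reused
	for round := 0; round < 3; round++ {
		buf = collectInto(buf, 4)
		fmt.Println("round", round, "len", len(buf), "cap", cap(buf))
	}
}
```

After the first round the capacity stays stable, so the per-call allocation disappears from the hot loop.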
In looking at why https://github.com/uhthomas/automata was spending so much time in checkTypos, I realised that we were doing a lot of work in the initial appendRequired call, even though the main v.Arcs loop did nothing as len(v.Arcs) == 0. Detect that edge case early, which can be very common in some projects, and avoid doing any further work.

As measured by benchcmd -n 6 VetAutomataList cue vet -c k8s/magiclove/list.cue on a simplified version of https://github.com/uhthomas/automata, we see a very substantial win:

                  │     old      │                 new                  │
                  │    sec/op    │    sec/op     vs base                │
VetAutomataList     2877.5m ± 2%   722.1m ± 2%   -74.91% (p=0.002 n=6)

                  │     old      │                 new                 │
                  │ user-sec/op  │ user-sec/op   vs base               │
VetAutomataList       3.769 ± 6%     1.231 ± 9%  -67.34% (p=0.002 n=6)

                  │     old      │                 new                 │
                  │  sys-sec/op  │ sys-sec/op    vs base               │
VetAutomataList     93.96m ± 8%    79.04m ± 15%  -15.88% (p=0.015 n=6)

                  │      old       │                 new                  │
                  │ peak-RSS-bytes │ peak-RSS-bytes  vs base              │
VetAutomataList      743.2Mi ± 6%     672.5Mi ± 3%  -9.51% (p=0.002 n=6)

The full automata project also sees a similar speed-up, going from about 180s total time to evaluate to about 30s.

Updates #3334.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I24c9a7bdac496608799fc72f6abd4c49eaf32180
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1213494
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Marcel van Lohuizen <mpvl@gmail.com>
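The guard that change adds is a plain early return. The sketch below is hypothetical (appendRequiredSketch and its parameters are invented, not CUE's actual signature); it only illustrates the "no arcs, so skip all set-up work" fast path.

```go
package main

import "fmt"

// appendRequiredSketch is a hypothetical stand-in for the shape of the
// real appendRequired: when there are no arcs, the loop below would do
// nothing, so return the input unchanged before doing any set-up work.
func appendRequiredSketch(dst []string, arcs []string, required map[string]bool) []string {
	if len(arcs) == 0 {
		return dst // common fast path: nothing to check
	}
	for _, a := range arcs {
		if required[a] {
			dst = append(dst, a)
		}
	}
	return dst
}

func main() {
	req := map[string]bool{"spec": true}
	fmt.Println(appendRequiredSketch(nil, nil, req))                      // fast path, no work done
	fmt.Println(appendRequiredSketch(nil, []string{"spec", "meta"}, req)) // [spec]
}
```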
With https://review.gerrithub.io/c/cue-lang/cue/+/1213494, we are now looking much better, even if still a bit slower:
Memory usage still peaked above 20GiB, but at least the wall time and CPU usage are much lower now.
nodeContext's overlays and disjunct fields were added and set in https://cuelang.org/cl/1190800 but were never used. Peak memory usage on a simplified version of https://github.com/uhthomas/automata goes down a bit, as most of the allocations come from cloning vertices and contexts, so the size of these types is multiplied:

                  │      old       │                 new                  │
                  │ peak-RSS-bytes │ peak-RSS-bytes  vs base              │
VetAutomataList      669.5Mi ± 3%     629.0Mi ± 3%  -6.06% (p=0.000 n=8)

Updates #3334.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I7ebf8db1f72cee7e4246ad10c4c0d9864ea3110e
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1213518
Reviewed-by: Marcel van Lohuizen <mpvl@gmail.com>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
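The reasoning is that a per-node struct's size is paid once per clone, so dropping unused fields saves memory in proportion to the number of clones. A toy illustration follows; nodeBefore and nodeAfter are invented types standing in for CUE's real nodeContext, not its actual layout.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Hypothetical before/after shapes: dropping two unused pointer-sized
// fields shrinks every clone of the struct, and the saving is multiplied
// by however many times the evaluator clones it.
type nodeBefore struct {
	state    int
	overlays *nodeBefore // never read
	disjunct *nodeBefore // never read
}

type nodeAfter struct {
	state int
}

func main() {
	const clones = 1_000_000
	perClone := unsafe.Sizeof(nodeBefore{}) - unsafe.Sizeof(nodeAfter{})
	fmt.Printf("saved %d bytes per clone, %d bytes across %d clones\n",
		perClone, perClone*clones, clones)
}
```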
walkVisitor was a wrapper around ast.Walk which kept a stack of visitors to be saved and reused as we walked up and down a syntax tree. Resolve also used a trick to only walk parts of some syntax nodes. Rather than returning a visitor for a node, it would recursively call walkVisitor only for the fields that it wanted to walk into.

However, these two compounded poorly; each recursive call to walkVisitor caused new allocations to make a new stackVisitor to carry state. Even though these allocations are small, they can add up into noticeable overhead, particularly for the garbage collector.

Instead, we can rely on ast.Walk rather than walkVisitor, while keeping the same trick to only walk parts of certain syntax nodes. This keeps the same semantics, but avoids the extra allocations.

                  │     old      │                new                 │
                  │     B/op     │     B/op     vs base               │
VetAutomataList     734.9Mi ± 0%   733.3Mi ± 0%  -0.22% (p=0.000 n=10)

                  │     old      │               new                 │
                  │  allocs/op   │ allocs/op   vs base               │
VetAutomataList     3.424M ± 0%   3.361M ± 0%  -1.85% (p=0.000 n=10)

For #3334.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Ibb84e22bf7cfdf8c6e2c6122ab21aed9934c13cc
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1214080
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Matthew Sackman <matthew@cue.works>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
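CUE's internal walker isn't shown here, but the "walk only the fields you care about, then return nil so the generic walk stops" trick can be illustrated with Go's standard go/ast package, which has the same Visitor/Walk shape. This is an analogy under that assumption, not the actual CUE code.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// fieldVisitor descends only into the parts of a node it cares about:
// for a *ast.FuncDecl it walks just the body and returns nil, so
// ast.Walk does not also visit the name, receiver and signature.
type fieldVisitor struct{ idents int }

func (v *fieldVisitor) Visit(n ast.Node) ast.Visitor {
	if fn, ok := n.(*ast.FuncDecl); ok {
		if fn.Body != nil {
			ast.Walk(v, fn.Body) // walk only the field we care about
		}
		return nil // stop the generic walk over the remaining fields
	}
	if _, ok := n.(*ast.Ident); ok {
		v.idents++
	}
	return v
}

func main() {
	fset := token.NewFileSet()
	src := "package p\nfunc f() { x := 1; _ = x }"
	file, err := parser.ParseFile(fset, "x.go", src, 0)
	if err != nil {
		panic(err)
	}
	v := &fieldVisitor{}
	ast.Walk(v, file)
	// Counts p, x, _ and x; the skipped function name f is not counted.
	fmt.Println("identifiers visited:", v.idents)
}
```

The selective descent is done with plain recursion on ast.Walk itself, so no extra wrapper values are allocated per step.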
slices.Clone is an efficient way to make a copy of a slice, but it does create a new slice from scratch each time. In this case we are looping and each copy is only used as part of each iteration, so we can preallocate one slice big enough to make the copies and reuse that across all iterations.

                  │     old      │                new                 │
                  │     B/op     │     B/op     vs base               │
VetAutomataList     733.3Mi ± 0%   663.1Mi ± 0%  -9.57% (p=0.000 n=10)

                  │     old      │               new                 │
                  │  allocs/op   │ allocs/op   vs base               │
VetAutomataList     3.361M ± 0%   3.308M ± 0%  -1.58% (p=0.000 n=10)

For #3334.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Id12844a4c5f417d095521783bfc929985f23ade2
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1214081
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Matthew Sackman <matthew@cue.works>
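As a generic illustration of the preallocation idea (not the actual CUE code), the sketch below copies each inner slice into one reusable buffer sized to the largest element, instead of calling slices.Clone once per iteration.

```go
package main

import "fmt"

func main() {
	src := [][]int{{1, 2, 3}, {4, 5, 6, 7}, {8}}

	// Size one buffer to the largest element so it can hold any copy.
	maxLen := 0
	for _, s := range src {
		if len(s) > maxLen {
			maxLen = len(s)
		}
	}

	// One allocation reused across iterations, instead of a fresh
	// slices.Clone allocation per iteration.
	buf := make([]int, maxLen)
	for _, s := range src {
		tmp := buf[:len(s)]
		copy(tmp, s)
		// tmp is only needed within this iteration, so reusing the
		// backing array between iterations is safe.
		fmt.Println(tmp)
	}
}
```

This trade-off only works because each copy's lifetime ends with its iteration; a copy that escapes the loop would still need its own allocation.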
apd.Decimal.Int64 on `d = hi - lo` will error if it overflows an int64. This is pretty common with CUE bounds like int64, which expands to:

    >=-9_223_372_036_854_775_808 & <=9_223_372_036_854_775_807

Constructing that error is unfortunate as it allocates a few times and stringifies the number too, which also has a cost. This is entirely unnecessary, as we don't use the error value at all. If we know the integer will have more than one digit, give up early.

                  │     old      │                new                 │
                  │     B/op     │     B/op     vs base               │
VetAutomataList     663.1Mi ± 0%   659.2Mi ± 0%  -0.59% (p=0.000 n=10)

                  │     old      │               new                 │
                  │  allocs/op   │ allocs/op   vs base               │
VetAutomataList     3.308M ± 0%   3.163M ± 0%  -4.37% (p=0.000 n=10)

Note that this requires us to move the d.Negative check earlier, as it should not be affected by the number of digits in d. Introduce the helper errIncompatibleBounds to avoid duplication.

Combining this and the two previous allocation optimizations, we can see a noticeable drop in the wall time:

                  │     old      │                new                 │
                  │    sec/op    │    sec/op    vs base               │
VetAutomataList     642.4m ± 3%    618.7m ± 1%  -3.69% (p=0.000 n=10)

                  │     old      │                 new                 │
                  │     B/op     │     B/op     vs base                │
VetAutomataList     734.9Mi ± 0%   659.3Mi ± 0%  -10.29% (p=0.000 n=10)

                  │     old      │               new                 │
                  │  allocs/op   │ allocs/op   vs base               │
VetAutomataList     3.424M ± 0%   3.163M ± 0%  -7.63% (p=0.000 n=10)

For #3334.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Ic0ba6031b0ace66bc1418ce76e745650021d22a0
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1214082
Reviewed-by: Matthew Sackman <matthew@cue.works>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
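The exact digit threshold in CUE's bounds logic isn't reproduced here; the sketch below shows the same idea with github.com/cockroachdb/apd/v3 in a generic form: skip Decimal.Int64 (and the overflow error it would allocate) when a plain-integer decimal clearly cannot fit in an int64. The helper name and the 19-digit cutoff are illustrative assumptions, not CUE's code.

```go
package main

import (
	"fmt"

	"github.com/cockroachdb/apd/v3"
)

// toInt64Cheap skips Decimal.Int64 when the value obviously cannot fit,
// since Int64 builds (and stringifies) an overflow error that we would
// immediately discard. The 19-digit cutoff works because any plain
// integer (exponent 0) with 20 or more digits exceeds math.MaxInt64.
func toInt64Cheap(d *apd.Decimal) (int64, bool) {
	if d.Exponent == 0 && d.NumDigits() > 19 {
		return 0, false // give up early, no error allocated
	}
	i, err := d.Int64()
	if err != nil {
		return 0, false
	}
	return i, true
}

func main() {
	var big apd.Decimal
	if _, _, err := big.SetString("18446744073709551616"); err != nil { // 2^64, 20 digits
		panic(err)
	}
	if _, ok := toInt64Cheap(&big); !ok {
		fmt.Println("does not fit in int64")
	}
}
```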
What version of CUE are you using (cue version)?
Does this issue reproduce with the latest stable release?
Yes.
What did you do?
After following #3330 (comment) to fix another issue, I thought it would be interesting to try the new evaluator again.
❯ CUE_EXPERIMENT=evalv3 go run cuelang.org/go/cmd/cue@v0.10.0-alpha.2.0.20240726151100-e68bd632c29e export ./k8s/amour/list.cue
signal: killed
uhthomas/automata@5a1a4fe
What did you expect to see?
Successful build.
What did you see instead?
OOM, signal killed.