evalv3 performance regression which disappears with CUE_DEBUG=opendef #3881

Open
mvdan opened this issue Apr 8, 2025 · 2 comments

mvdan commented Apr 8, 2025

# evalv2
env CUE_EXPERIMENT=evalv3=0
exec cue vet -c

# evalv3
env CUE_EXPERIMENT=evalv3=1
exec cue vet -c

# evalv3 with opendef
env CUE_EXPERIMENT=evalv3=1
env CUE_DEBUG=opendef=1
exec cue vet -c

-- input.cue --
package p

import "list"

out: #Output & {#input: _input}

_input: {
	for n in list.Range(1, 300, 1) {
		"A\(n)": "B\(n)": name: "name\(n)"
	}
}
#Embed: [string]: string
#Output: {
	#Embed
	#input: _

	for _, lvlA in #input {
		for nameB, lvlB in lvlA {
			(nameB): lvlB.name
		}
	}
}

As of e058014:

# evalv2 (0.037s)
> env CUE_EXPERIMENT=evalv3=0
> exec cue vet -c
# evalv3 (0.644s)
> env CUE_EXPERIMENT=evalv3=1
> exec cue vet -c
# evalv3 with opendef (0.036s)
> env CUE_EXPERIMENT=evalv3=1
> env CUE_DEBUG=opendef=1
> exec cue vet -c

You can see that evalv2 runs in about 40ms, evalv3 runs in 650ms (more than a 10x slow-down), but evalv3 with opendef runs in 40ms once again.

Distilled from #3879.

mvdan added the closedness, evaluator, evalv3 (issues affecting only the evaluator version 3), and roadmap/performance labels on Apr 8, 2025

mvdan commented Apr 8, 2025

Increasing the list.Range above to 500 to get a bigger CPU profile, I see the following culprit:

(pprof) top
Showing nodes accounting for 2.18s, 96.89% of 2.25s total
Dropped 18 nodes (cum <= 0.01s)
Showing top 10 nodes out of 47
      flat  flat%   sum%        cum   cum%
     1.23s 54.67% 54.67%      1.25s 55.56%  cuelang.org/go/internal/core/adt.transitiveMapping
     0.56s 24.89% 79.56%      0.56s 24.89%  cuelang.org/go/internal/core/adt.(*reqSets).filterTop.func1 (inline)
     0.21s  9.33% 88.89%      0.77s 34.22%  cuelang.org/go/internal/core/adt.(*reqSets).filterSets (inline)

So I believe this TODO is to blame:

// TODO: this is a polynomial algorithm. At some point, if this turns out to be
// a bottleneck, we should consider filtering unnecessary entries and/or using a
// more efficient algorithm.
func transitiveMapping(buf reqSets, x reqSet, b []replaceID) reqSets {
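
For anyone wanting to collect a similar profile without patching the cue CLI, a minimal Go driver along the following lines works as a sketch. It assumes the reproduction above is saved as input.cue next to the driver, and it approximates cue vet -c with Validate(cue.Concrete(true)); CUE_EXPERIMENT and CUE_DEBUG are still read from the environment by the cuelang.org/go runtime, just like with the CLI.

package main

import (
	"log"
	"os"
	"runtime/pprof"

	"cuelang.org/go/cue"
	"cuelang.org/go/cue/cuecontext"
)

func main() {
	src, err := os.ReadFile("input.cue") // the reproduction from this issue
	if err != nil {
		log.Fatal(err)
	}

	f, err := os.Create("cpu.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	// The evaluator version and debug flags come from CUE_EXPERIMENT and
	// CUE_DEBUG in the environment, as with the cue command.
	ctx := cuecontext.New()
	v := ctx.CompileBytes(src)
	// Roughly what `cue vet -c` checks: all regular fields must be concrete.
	if err := v.Validate(cue.Concrete(true)); err != nil {
		log.Print(err)
	}
}

Run it as e.g. CUE_EXPERIMENT=evalv3=1 go run . and inspect cpu.pprof with go tool pprof.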

cueckoo pushed a commit that referenced this issue Apr 11, 2025
LogEval is zero by default, and its implementation via Indentf has a
non-zero cost which I spotted via pprof, so avoid calling those funcs
entirely at all the call sites. Some call sites already skipped the
logging via an early LogEval check, but others did not.

Benchmarking the code from https://cuelang.org/issue/3881
via benchcmd -n 8 CueVetIssue3881 cue vet -c f.cue:

                    │     old     │                new                │
                    │   sec/op    │   sec/op     vs base              │
    CueVetIssue3881   507.4m ± 0%   496.0m ± 1%  -2.24% (p=0.000 n=8)

                    │     old     │                new                │
                    │ user-sec/op │ user-sec/op  vs base              │
    CueVetIssue3881   571.9m ± 1%   560.0m ± 1%  -2.08% (p=0.003 n=8)

                    │      old      │              new              │
                    │  sys-sec/op   │  sys-sec/op   vs base         │
    CueVetIssue3881   12.159m ± 51%   9.510m ± 58%  ~ (p=0.234 n=8)

                    │      old       │               new               │
                    │ peak-RSS-bytes │ peak-RSS-bytes  vs base         │
    CueVetIssue3881     47.95Mi ± 3%     49.42Mi ± 4%  ~ (p=0.857 n=8)

Updates #3881.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I8ec0215239e97f6e8aeaeb50196b42910d91a815
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1213054
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Marcel van Lohuizen <mpvl@gmail.com>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
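
The fix above follows the standard pattern of guarding expensive log formatting behind a cheap check. Schematically (the type, field, and method names below are illustrative, not the actual adt API):

package adtsketch

import "fmt"

// opContext stands in for the evaluator's context; logEval is zero by
// default, so evaluation logging is disabled in normal runs.
type opContext struct {
	logEval int
}

// logf stands in for the Logf/Indentf helpers, whose argument preparation
// and formatting have a cost even when the output is ultimately discarded.
func (c *opContext) logf(format string, args ...any) {
	fmt.Printf(format+"\n", args...)
}

// unifyStep shows the shape of the change: call sites check the cheap
// integer field first, so the helpers are never entered when logging is off.
func (c *opContext) unifyStep(v any) {
	if c.logEval > 0 {
		c.logf("unify %v", v)
	}
	// ... actual evaluation work ...
}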
mvdan moved this from Backlog to v0.13.0-rc.1 in Release v0.13 on Apr 14, 2025
cueckoo pushed a commit that referenced this issue Apr 17, 2025
Issue #3881

Signed-off-by: Marcel van Lohuizen <mpvl@gmail.com>
Change-Id: I8dea8779b3fe23eaaea688f72a62749928eeb4db
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1213606
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Reviewed-by: Matthew Sackman <matthew@cue.works>
cueckoo pushed a commit that referenced this issue Apr 17, 2025
This should reduce the amount of work done by
a factor of the depth of a tree.

This doesn't change the essence of the algorithm.

Issue #3881

Signed-off-by: Marcel van Lohuizen <mpvl@gmail.com>
Change-Id: I2ad81c04b28b64b9736463489039806a4493e5bd
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1213607
Reviewed-by: Matthew Sackman <matthew@cue.works>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>

mvdan commented Apr 17, 2025

Below are the perf numbers from #3879 (cue export in https://github.com/joelMuehlena/cue-netbox-example), from which I reduced this issue above. All wall times on my Linux laptop running on battery.

FYI @joelMuehlena we made some progress here, but more is clearly needed, so we're leaving this issue open for now.
