evalv3 performance regression which disappears with CUE_DEBUG=opendef #3881

Open
mvdan opened this issue Apr 8, 2025 · 2 comments

mvdan commented Apr 8, 2025

# evalv2
env CUE_EXPERIMENT=evalv3=0
exec cue vet -c

# evalv3
env CUE_EXPERIMENT=evalv3=1
exec cue vet -c

# evalv3 with opendef
env CUE_EXPERIMENT=evalv3=1
env CUE_DEBUG=opendef=1
exec cue vet -c

-- input.cue --
package p

import "list"

out: #Output & {#input: _input}

_input: {
	for n in list.Range(1, 300, 1) {
		"A\(n)": "B\(n)": name: "name\(n)"
	}
}
#Embed: [string]: string
#Output: {
	#Embed
	#input: _

	for _, lvlA in #input {
		for nameB, lvlB in lvlA {
			(nameB): lvlB.name
		}
	}
}

As of e058014:

# evalv2 (0.037s)
> env CUE_EXPERIMENT=evalv3=0
> exec cue vet -c
# evalv3 (0.644s)
> env CUE_EXPERIMENT=evalv3=1
> exec cue vet -c
# evalv3 with opendef (0.036s)
> env CUE_EXPERIMENT=evalv3=1
> env CUE_DEBUG=opendef=1
> exec cue vet -c

You can see that evalv2 runs in about 40ms, evalv3 runs in 650ms (more than a 10x slow-down), but evalv3 with opendef runs in 40ms once again.

Distilled from #3879.

mvdan added the closedness, evaluator, evalv3 (issues affecting only the evaluator version 3), and roadmap/performance labels on Apr 8, 2025

mvdan commented Apr 8, 2025

Increasing the list.Range above to 500 to get a bigger CPU profile, I see the following culprit:

(pprof) top
Showing nodes accounting for 2.18s, 96.89% of 2.25s total
Dropped 18 nodes (cum <= 0.01s)
Showing top 10 nodes out of 47
      flat  flat%   sum%        cum   cum%
     1.23s 54.67% 54.67%      1.25s 55.56%  cuelang.org/go/internal/core/adt.transitiveMapping
     0.56s 24.89% 79.56%      0.56s 24.89%  cuelang.org/go/internal/core/adt.(*reqSets).filterTop.func1 (inline)
     0.21s  9.33% 88.89%      0.77s 34.22%  cuelang.org/go/internal/core/adt.(*reqSets).filterSets (inline)

So I believe this TODO is to blame:

// TODO: this is a polynomial algorithm. At some point, if this turns out to be
// a bottleneck, we should consider filtering unnecessary entries and/or using a
// more efficient algorithm.
func transitiveMapping(buf reqSets, x reqSet, b []replaceID) reqSets {
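
For anyone wanting to collect a similar profile without patching the cue CLI, a minimal Go driver along the following lines works as a sketch. It assumes the reproduction above is saved as input.cue next to the driver, and it approximates cue vet -c with Validate(cue.Concrete(true)); CUE_EXPERIMENT and CUE_DEBUG are still read from the environment by the cuelang.org/go runtime, just like with the CLI.

package main

import (
	"log"
	"os"
	"runtime/pprof"

	"cuelang.org/go/cue"
	"cuelang.org/go/cue/cuecontext"
)

func main() {
	src, err := os.ReadFile("input.cue") // the reproduction from this issue
	if err != nil {
		log.Fatal(err)
	}

	f, err := os.Create("cpu.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	// The evaluator version and debug flags come from CUE_EXPERIMENT and
	// CUE_DEBUG in the environment, as with the cue command.
	ctx := cuecontext.New()
	v := ctx.CompileBytes(src)
	// Roughly what `cue vet -c` checks: all regular fields must be concrete.
	if err := v.Validate(cue.Concrete(true)); err != nil {
		log.Print(err)
	}
}

Run it as e.g. CUE_EXPERIMENT=evalv3=1 go run . and inspect cpu.pprof with go tool pprof.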

cueckoo pushed a commit that referenced this issue Apr 11, 2025
LogEval is zero by default, and its implementation via Indentf has a
non-zero cost which I spotted via pprof, so avoid calling those funcs
entirely at all the call sites. Some call sites already skipped the
logging via an early LogEval check, but others did not.

Benchmarking the code from https://cuelang.org/issue/3881
via benchcmd -n 8 CueVetIssue3881 cue vet -c f.cue:

                    │     old     │                new                │
                    │   sec/op    │   sec/op     vs base              │
    CueVetIssue3881   507.4m ± 0%   496.0m ± 1%  -2.24% (p=0.000 n=8)

                    │     old     │                new                │
                    │ user-sec/op │ user-sec/op  vs base              │
    CueVetIssue3881   571.9m ± 1%   560.0m ± 1%  -2.08% (p=0.003 n=8)

                    │      old      │              new              │
                    │  sys-sec/op   │  sys-sec/op   vs base         │
    CueVetIssue3881   12.159m ± 51%   9.510m ± 58%  ~ (p=0.234 n=8)

                    │      old       │               new               │
                    │ peak-RSS-bytes │ peak-RSS-bytes  vs base         │
    CueVetIssue3881     47.95Mi ± 3%     49.42Mi ± 4%  ~ (p=0.857 n=8)

Updates #3881.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I8ec0215239e97f6e8aeaeb50196b42910d91a815
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1213054
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Marcel van Lohuizen <mpvl@gmail.com>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
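
The fix above follows the standard pattern of guarding expensive log formatting behind a cheap check. Schematically (the type, field, and method names below are illustrative, not the actual adt API):

package adtsketch

import "fmt"

// opContext stands in for the evaluator's context; logEval is zero by
// default, so evaluation logging is disabled in normal runs.
type opContext struct {
	logEval int
}

// logf stands in for the Logf/Indentf helpers, whose argument preparation
// and formatting have a cost even when the output is ultimately discarded.
func (c *opContext) logf(format string, args ...any) {
	fmt.Printf(format+"\n", args...)
}

// unifyStep shows the shape of the change: call sites check the cheap
// integer field first, so the helpers are never entered when logging is off.
func (c *opContext) unifyStep(v any) {
	if c.logEval > 0 {
		c.logf("unify %v", v)
	}
	// ... actual evaluation work ...
}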
mvdan moved this from Backlog to v0.13.0-rc.1 in Release v0.13 on Apr 14, 2025
cueckoo pushed a commit that referenced this issue Apr 17, 2025
Issue #3881

Signed-off-by: Marcel van Lohuizen <mpvl@gmail.com>
Change-Id: I8dea8779b3fe23eaaea688f72a62749928eeb4db
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1213606
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Reviewed-by: Matthew Sackman <matthew@cue.works>
cueckoo pushed a commit that referenced this issue Apr 17, 2025
This should reduce the amount of work done by
a factor of the depth of a tree.

This doesn't change the essence of the algorithm.

Issue #3881

Signed-off-by: Marcel van Lohuizen <mpvl@gmail.com>
Change-Id: I2ad81c04b28b64b9736463489039806a4493e5bd
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1213607
Reviewed-by: Matthew Sackman <matthew@cue.works>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>

mvdan commented Apr 17, 2025

Below are the perf numbers from #3879 (cue export in https://github.com/joelMuehlena/cue-netbox-example), from which I reduced this issue above. All wall times on my Linux laptop running on battery.

FYI @joelMuehlena we made some progress here, but more is clearly needed, so we're leaving this issue open for now.
