evalv3: OOM and signal killed (64GB) #3334

Open
uhthomas opened this issue Jul 28, 2024 · 13 comments
Labels
evaluator evalv3 issues affecting only the evaluator version 3 NeedsInvestigation unity-win bugs found thanks to projects added to Unity

Comments


uhthomas commented Jul 28, 2024

What version of CUE are you using (cue version)?

$ go run cuelang.org/go/cmd/cue@v0.10.0-alpha.2.0.20240726151100-e68bd632c29e version
cue version v0.10.0-alpha.2.0.20240726151100-e68bd632c29e

go version go1.22.5
      -buildmode exe
       -compiler gc
     CGO_ENABLED 1
          GOARCH amd64
            GOOS linux
         GOAMD64 v1
cue.lang.version v0.10.0

Does this issue reproduce with the latest stable release?

Yes.

What did you do?

After following the comment below to fix another issue, I thought it would be interesting to try the new evaluator again.

#3330 (comment)

❯ CUE_EXPERIMENT=evalv3 go run cuelang.org/go/cmd/cue@v0.10.0-alpha.2.0.20240726151100-e68bd632c29e export ./k8s/amour/list.cue
signal: killed

uhthomas/automata@5a1a4fe

What did you expect to see?

Successful build.

What did you see instead?

OOM, signal killed.


@uhthomas uhthomas added NeedsInvestigation Triage Requires triage/attention labels Jul 28, 2024
uhthomas added a commit to uhthomas/automata that referenced this issue Jul 28, 2024

uhthomas commented Jul 28, 2024

I wrote this quick script to see if I could find anything obvious, but no individual directory appears to cause this; it may be the cumulative effect of all of them. There are some pretty large spikes in memory usage.

for dir in */; echo "$dir:" && CUE_EXPERIMENT=evalv3 go run cuelang.org/go/cmd/cue@v0.10.0-alpha.2.0.20240726151100-e68bd632c29e export "./$dir"; end


Some of those spikes are like 10 gigabytes....

@uhthomas

If it helps: the latest stable release, without evalv3, takes at most 2GB for everything all at once (cue export ./k8s/amour/list.cue), so this is a pretty major regression.

❯ cue version
cue version 0.9.2

go version go1.22.5
      -buildmode pie
       -compiler gc
       -trimpath true
  DefaultGODEBUG httplaxcontentlength=1,httpmuxgo121=1,tls10server=1,tlsrsakex=1,tlsunsafeekm=1
     CGO_ENABLED 1
          GOARCH amd64
            GOOS linux
         GOAMD64 v1
cue.lang.version v0.9.2


uhthomas commented Jul 28, 2024

I don't know if this pprof info is helpful.


❯ go tool pprof http://localhost:8080/debug/pprof/heap
Fetching profile over HTTP from http://localhost:8080/debug/pprof/heap
Saved profile in /home/thomas/pprof/pprof.cuebin.alloc_objects.alloc_space.inuse_objects.inuse_space.011.pb.gz
File: cuebin
Type: inuse_space
Time: Jul 28, 2024 at 10:41pm (BST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 15.18GB, 97.93% of 15.51GB total
Dropped 187 nodes (cum <= 0.08GB)
Showing top 10 nodes out of 54
      flat  flat%   sum%        cum   cum%
    6.28GB 40.51% 40.51%     6.28GB 40.51%  cuelang.org/go/internal/core/adt.(*overlayContext).allocCC
    3.88GB 25.04% 65.55%     5.95GB 38.36%  cuelang.org/go/internal/core/adt.(*closeContext).getKeyedCC
    2.17GB 13.97% 79.52%     2.17GB 13.97%  cuelang.org/go/internal/core/adt.(*OpContext).newNodeContext
    1.10GB  7.12% 86.64%     1.10GB  7.12%  cuelang.org/go/internal/core/adt.(*closeContext).addDependency
    0.96GB  6.21% 92.85%     5.99GB 38.60%  cuelang.org/go/internal/core/adt.(*closeContext).assignConjunct
    0.29GB  1.86% 94.71%     0.29GB  1.86%  cuelang.org/go/internal/core/adt.(*overlayContext).initCloneCC
    0.18GB  1.16% 95.87%     0.18GB  1.16%  cuelang.org/go/internal/core/adt.(*Vertex).rootCloseContext
    0.12GB   0.8% 96.67%     1.28GB  8.26%  cuelang.org/go/internal/core/adt.(*overlayContext).cloneVertex
    0.10GB  0.66% 97.32%     0.10GB  0.66%  cuelang.org/go/internal/core/adt.(*nodeContext).getArc
    0.09GB  0.61% 97.93%     0.10GB  0.66%  cuelang.org/go/internal/core/adt.(*nodeContext).scheduleTask
(pprof) list allocCC
Total: 15.51GB
ROUTINE ======================== cuelang.org/go/internal/core/adt.(*overlayContext).allocCC in /home/thomas/code/github.com/cue-lang/cue/internal/core/adt/overlay.go
    6.28GB    15.11GB (flat, cum) 97.42% of Total
         .          .    259:func (ctx *overlayContext) allocCC(cc *closeContext) *closeContext {
         .          .    260:   // TODO(perf): if the original is "done", it can no longer be modified and
         .          .    261:   // we can use the original, even if the values will not be correct.
         .          .    262:   if cc.overlay != nil {
         .          .    263:           return cc.overlay
         .          .    264:   }
         .          .    265:
    4.59GB     4.59GB    266:   o := &closeContext{generation: ctx.generation}
         .          .    267:   cc.overlay = o
         .          .    268:
         .          .    269:   if cc.parent != nil {
         .     6.15GB    270:           o.parent = ctx.allocCC(cc.parent)
         .          .    271:   }
         .          .    272:
         .          .    273:   // Copy the conjunct group if it exists.
         .          .    274:   if cc.group != nil {
         .          .    275:           // Copy the group of conjuncts.
    1.67GB     1.67GB    276:           g := make([]Conjunct, len(*cc.group))
         .          .    277:           o.group = (*ConjunctGroup)(&g)
         .          .    278:           for i, c := range *cc.group {
         .     1.50MB    279:                   g[i] = ctx.copyConjunct(c)
         .          .    280:           }
         .          .    281:
         .          .    282:           if o.parent != nil {
         .          .    283:                   // validate invariants
         .          .    284:                   ca := *cc.parent.group
         .          .    285:                   if ca[cc.parentIndex].x != cc.group {
         .          .    286:                           panic("group misaligned")
         .          .    287:                   }
         .          .    288:
         .          .    289:                   (*o.parent.group)[cc.parentIndex].x = o.group
         .          .    290:           }
         .          .    291:   }
         .          .    292:
         .          .    293:   // This must come after allocating the parent so that we can always read
         .          .    294:   // the src vertex from the parent during initialization. This assumes that
         .          .    295:   // src is set in the root closeContext when cloning a vertex.
   21.23MB    21.23MB    296:   ctx.closeContexts = append(ctx.closeContexts, cc)
         .          .    297:
         .          .    298:   // needsCloseInSchedule is used as a boolean. The pointer to the original
         .          .    299:   // closeContext is just used for reporting purposes.
         .          .    300:   if cc.needsCloseInSchedule != nil {
         .   128.57MB    301:           o.needsCloseInSchedule = ctx.allocCC(cc.needsCloseInSchedule)
         .          .    302:   }
         .          .    303:
         .          .    304:   // We only explicitly tag dependencies of type ARC. Notifications that
         .          .    305:   // point within the disjunct overlay will be tagged elsewhere.
         .          .    306:   for _, a := range cc.arcs {
         .          .    307:           if a.kind == ARC {
         .     2.55GB    308:                   ctx.allocCC(a.cc)
         .          .    309:           }
         .          .    310:   }
         .          .    311:
         .          .    312:   return o
         .          .    313:}
(pprof) list getKeyedCC
Total: 15.51GB
ROUTINE ======================== cuelang.org/go/internal/core/adt.(*closeContext).getKeyedCC in /home/thomas/code/github.com/cue-lang/cue/internal/core/adt/fields.go
    3.88GB    10.72GB (flat, cum) 69.15% of Total
         .          .    362:func (cc *closeContext) getKeyedCC(ctx *OpContext, key *closeContext, c CycleInfo, mode ArcType, checkClosed bool) *closeContext {
         .          .    363:   for _, a := range cc.arcs {
         .          .    364:           if a.key == key {
         .          .    365:                   a.cc.updateArcType(mode)
         .          .    366:                   return a.cc
         .          .    367:           }
         .          .    368:   }
         .          .    369:
  360.01MB   360.01MB    370:   group := &ConjunctGroup{}
         .          .    371:
         .          .    372:   if cc.parentConjuncts == cc {
         .          .    373:           panic("parent is self")
         .          .    374:   }
         .          .    375:
         .     5.74GB    376:   parent, pos, _ := cc.parentConjuncts.assignConjunct(ctx, key, Conjunct{
         .          .    377:           CloseInfo: CloseInfo{
         .          .    378:                   FromDef:   cc.isDef,
         .          .    379:                   FromEmbed: cc.isEmbed,
         .          .    380:                   CycleInfo: c,
         .          .    381:           },
         .          .    382:           x: group,
         .          .    383:   }, mode, false, checkClosed)
         .          .    384:
    3.53GB     3.53GB    385:   arc := &closeContext{
         .          .    386:           generation:      cc.generation,
         .          .    387:           parent:          parent,
         .          .    388:           parentConjuncts: parent,
         .          .    389:           parentIndex:     pos,
         .          .    390:
         .          .    391:           src:     key.src,
         .          .    392:           arcType: mode,
         .          .    393:           group:   group,
         .          .    394:
         .          .    395:           isDef:                cc.isDef,
         .          .    396:           isEmbed:              cc.isEmbed,
         .          .    397:           needsCloseInSchedule: cc,
         .          .    398:   }
         .          .    399:
         .          .    400:   arc.parent.incDependent(ctx, PARENT, arc)
         .          .    401:
         .          .    402:   // If the parent, w.r.t. the subfield relation was already processed,
         .          .    403:   // there is no need to register the notification.
         .          .    404:   arc.incDependent(ctx, EVAL, cc) // matched in REF(decrement:nodeDone)
         .          .    405:
         .          .    406:   // A let field never depends on its parent. So it is okay to filter here.
         .          .    407:   if !arc.Label().IsLet() {
         .          .    408:           // prevent a dependency on self.
         .          .    409:           if key.src != cc.src {
         .     1.10GB    410:                   cc.addDependency(ctx, ARC, key, arc, key)
         .          .    411:           }
         .          .    412:   }
         .          .    413:
         .          .    414:   v := key.src
         .          .    415:   if checkClosed && v.Parent != nil && v.Parent.state != nil {

@mvdan mvdan added panic and removed Triage Requires triage/attention labels Jul 28, 2024
@mvdan mvdan removed the panic label Oct 9, 2024

mvdan commented Oct 9, 2024

Some updated numbers, as of CUE 655236e:

$ time CUE_EXPERIMENT=evalv3=0 cue export ./k8s/amour/list.cue >/dev/null

real	0m10.683s
user	0m18.254s
sys	0m0.456s
$ time CUE_EXPERIMENT=evalv3=1 cue export ./k8s/amour/list.cue >/dev/null
^C (stopped before it eats all my RAM)

real	0m26.453s
user	1m13.289s
sys	0m14.437s

We see a similar issue with a few other medium to large projects when using the new evaluator. The current work in #2854 and particularly #2853 should help significantly here. I will update this thread once more progress has been made, as we are actively merging improvements into master.


mvdan commented Oct 14, 2024

I have finally been able to reduce a bit of CUE which seems to OOM my machine where evalv2 definitely did not: #3509


mvdan commented Oct 14, 2024

#3509 is fixed but that doesn't fix your issue. Quickly trying to narrow down its cause, I ran into a related regression, #3511.

@mvdan mvdan added the evalv3 issues affecting only the evaluator version 3 label Dec 10, 2024

mvdan commented Dec 12, 2024

Still seems to happen as of 0005c22.


mvdan commented Dec 24, 2024

I again reduced part of this project into another performance issue to be fixed: #3633

@myitcv myitcv added tmp/eval and removed tmp/eval labels Feb 15, 2025
@mvdan mvdan added the unity-win bugs found thanks to projects added to Unity label Feb 26, 2025

mvdan commented Feb 26, 2025

Here is another reduction, similar to #3633, yet not the same:

import "list"

items: [...#JSONSchemaProps]

#JSONSchemaProps: {
    props?: [string]: #JSONSchemaProps

    repeat0?: [...#JSONSchemaProps]
    repeat1?: [...#JSONSchemaProps]
    repeat2?: [...#JSONSchemaProps]
    repeat3?: [...#JSONSchemaProps]
    repeat4?: [...#JSONSchemaProps]
    repeat5?: [...#JSONSchemaProps]
    repeat6?: [...#JSONSchemaProps]
    repeat7?: [...#JSONSchemaProps]
    repeat8?: [...#JSONSchemaProps]
    repeat9?: [...#JSONSchemaProps]
}

items: list.Repeat([{
    props: a1: props: a2: props: a3: props: a4: props: a5: {}
    props: b1: props: b2: props: b3: props: b4: props: b5: {}
    props: c1: props: c2: props: c3: props: c4: props: c5: {}
    props: d1: props: d2: props: d3: props: d4: props: d5: {}
    props: e1: props: e2: props: e3: props: e4: props: e5: {}
    props: f1: props: f2: props: f3: props: f4: props: f5: {}
    props: g1: props: g2: props: g3: props: g4: props: g5: {}
    props: h1: props: h2: props: h3: props: h4: props: h5: {}
    props: i1: props: i2: props: i3: props: i4: props: i5: {}
    props: j1: props: j2: props: j3: props: j4: props: j5: {}
}], 50)

As of 0b36c4e, this takes 80ms to cue export with evalv2, and 600ms with evalv3.


mvdan commented Mar 18, 2025

Significant progress; I am now able to finish evaluating with evalv3, even if it's still quite a bit slower than evalv2:

$ for n in 0 1; do time CUE_EXPERIMENT=evalv3=$n cue export ./k8s/amour/list.cue | wc -l; done
97333

real	0m5.992s
user	0m11.232s
sys	0m0.224s
97333

real	1m44.507s
user	2m54.228s
sys	0m8.351s

Together with @mpvl, we believe the remaining work here is memory management, which evalv2 did to some extent but evalv3 does not yet do. This is also why evalv3 above takes more than 20GiB of memory to finish.


mvdan commented Apr 8, 2025

I ran this again, as of the latest version of the automata and cue repos, and got the following stats:

$ for n in 0 1; do time CUE_EXPERIMENT=evalv3=$n CUE_STATS_FILE=- cue export ./k8s/magiclove/list.cue | wc -l; done
{
    "CUE": {
        "EvalVersion": 2,
        "Unifications": 3532721,
        "Disjuncts": 5926895,
        "Conjuncts": 6258147,
        "Freed": 5293289,
        "Reused": 5293251,
        "Allocs": 1189,
        "Retained": 646582
    },
    "Go": {
        "AllocBytes": 3121100688,
        "AllocObjects": 39615227
    }
}
118599

real	0m6.959s
user	0m11.656s
sys	0m0.285s
{
    "CUE": {
        "EvalVersion": 3,
        "Unifications": 3203181,
        "Disjuncts": 2118830,
        "Conjuncts": 5993002,
        "Freed": 0,
        "Reused": 0,
        "Allocs": 6500877,
        "Retained": 0
    },
    "Go": {
        "AllocBytes": 115424217200,
        "AllocObjects": 705573132
    }
}
118599

real	2m55.055s
user	4m16.801s
sys	0m11.064s


mvdan commented Apr 8, 2025

I think this issue overlaps with #3881 too; the pprof shows me:

      flat  flat%   sum%        cum   cum%
    3530ms 25.05% 25.05%     3530ms 25.05%  cuelang.org/go/internal/core/adt.transitiveMapping
    1930ms 13.70% 38.75%     3610ms 25.62%  runtime.scanobject
    1890ms 13.41% 52.16%     6580ms 46.70%  cuelang.org/go/internal/core/adt.(*reqSets).replaceIDs
    1230ms  8.73% 60.89%     1420ms 10.08%  cuelang.org/go/internal/core/adt.(*reqSets).filterTop

So even though at least 25% of the time is lost to GC overhead from allocations, at least 46% goes to the reqSets logic, which, as shown in #3881, carries a TODO about its naive algorithm potentially being a bottleneck.

cueckoo pushed a commit that referenced this issue Apr 11, 2025
Benchmarking the reduction at https://cuelang.org/issue/3334
via benchcmd -n 12 CueVetIssue3334 cue vet -c f.cue:

                    │     old     │                 new                 │
                    │   sec/op    │   sec/op     vs base                │
    CueVetIssue3334   175.3m ± 1%   154.9m ± 1%  -11.67% (p=0.000 n=12)

                    │     old     │                 new                 │
                    │ user-sec/op │ user-sec/op  vs base                │
    CueVetIssue3334   332.1m ± 4%   270.8m ± 5%  -18.46% (p=0.000 n=12)

                    │     old      │              new               │
                    │  sys-sec/op  │  sys-sec/op   vs base          │
    CueVetIssue3334   27.61m ± 16%   21.92m ± 30%  ~ (p=0.089 n=12)

                    │      old       │                  new                  │
                    │ peak-RSS-bytes │ peak-RSS-bytes  vs base               │
    CueVetIssue3334     182.2Mi ± 4%     177.4Mi ± 1%  -2.61% (p=0.000 n=12)

Updates #3334.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I1f066de341e8fd2e9d508304e3de1b86234f60a6
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1213055
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Marcel van Lohuizen <mpvl@gmail.com>
cueckoo pushed a commit that referenced this issue Apr 15, 2025
The getReplacements call was creating an entirely new slice each time,
as can be seen by the inline perf TODO. We can however avoid most of
that by reusing a local buffer, not needing to alter nodeContext
or OpContext at all just yet.

As measured by

    benchcmd -n 6 VetAutomataList cue vet -c k8s/magiclove/list.cue

on a simplified version of https://github.com/uhthomas/automata,
we see a significant win:

                    │    old     │               new                │
                    │   sec/op   │   sec/op    vs base              │
    VetAutomataList   3.001 ± 1%   2.877 ± 2%  -4.12% (p=0.002 n=6)

                    │     old     │                new                │
                    │ user-sec/op │ user-sec/op  vs base              │
    VetAutomataList    3.979 ± 6%    3.769 ± 6%  -5.29% (p=0.026 n=6)

                    │     old      │             new              │
                    │  sys-sec/op  │ sys-sec/op   vs base         │
    VetAutomataList   99.79m ± 11%   93.96m ± 8%  ~ (p=0.132 n=6)

                    │      old       │                 new                  │
                    │ peak-RSS-bytes │ peak-RSS-bytes  vs base              │
    VetAutomataList     789.5Mi ± 4%     743.2Mi ± 6%  -5.87% (p=0.026 n=6)

Updates #3334.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Ia598d49e9ddd13c687315fc1726160c7f6af8800
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1213493
Reviewed-by: Marcel van Lohuizen <mpvl@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
cueckoo pushed a commit that referenced this issue Apr 15, 2025
In looking at why https://github.com/uhthomas/automata was spending
so much time in checkTypos, I realised that we were doing a lot of work
in the initial appendRequired call, even though the main v.Arcs loop
did nothing as len(v.Arcs) == 0.

Detect that edge case early, which can be very common in some projects,
and avoid doing any further work.

As measured by

    benchcmd -n 6 VetAutomataList cue vet -c k8s/magiclove/list.cue

on a simplified version of https://github.com/uhthomas/automata,
we see a very substantial win:

                    │     old      │                new                 │
                    │    sec/op    │   sec/op     vs base               │
    VetAutomataList   2877.5m ± 2%   722.1m ± 2%  -74.91% (p=0.002 n=6)

                    │     old     │                new                 │
                    │ user-sec/op │ user-sec/op  vs base               │
    VetAutomataList    3.769 ± 6%    1.231 ± 9%  -67.34% (p=0.002 n=6)

                    │     old     │                 new                 │
                    │ sys-sec/op  │  sys-sec/op   vs base               │
    VetAutomataList   93.96m ± 8%   79.04m ± 15%  -15.88% (p=0.015 n=6)

                    │      old       │                 new                  │
                    │ peak-RSS-bytes │ peak-RSS-bytes  vs base              │
    VetAutomataList     743.2Mi ± 6%     672.5Mi ± 3%  -9.51% (p=0.002 n=6)

The full automata project also sees a similar speed-up,
going from about 180s total time to evaluate to about 30s.

Updates #3334.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I24c9a7bdac496608799fc72f6abd4c49eaf32180
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1213494
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Marcel van Lohuizen <mpvl@gmail.com>

mvdan commented Apr 15, 2025

With https://review.gerrithub.io/c/cue-lang/cue/+/1213494, we are now looking much better, even if still a bit slower:

$ time CUE_EXPERIMENT=evalv3=0 cue vet -c ./k8s/magiclove/list.cue

real	0m6.335s
user	0m10.852s
sys	0m0.247s
$ time CUE_EXPERIMENT=evalv3=1 cue vet -c ./k8s/magiclove/list.cue

real	0m27.729s
user	0m49.371s
sys	0m2.698s

Memory usage still peaked above 20GiB, but at least the wall time and CPU usage are much lower now.

cueckoo pushed a commit that referenced this issue Apr 15, 2025
nodeContext's overlays and disjunct fields were added and set in
https://cuelang.org/cl/1190800 but were never used.

Peak memory usage on a simplified version of
https://github.com/uhthomas/automata goes down a bit,
as most of the allocations come from cloning vertices and contexts,
so the size of these types is multiplied:

                    │      old       │                 new                  │
                    │ peak-RSS-bytes │ peak-RSS-bytes  vs base              │
    VetAutomataList     669.5Mi ± 3%     629.0Mi ± 3%  -6.06% (p=0.000 n=8)

Updates #3334.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I7ebf8db1f72cee7e4246ad10c4c0d9864ea3110e
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1213518
Reviewed-by: Marcel van Lohuizen <mpvl@gmail.com>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
cueckoo pushed a commit that referenced this issue Apr 28, 2025
walkVisitor was a wrapper around ast.Walk which kept a stack of visitors
to be saved and reused as we walked up and down a syntax tree.

Resolve also used a trick to only walk parts of some syntax nodes.
Rather than returning a visitor for a node, it would recursively call
walkVisitor only for the fields that it wanted to walk into.

However, these two compounded poorly; each recursive call to walkVisitor
caused new allocations to make a new stackVisitor to carry state.
Even though these allocations are small, they can add up into noticeable
overhead, particularly for the garbage collector.

Instead, we can rely on ast.Walk rather than walkVisitor, while keeping
the same trick to only walk parts of certain syntax nodes.
This keeps the same semantics, but avoids the extra allocations.

                    │     old      │                 new                 │
                    │     B/op     │     B/op      vs base               │
    VetAutomataList   734.9Mi ± 0%   733.3Mi ± 0%  -0.22% (p=0.000 n=10)

                    │     old     │                new                 │
                    │  allocs/op  │  allocs/op   vs base               │
    VetAutomataList   3.424M ± 0%   3.361M ± 0%  -1.85% (p=0.000 n=10)

For #3334.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Ibb84e22bf7cfdf8c6e2c6122ab21aed9934c13cc
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1214080
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Matthew Sackman <matthew@cue.works>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
cueckoo pushed a commit that referenced this issue Apr 28, 2025
slices.Clone is an efficient way to make a copy of a slice, but it does
create a new slice from scratch each time. In this case we are looping
and each copy is only used as part of each iteration,
so we can preallocate one slice big enough to make the copies
and reuse that across all iterations.

                    │     old      │                 new                 │
                    │     B/op     │     B/op      vs base               │
    VetAutomataList   733.3Mi ± 0%   663.1Mi ± 0%  -9.57% (p=0.000 n=10)

                    │     old     │                new                 │
                    │  allocs/op  │  allocs/op   vs base               │
    VetAutomataList   3.361M ± 0%   3.308M ± 0%  -1.58% (p=0.000 n=10)

For #3334.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Id12844a4c5f417d095521783bfc929985f23ade2
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1214081
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Matthew Sackman <matthew@cue.works>
cueckoo pushed a commit that referenced this issue Apr 28, 2025
apd.Decimal.Int64 on `d = hi - lo` will error if it overflows an int64.
This is pretty common with CUE bounds like int64, which expands to:

    >=-9_223_372_036_854_775_808 & <=9_223_372_036_854_775_807

Constructing that error is unfortunate as it allocates a few times
and stringifies the number too, which also has a cost.
Which is entirely unnecessary, as we don't use the error value at all.
If we know the integer will have more than one digit, give up early.

                    │     old      │                 new                 │
                    │     B/op     │     B/op      vs base               │
    VetAutomataList   663.1Mi ± 0%   659.2Mi ± 0%  -0.59% (p=0.000 n=10)

                    │     old     │                new                 │
                    │  allocs/op  │  allocs/op   vs base               │
    VetAutomataList   3.308M ± 0%   3.163M ± 0%  -4.37% (p=0.000 n=10)

Note that this requires us to move the d.Negative check earlier,
as it should not be affected by the number of digits in d.
Introduce the helper errIncompatibleBounds to avoid duplication.

Combining this and the two previous allocation optimizations,
we can see a noticeable drop in the wall time:

                    │     old     │                new                 │
                    │   sec/op    │   sec/op     vs base               │
    VetAutomataList   642.4m ± 3%   618.7m ± 1%  -3.69% (p=0.000 n=10)

                    │     old      │                 new                  │
                    │     B/op     │     B/op      vs base                │
    VetAutomataList   734.9Mi ± 0%   659.3Mi ± 0%  -10.29% (p=0.000 n=10)

                    │     old     │                new                 │
                    │  allocs/op  │  allocs/op   vs base               │
    VetAutomataList   3.424M ± 0%   3.163M ± 0%  -7.63% (p=0.000 n=10)

For #3334.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Ic0ba6031b0ace66bc1418ce76e745650021d22a0
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1214082
Reviewed-by: Matthew Sackman <matthew@cue.works>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>