From 93398f1f80a50d2eac316c2d4e46ceef9b9b7c0b Mon Sep 17 00:00:00 2001 From: "Documenter.jl" Date: Mon, 23 Dec 2024 21:57:11 +0000 Subject: [PATCH] build based on 506fe58 --- dev/.documenter-siteinfo.json | 2 +- dev/index.html | 2 +- dev/internals/index.html | 4 ++-- dev/linuxtips/index.html | 2 +- dev/manual/index.html | 2 +- dev/objects.inv | Bin 1435 -> 1517 bytes dev/reference/index.html | 6 +++--- dev/search_index.js | 2 +- 8 files changed, 10 insertions(+), 10 deletions(-) diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index b12fd120..9839b880 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.11.2","generation_timestamp":"2024-12-23T15:16:25","documenter_version":"1.8.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.11.2","generation_timestamp":"2024-12-23T21:57:06","documenter_version":"1.8.0"}} \ No newline at end of file diff --git a/dev/index.html b/dev/index.html index c33aa8f4..b448140a 100644 --- a/dev/index.html +++ b/dev/index.html @@ -32,4 +32,4 @@ 0.024 ns (0 allocations: 0 bytes) 3

As a rule of thumb, if a benchmark reports that it took less than a nanosecond to run, this hoisting probably occurred. You can avoid it by referencing and dereferencing the interpolated variables:

julia> @btime $(Ref(a))[] + $(Ref(b))[]
   1.277 ns (0 allocations: 0 bytes)
-3

As described in the Manual, the BenchmarkTools package supports many other features, both for additional output and for more fine-grained control over the benchmarking process.

+3

As described in the Manual, the BenchmarkTools package supports many other features, both for additional output and for more fine-grained control over the benchmarking process.

diff --git a/dev/internals/index.html b/dev/internals/index.html index fcb40d48..9a8f08ef 100644 --- a/dev/internals/index.html +++ b/dev/internals/index.html @@ -1,8 +1,8 @@ -Internals · BenchmarkTools.jl

Internals

Base.isemptyMethod
isempty(group::BenchmarkGroup)

Return true if group is empty. This will first run clear_empty! on group to recursively remove any empty subgroups.

source
BenchmarkTools._withprogressMethod
_withprogress(
+Internals · BenchmarkTools.jl

Internals

Base.isemptyMethod
isempty(group::BenchmarkGroup)

Return true if group is empty. This will first run clear_empty! on group to recursively remove any empty subgroups.

source
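
For illustration, here is a minimal sketch of how this behaves on a group whose only key was created by accident (the key name is hypothetical); per the docstring above, the empty subgroup is pruned before the emptiness check:

julia> g = BenchmarkGroup();

julia> g["sub"];        # accessing a missing key creates an empty subgroup

julia> isempty(g)       # clear_empty! runs first, so the empty subgroup does not count
true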
BenchmarkTools._withprogressMethod
_withprogress(
     name::AbstractString,
     group::BenchmarkGroup;
     kwargs...,
 ) do progressid, nleaves, ndone
     ...
-end

Execute the do block with the following arguments:

  • progressid: logging ID to be used for @logmsg.
  • nleaves: total number of benchmarks counted at the root benchmark group.
  • ndone: number of completed benchmarks.

They are either extracted from kwargs (for sub-groups) or newly created (for the root benchmark group).

source
BenchmarkTools.loadMethod
BenchmarkTools.load(filename)

Load serialized benchmarking objects (e.g. results or parameters) from a JSON file.

source
BenchmarkTools.quasiquote!Method
quasiquote!(expr::Expr, vars::Vector{Symbol}, vals::Vector{Expr})

Replace every interpolated value in expr with a placeholder variable and store the resulting variable / value pairings in vars and vals.

source
BenchmarkTools.saveMethod
BenchmarkTools.save(filename, args...)

Save serialized benchmarking objects (e.g. results or parameters) to a JSON file.

source
+end

Execute the do block with the following arguments:

  • progressid: logging ID to be used for @logmsg.
  • nleaves: total number of benchmarks counted at the root benchmark group.
  • ndone: number of completed benchmarks.

They are either extracted from kwargs (for sub-groups) or newly created (for the root benchmark group).

source
BenchmarkTools.loadMethod
BenchmarkTools.load(filename)

Load serialized benchmarking objects (e.g. results or parameters) from a JSON file.

source
BenchmarkTools.quasiquote!Method
quasiquote!(expr::Expr, vars::Vector{Symbol}, vals::Vector{Expr})

Replace every interpolated value in expr with a placeholder variable and store the resulting variable / value pairings in vars and vals.

source
BenchmarkTools.saveMethod
BenchmarkTools.save(filename, args...)

Save serialized benchmarking objects (e.g. results or parameters) to a JSON file.

source
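
As a rough sketch of how save and load fit together (the filename and the suite variable are illustrative, not part of the API), results from a run can be written to disk and read back later:

julia> results = run(suite);

julia> BenchmarkTools.save("results.json", results)

julia> loaded = BenchmarkTools.load("results.json");   # deserialize the saved objects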
diff --git a/dev/linuxtips/index.html b/dev/linuxtips/index.html index 83670eae..8a048dde 100644 --- a/dev/linuxtips/index.html +++ b/dev/linuxtips/index.html @@ -75,4 +75,4 @@ MCE: 0 0 Machine check exceptions MCP: 61112 61112 Machine check polls ERR: 0 -MIS: 0

Some interrupts, like non-maskable interrupts (NMI), can't be redirected, but you can change the SMP affinities of the rest by writing processor indices to /proc/irq/n/smp_affinity_list, where n is the IRQ number. Here's an example that sets IRQ 22's SMP affinity to processors 0, 1, and 2:

➜ echo 0-2 | sudo tee /proc/irq/22/smp_affinity_list

The optimal way to configure SMP affinities depends a lot on your benchmarks and benchmarking process. For example, if you're running a lot of network-bound benchmarks, it can sometimes be more beneficial to evenly balance ethernet driver interrupts (usually named something like eth0-*) than to restrict them to specific processors.

A smoke test for determining the impact of IRQs on benchmark results is to see what happens when you turn on/off an IRQ load balancer like irqbalance. If this has a noticeable effect on your results, it might be worth playing around with SMP affinities to figure out which IRQs should be directed away from your shielded processors.

Performance monitoring interrupts (PMIs) and perf

Performance monitoring interrupts (PMIs) are sent by the kernel's perf subsystem, which is used to set and manage hardware performance counters monitored by other parts of the kernel. Unless perf is a dependency of your benchmarking process, it may be useful to lower perf's sample rate so that PMIs don't interfere with your experiments. One way to do this is to set the kernel.perf_cpu_time_max_percent parameter to 1:

➜ sudo sysctl kernel.perf_cpu_time_max_percent=1

This tells the kernel to inform perf that it should lower its sample rate such that sampling consumes less than 1% of CPU time. After changing this parameter, you may see messages in the system log like:

[ 3835.065463] perf samples too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate

These messages are nothing to be concerned about - it's simply the kernel reporting that it's lowering perf's max sample rate in order to respect the perf_cpu_time_max_percent property we just set.

Additional resources

+MIS: 0

Some interrupts, like non-maskable interrupts (NMI), can't be redirected, but you can change the SMP affinities of the rest by writing processor indices to /proc/irq/n/smp_affinity_list, where n is the IRQ number. Here's an example that sets IRQ 22's SMP affinity to processors 0, 1, and 2:

➜ echo 0-2 | sudo tee /proc/irq/22/smp_affinity_list

The optimal way to configure SMP affinities depends a lot on your benchmarks and benchmarking process. For example, if you're running a lot of network-bound benchmarks, it can sometimes be more beneficial to evenly balance ethernet driver interrupts (usually named something like eth0-*) than to restrict them to specific processors.

A smoke test for determining the impact of IRQs on benchmark results is to see what happens when you turn on/off an IRQ load balancer like irqbalance. If this has a noticeable effect on your results, it might be worth playing around with SMP affinities to figure out which IRQs should be directed away from your shielded processors.

Performance monitoring interrupts (PMIs) and perf

Performance monitoring interrupts (PMIs) are sent by the kernel's perf subsystem, which is used to set and manage hardware performance counters monitored by other parts of the kernel. Unless perf is a dependency of your benchmarking process, it may be useful to lower perf's sample rate so that PMIs don't interfere with your experiments. One way to do this is to set the kernel.perf_cpu_time_max_percent parameter to 1:

➜ sudo sysctl kernel.perf_cpu_time_max_percent=1

This tells the kernel to inform perf that it should lower its sample rate such that sampling consumes less than 1% of CPU time. After changing this parameter, you may see messages in the system log like:

[ 3835.065463] perf samples too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate

These messages are nothing to be concerned about - it's simply the kernel reporting that it's lowering perf's max sample rate in order to respect the perf_cpu_time_max_percent property we just set.

Additional resources

diff --git a/dev/manual/index.html b/dev/manual/index.html index 002b4fc1..4eb4999e 100644 --- a/dev/manual/index.html +++ b/dev/manual/index.html @@ -603,4 +603,4 @@ plot(t)

This will show the timing results of the trial as a violin plot. You can use all the keyword arguments from Plots.jl, for instance st=:box or yaxis=:log10.
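
For example, assuming t is a Trial obtained from an earlier run and BenchmarkPlots/StatsPlots are already loaded, the keywords mentioned above can be combined:

plot(t; st=:box, yaxis=:log10)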

If a BenchmarkGroup contains (only) Trials, its results can be visualized simply by

using BenchmarkPlots, StatsPlots
 t = run(g)
-plot(t)

This will display each Trial as a violin plot.

Miscellaneous tips and info

+plot(t)

This will display each Trial as a violin plot.

Miscellaneous tips and info

diff --git a/dev/objects.inv b/dev/objects.inv index 30028547e0d0e851b6067ce2730d10b272419715..e17e3984ac1f36adaceddf801081c9f4497f02d7 100644 GIT binary patch delta 1401 zcmV-<1%~>Y3+)S#g@0IEZ`(E$e%G%!#b%%d%BIZ*bnDBKrrVOD0lYX(9}8NdZ8kKi zkW>=4$ba7K6rT~7mT5^vLq&Ob z0B6N#RAPdtpmQdgw7>icB>KP>TynW2b39wEslFyAtC0r6pO2offWoaL8P63VrJOwxkbP-T!QdIU{ zDkWN$df#|w>VGV?M6;WW8EXiY37MTw30*iZ*i92v#TxM?8VWQZ8W0TyI%r6hpw_+n zmI{=^=UP<@2fI%-GlnbK>&*rY1R9qWsurXgO>~?j+~OBH7|&Fux3Z-1l1dy(@#w#d z$Jb67lcNO2ueo!uF;` z4OB~n92A|RdTm#l(ZmT1X&)Cua>Zh+^wXjSfn=P@BTJV{Y^I$WM=?x(f1PuTGA~!? z3dj``F@GtI84+}&GHeo9-fGRiQU>b=bB(um?43+MlZ9`NGTsmq zW3KvMkdV1j#tvsCf}sRM4)z_k07suOq}o_KbAPHaf@d>F$gwfmKLNG-Z0BG$Jr<*9xbY(s_}EX=lju>G_#?;Z>X$1DaV1(X!Q z*l=EL2bXStxp2AWRrvpY%#tN@<a;X3XNea8_7Gb)xPhNmtm-dPzLM+fe`@0HVvRtV*59@I7+r2~f|PLw9Yo zufDmklglg>>__XW%9028Cf=dHS1N(VuY-QF$=mn6bd$FqaJ~!}a)sg5%O6e4AAcTa zMc}HN)69$W9OhHy8`&P$Mwc0V;=!s~+9u1*Q(&&6%{c=ESbIbH2 zn_CUqS!O$O%Vi)pK3hj*eZ2;^kbOzsw8>_YJ*~hT!_LauZxQ&pHM4tDcxuhe@pAv6 zRR?v)r9WA^V=z=s)cv&HcdTw>2Q)mEy`R~6=rLaHV|2rEujYO<$Ng!+_xD^AMxy@# HoBn}7X|tq8 delta 1319 zcmV+?1=#xS3!4j&g@0GuZW}icedkw9(F#a_FizY8Zu3$*cH20J0o8Jx$AH?UB-RwU zCb_<($iH_;?p@l|%I=eC?wm9C8A_;>KY(@VyHqDY-vWHlYfjcotgMkiK3Ftxop$^_!Cfa#uq{fwWM>rTdkS7p*9z;iz+-tNC|)n01-gmR}52T*qWmy zy)2b1)UxD@HaFgsd*N2p8Nnn?nPXHa%4mi7%ji7o<1Hw_Q$U~qPk|u<25pMk-YczT z$I=j+5KNP2j(-|{n{(?dWh$k!iz#Ib9|iZ=M%B4Se1(P*4TuIrLy3+WGA)_&e%~>P za_Fwpjdbvb%y4UkR=v?|(Lki}RiPV6YS56AG!+g{7+}29x!I_amM671RO00CTqHMi z=9qCCp$p$2d_xzpnY?$wt!^G^Zv8i=v|DjAGRY-nbbqm%R|u3)Px+BLcpm`8yLNhUGP`G4CKP67(3)#ukw|=NcPeHhm#()g8lXlY z6d>{e)f=}moTXl1Nc*@P(rcbLZJw7kh$Q1w9y_*N;xOH+aTLSkmp28+DDw)1zJYu* zQJb-ZQ-8@Gb&f*<+gq>2H|A@LzBrkEx_sFi))5So!(TnepQ-K0$8$`aWM6#mz7!hmcf)hHIDLuz=$ba8XIRu&f z{3nF(P<@B^9nQg1l2ewfe2jKK2@A>QgMTm>U;F+?)X~3V!+Wptx)iS;dZ7rI70K>* zX~=#MB{S}DUwA7VXC~G6ie@XEX1yc<5bP)c5CP;Wue#FYDxOC^dLlHi7SLTU+SgzG zD;TS488*#To=N`iiF$kmCv)ozOWe%)-g(ub`40cccbI{dPNB6qcGFLe-lKxa;eYAm z@J(AuylX3(9G&4V>aZTqnUp%gyrq#dSi=&dq4ndX1H(nW0o=mPCClvo%x%~P?`@8i za+_<%e`tpJ8Y|f>{rUUYnAS{_!?&9;jf}<3(b?7jT6V`vD>`|8fVo>XNiZMNmI53;b*AiPN9~-CCDBlHpS(Y)J2jGHDj= z2C}az+BUgd@#j^fXV_Va-9Y22+ -Reference · BenchmarkTools.jl

References

BenchmarkTools.clear_empty!Method
clear_empty!(group::BenchmarkGroup)

Recursively remove any empty subgroups from group.

Use this to prune a BenchmarkGroup after accessing incorrect fields: for example, evaluating g = BenchmarkGroup(); g[1] without storing anything to g[1] creates an empty subgroup g[1].

source
BenchmarkTools.tune!Function
tune!(b::Benchmark, p::Parameters = b.params; verbose::Bool = false, pad = "", kwargs...)

Tune a Benchmark instance.

If the number of evals in the parameters p has been set manually, this function does nothing.

source
BenchmarkTools.tune!Method
tune!(group::BenchmarkGroup; verbose::Bool = false, pad = "", kwargs...)

Tune a BenchmarkGroup instance. For most benchmarks, tune! needs to perform many evaluations to determine the proper parameters for any given benchmark - often more evaluations than are performed when running a trial. In fact, the majority of total benchmarking time is usually spent tuning parameters, rather than actually running trials.

source
BenchmarkTools.@ballocatedMacro
@ballocated expression [other parameters...]

Similar to the @allocated macro included with Julia, this returns the number of bytes allocated when executing a given expression. It uses the @benchmark macro, however, and accepts all of the same additional parameters as @benchmark. The returned allocations correspond to the trial with the minimum elapsed time measured during the benchmark.

source
BenchmarkTools.@ballocationsMacro
@ballocations expression [other parameters...]

Similar to the @allocations macro included with Julia, this macro evaluates an expression, discarding the resulting value, and returns the total number of allocations made during its execution.

Unlike @allocations, it uses the @benchmark macro from the BenchmarkTools package, and accepts all of the same additional parameters as @benchmark. The returned number of allocations corresponds to the trial with the minimum elapsed time measured during the benchmark.

source
BenchmarkTools.@belapsedMacro
@belapsed expression [other parameters...]

Similar to the @elapsed macro included with Julia, this returns the elapsed time (in seconds) to execute a given expression. It uses the @benchmark macro, however, and accepts all of the same additional parameters as @benchmark. The returned time is the minimum elapsed time measured during the benchmark.

source
BenchmarkTools.@benchmarkMacro
@benchmark <expr to benchmark> [setup=<setup expr>]

Run a benchmark on a given expression.

Example

The simplest usage of this macro is to put it in front of what you want to benchmark.

julia> @benchmark sin(1)
+Reference · BenchmarkTools.jl

References

BenchmarkTools.clear_empty!Method
clear_empty!(group::BenchmarkGroup)

Recursively remove any empty subgroups from group.

Use this to prune a BenchmarkGroup after accessing incorrect fields: for example, evaluating g = BenchmarkGroup(); g[1] without storing anything to g[1] creates an empty subgroup g[1].

source
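
A minimal sketch matching the scenario described above (names are illustrative):

julia> g = BenchmarkGroup();

julia> g[1];            # creates an empty subgroup g[1]

julia> clear_empty!(g)  # prunes the accidentally created subgroup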
BenchmarkTools.judgeMethod
judge(target::TrialEstimate, baseline::TrialEstimate; [time_tolerance::Float64=0.05])

Report on whether the first estimate target represents a regression or an improvement with respect to the second estimate baseline.

source
BenchmarkTools.ratioMethod
ratio(target::TrialEstimate, baseline::TrialEstimate)

Returns a ratio of the target estimate to the baseline estimate, e.g. time(target)/time(baseline).

source
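
As a brief sketch tying ratio and judge together (m1 and m2 stand for two TrialEstimates, e.g. medians of two trials, and are illustrative only):

julia> ratio(m1, m2)                          # field-by-field TrialRatio

julia> judge(m1, m2; time_tolerance = 0.05)   # classify as improvement / invariant / regression

A TrialRatio can also be passed to judge directly, as in judge(ratio(m1, m2)).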
BenchmarkTools.tune!Function
tune!(b::Benchmark, p::Parameters = b.params; verbose::Bool = false, pad = "", kwargs...)

Tune a Benchmark instance.

If the number of evals in the parameters p has been set manually, this function does nothing.

source
BenchmarkTools.tune!Method
tune!(group::BenchmarkGroup; verbose::Bool = false, pad = "", kwargs...)

Tune a BenchmarkGroup instance. For most benchmarks, tune! needs to perform many evaluations to determine the proper parameters for any given benchmark - often more evaluations than are performed when running a trial. In fact, the majority of total benchmarking time is usually spent tuning parameters, rather than actually running trials.

source
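
Following the Manual's workflow, a typical tune-then-run sequence for a suite (here suite stands for a previously defined BenchmarkGroup) looks roughly like:

julia> tune!(suite);                              # pick evals/sample for every benchmark

julia> results = run(suite, verbose = true, seconds = 1);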
BenchmarkTools.@ballocatedMacro
@ballocated expression [other parameters...]

Similar to the @allocated macro included with Julia, this returns the number of bytes allocated when executing a given expression. It uses the @benchmark macro, however, and accepts all of the same additional parameters as @benchmark. The returned allocations correspond to the trial with the minimum elapsed time measured during the benchmark.

source
BenchmarkTools.@ballocationsMacro
@ballocations expression [other parameters...]

Similar to the @allocations macro included with Julia, this macro evaluates an expression, discarding the resulting value, and returns the total number of allocations made during its execution.

Unlike @allocations, it uses the @benchmark macro from the BenchmarkTools package, and accepts all of the same additional parameters as @benchmark. The returned number of allocations corresponds to the trial with the minimum elapsed time measured during the benchmark.

source
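
A short example of both allocation macros, mirroring the Manual (the reported numbers are platform- and version-dependent):

julia> @ballocated rand(4, 4)
208

julia> @ballocations rand(4, 4)
2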
BenchmarkTools.@belapsedMacro
@belapsed expression [other parameters...]

Similar to the @elapsed macro included with Julia, this returns the elapsed time (in seconds) to execute a given expression. It uses the @benchmark macro, however, and accepts all of the same additional parameters as @benchmark. The returned time is the minimum elapsed time measured during the benchmark.

source
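
For example (the reported time is machine-dependent):

julia> @belapsed sin(1)
1.3614228456913828e-8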
BenchmarkTools.@benchmarkMacro
@benchmark <expr to benchmark> [setup=<setup expr>]

Run a benchmark on a given expression.

Example

The simplest usage of this macro is to put it in front of what you want to benchmark.

julia> @benchmark sin(1)
 BenchmarkTools.Trial:
   memory estimate:  0 bytes
   allocs estimate:  0
@@ -37,6 +37,6 @@
   maximum time:     276.033 ns (0.00% GC)
   --------------
   samples:          10000
-  evals/sample:     935
source
BenchmarkTools.@benchmarkableMacro
@benchmarkable <expr to benchmark> [setup=<setup expr>]

Create a Benchmark instance for the given expression. @benchmarkable has similar syntax to @benchmark. See also @benchmark.

source
BenchmarkTools.@benchmarksetMacro
@benchmarkset "title" begin ... end

Create a benchmark set, or multiple benchmark sets if a for loop is provided.

Examples

@benchmarkset "suite" for k in 1:5
+  evals/sample:     935
source
BenchmarkTools.@benchmarkableMacro
@benchmarkable <expr to benchmark> [setup=<setup expr>]

Create a Benchmark instance for the given expression. @benchmarkable has similar syntax to @benchmark. See also @benchmark.

source
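
A minimal sketch of the define/tune/run workflow from the Manual (x stands for a previously defined input vector):

julia> b = @benchmarkable sort!(y) setup=(y = copy($x));

julia> tune!(b);

julia> run(b)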
BenchmarkTools.@benchmarksetMacro
@benchmarkset "title" begin ... end

Create a benchmark set, or multiple benchmark sets if a for loop is provided.

Examples

@benchmarkset "suite" for k in 1:5
     @case "case $k" rand($k, $k)
-end
source
BenchmarkTools.@bprofileMacro
@bprofile expression [other parameters...]

Run @benchmark while profiling. This is similar to

@profile @benchmark expression [other parameters...]

but the profiling is applied only to the main execution (after compilation and tuning). The profile buffer is cleared prior to execution.

View the profile results with Profile.print(...). See the profiling section of the Julia manual for more information.

source
BenchmarkTools.@btimeMacro
@btime expression [other parameters...]

Similar to the @time macro included with Julia, this executes an expression, printing the time it took to execute and the memory allocated before returning the value of the expression.

Unlike @time, it uses the @benchmark macro, and accepts all of the same additional parameters as @benchmark. The printed time is the minimum elapsed time measured during the benchmark.

source
BenchmarkTools.@btimedMacro
@btimed expression [other parameters...]

Similar to the @timed macro included with Julia, this macro executes an expression and returns a NamedTuple containing the value of the expression, the minimum elapsed time in seconds, the total bytes allocated, the number of allocations, and the garbage collection time in seconds during the benchmark.

Unlike @timed, it uses the @benchmark macro from the BenchmarkTools package for more detailed and consistent performance measurements. The elapsed time reported is the minimum time measured during the benchmark. It accepts all additional parameters supported by @benchmark.

source
Base.runFunction
run(b::Benchmark[, p::Parameters = b.params]; kwargs...)

Run the benchmark defined by @benchmarkable.

source
run(group::BenchmarkGroup[, args...]; verbose::Bool = false, pad = "", kwargs...)

Run the benchmark group, with benchmark parameters defaulting to those stored in group.

source
BenchmarkTools.saveFunction
BenchmarkTools.save(filename, args...)

Save serialized benchmarking objects (e.g. results or parameters) to a JSON file.

source
BenchmarkTools.loadFunction
BenchmarkTools.load(filename)

Load serialized benchmarking objects (e.g. results or parameters) from a JSON file.

source
+end
source
BenchmarkTools.@bprofileMacro
@bprofile expression [other parameters...]

Run @benchmark while profiling. This is similar to

@profile @benchmark expression [other parameters...]

but the profiling is applied only to the main execution (after compilation and tuning). The profile buffer is cleared prior to execution.

View the profile results with Profile.print(...). See the profiling section of the Julia manual for more information.

source
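
For instance (the benchmarked expression is illustrative), one might profile a benchmark and then inspect the collected samples:

julia> using Profile

julia> @bprofile sum(rand(1000));

julia> Profile.print()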
BenchmarkTools.@btimeMacro
@btime expression [other parameters...]

Similar to the @time macro included with Julia, this executes an expression, printing the time it took to execute and the memory allocated before returning the value of the expression.

Unlike @time, it uses the @benchmark macro, and accepts all of the same additional parameters as @benchmark. The printed time is the minimum elapsed time measured during the benchmark.

source
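
For example, as shown in the Manual (timings are machine-dependent):

julia> @btime sin(1)
  13.612 ns (0 allocations: 0 bytes)
0.8414709848078965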
BenchmarkTools.@btimedMacro
@btimed expression [other parameters...]

Similar to the @timed macro included with Julia, this macro executes an expression and returns a NamedTuple containing the value of the expression, the minimum elapsed time in seconds, the total bytes allocated, the number of allocations, and the garbage collection time in seconds during the benchmark.

Unlike @timed, it uses the @benchmark macro from the BenchmarkTools package for more detailed and consistent performance measurements. The elapsed time reported is the minimum time measured during the benchmark. It accepts all additional parameters supported by @benchmark.

source
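
For example, as shown in the Manual (the exact numbers are machine-dependent):

julia> @btimed sin(1)
(value = 0.8414709848078965, time = 9.16e-10, bytes = 0, alloc = 0, gctime = 0.0)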
Base.runFunction
run(b::Benchmark[, p::Parameters = b.params]; kwargs...)

Run the benchmark defined by @benchmarkable.

source
run(group::BenchmarkGroup[, args...]; verbose::Bool = false, pad = "", kwargs...)

Run the benchmark group, with benchmark parameters defaulting to those stored in group.

source
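
As a hedged sketch combining both methods (b and suite stand for a previously defined Benchmark and BenchmarkGroup), stored parameters can be overridden per call:

julia> run(b, seconds = 1, time_tolerance = 0.01);

julia> run(suite, verbose = true, seconds = 1);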
BenchmarkTools.saveFunction
BenchmarkTools.save(filename, args...)

Save serialized benchmarking objects (e.g. results or parameters) to a JSON file.

source
BenchmarkTools.loadFunction
BenchmarkTools.load(filename)

Load serialized benchmarking objects (e.g. results or parameters) from a JSON file.

source
diff --git a/dev/search_index.js b/dev/search_index.js index e6d9759a..2db9af53 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"manual/#Manual","page":"Manual","title":"Manual","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"BenchmarkTools was created to facilitate the following tasks:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Organize collections of benchmarks into manageable benchmark suites\nConfigure, save, and reload benchmark parameters for convenience, accuracy, and consistency\nExecute benchmarks in a manner that yields reasonable and consistent performance predictions\nAnalyze and compare results to determine whether a code change caused regressions or improvements","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Before we get too far, let's define some of the terminology used in this document:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"\"evaluation\": a single execution of a benchmark expression.\n\"sample\": a single time/memory measurement obtained by running multiple evaluations.\n\"trial\": an experiment in which multiple samples are gathered (or the result of such an experiment).\n\"benchmark parameters\": the configuration settings that determine how a benchmark trial is performed","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"The reasoning behind our definition of \"sample\" may not be obvious to all readers. If the time to execute a benchmark is smaller than the resolution of your timing method, then a single evaluation of the benchmark will generally not produce a valid sample. In that case, one must approximate a valid sample by recording the total time t it takes to record n evaluations, and estimating the sample's time per evaluation as t/n. For example, if a sample takes 1 second for 1 million evaluations, the approximate time per evaluation for that sample is 1 microsecond. It's not obvious what the right number of evaluations per sample should be for any given benchmark, so BenchmarkTools provides a mechanism (the tune! method) to automatically figure it out for you.","category":"page"},{"location":"manual/#Benchmarking-basics","page":"Manual","title":"Benchmarking basics","text":"","category":"section"},{"location":"manual/#Defining-and-executing-benchmarks","page":"Manual","title":"Defining and executing benchmarks","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"To quickly benchmark a Julia expression, use @benchmark:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> @benchmark sin(1)\nBenchmarkTools.Trial: 10000 samples with 1000 evaluations.\n Range (min … max): 1.442 ns … 53.028 ns ┊ GC (min … max): 0.00% … 0.00%\n Time (median): 1.453 ns ┊ GC (median): 0.00%\n Time (mean ± σ): 1.462 ns ± 0.566 ns ┊ GC (mean ± σ): 0.00% ± 0.00%\n\n █ \n ▂▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃\n 1.44 ns Histogram: frequency by time 1.46 ns (top 1%)\n\n Memory estimate: 0 bytes, allocs estimate: 0.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"The @benchmark macro is essentially shorthand for defining a benchmark, auto-tuning the benchmark's configuration parameters, and running the benchmark. 
These three steps can be done explicitly using @benchmarkable, tune! and run:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> b = @benchmarkable sin(1); # define the benchmark with default parameters\n\n# find the right evals/sample and number of samples to take for this benchmark\njulia> tune!(b);\n\njulia> run(b)\nBenchmarkTools.Trial: 10000 samples with 1000 evaluations.\n Range (min … max): 1.442 ns … 4.308 ns ┊ GC (min … max): 0.00% … 0.00%\n Time (median): 1.453 ns ┊ GC (median): 0.00%\n Time (mean ± σ): 1.456 ns ± 0.056 ns ┊ GC (mean ± σ): 0.00% ± 0.00%\n\n █ \n ▂▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▃\n 1.44 ns Histogram: frequency by time 1.46 ns (top 1%)\n\n Memory estimate: 0 bytes, allocs estimate: 0.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Alternatively, you can use the @btime, @btimed, @belapsed, @ballocated, or @ballocations macros. These take exactly the same arguments as @benchmark, but behave like the @time, @timed, @elapsed, @allocated, or @allocations macros included with Julia.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> @btime sin(1)\n 13.612 ns (0 allocations: 0 bytes)\n0.8414709848078965\n\njulia> @belapsed sin(1)\n1.3614228456913828e-8\n\njulia> @btimed sin(1)\n(value = 0.8414709848078965, time = 9.16e-10, bytes = 0, alloc = 0, gctime = 0.0)\n\njulia> @ballocated rand(4, 4)\n208\n\njulia> @ballocations rand(4, 4)\n2","category":"page"},{"location":"manual/#Benchmark-Parameters","page":"Manual","title":"Benchmark Parameters","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"You can pass the following keyword arguments to @benchmark, @benchmarkable, and run to configure the execution process:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"samples: The number of samples to take. Execution will end if this many samples have been collected. Defaults to BenchmarkTools.DEFAULT_PARAMETERS.samples = 10000.\nseconds: The number of seconds budgeted for the benchmarking process. The trial will terminate if this time is exceeded (regardless of samples), but at least one sample will always be taken. In practice, actual runtime can overshoot the budget by the duration of a sample. Defaults to BenchmarkTools.DEFAULT_PARAMETERS.seconds = 5.\nevals: The number of evaluations per sample. For best results, this should be kept consistent between trials. A good guess for this value can be automatically set on a benchmark via tune!, but using tune! can be less consistent than setting evals manually (which bypasses tuning). Defaults to BenchmarkTools.DEFAULT_PARAMETERS.evals = 1. If the function you study mutates its input, it is probably a good idea to set evals=1 manually.\noverhead: The estimated loop overhead per evaluation in nanoseconds, which is automatically subtracted from every sample time measurement. The default value is BenchmarkTools.DEFAULT_PARAMETERS.overhead = 0. BenchmarkTools.estimate_overhead can be called to determine this value empirically (which can then be set as the default value, if you want).\ngctrial: If true, run gc() before executing this benchmark's trial. Defaults to BenchmarkTools.DEFAULT_PARAMETERS.gctrial = true.\ngcsample: If true, run gc() before each sample. Defaults to BenchmarkTools.DEFAULT_PARAMETERS.gcsample = false.\ntime_tolerance: The noise tolerance for the benchmark's time estimate, as a percentage. 
This is utilized after benchmark execution, when analyzing results. Defaults to BenchmarkTools.DEFAULT_PARAMETERS.time_tolerance = 0.05.\nmemory_tolerance: The noise tolerance for the benchmark's memory estimate, as a percentage. This is utilized after benchmark execution, when analyzing results. Defaults to BenchmarkTools.DEFAULT_PARAMETERS.memory_tolerance = 0.01.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"To change the default values of the above fields, one can mutate the fields of BenchmarkTools.DEFAULT_PARAMETERS, for example:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"# change default for `seconds` to 2.5\nBenchmarkTools.DEFAULT_PARAMETERS.seconds = 2.50\n# change default for `time_tolerance` to 0.20\nBenchmarkTools.DEFAULT_PARAMETERS.time_tolerance = 0.20","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Here's an example that demonstrates how to pass these parameters to benchmark definitions:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"b = @benchmarkable sin(1) seconds=1 time_tolerance=0.01\nrun(b) # equivalent to run(b, seconds = 1, time_tolerance = 0.01)","category":"page"},{"location":"manual/#Interpolating-values-into-benchmark-expressions","page":"Manual","title":"Interpolating values into benchmark expressions","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"You can interpolate values into @benchmark and @benchmarkable expressions:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"# rand(1000) is executed for each evaluation\njulia> @benchmark sum(rand(1000))\nBenchmarkTools.Trial: 10000 samples with 10 evaluations.\n Range (min … max): 1.153 μs … 142.253 μs ┊ GC (min … max): 0.00% … 96.43%\n Time (median): 1.363 μs ┊ GC (median): 0.00%\n Time (mean ± σ): 1.786 μs ± 4.612 μs ┊ GC (mean ± σ): 9.58% ± 3.70%\n\n ▄▆██▇▇▆▄▃▂▁ ▁▁▂▂▂▂▂▂▂▁▂▁ \n ████████████████▆▆▇▅▆▇▆▆▆▇▆▇▆▆▅▄▄▄▅▃▄▇██████████████▇▇▇▇▆▆▇▆▆▅▅▅▅\n 1.15 μs Histogram: log(frequency) by time 3.8 μs (top 1%)\n\n Memory estimate: 7.94 KiB, allocs estimate: 1.\n\n# rand(1000) is evaluated at definition time, and the resulting\n# value is interpolated into the benchmark expression\njulia> @benchmark sum($(rand(1000)))\nBenchmarkTools.Trial: 10000 samples with 963 evaluations.\n Range (min … max): 84.477 ns … 241.602 ns ┊ GC (min … max): 0.00% … 0.00%\n Time (median): 84.497 ns ┊ GC (median): 0.00%\n Time (mean ± σ): 85.125 ns ± 5.262 ns ┊ GC (mean ± σ): 0.00% ± 0.00%\n\n █ \n █▅▇▅▄███▇▇▆▆▆▄▄▅▅▄▄▅▄▄▅▄▄▄▄▁▃▄▁▁▃▃▃▄▃▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▃▃▁▁▁▃▁▁▁▁▆\n 84.5 ns Histogram: log(frequency) by time 109 ns (top 1%)\n\n Memory estimate: 0 bytes, allocs estimate: 0.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"A good rule of thumb is that external variables should be explicitly interpolated into the benchmark expression:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> A = rand(1000);\n\n# BAD: A is a global variable in the benchmarking context\njulia> @benchmark [i*i for i in A]\nBenchmarkTools.Trial: 10000 samples with 54 evaluations.\n Range (min … max): 889.241 ns … 29.584 μs ┊ GC (min … max): 0.00% … 93.33%\n Time (median): 1.073 μs ┊ GC (median): 0.00%\n Time (mean ± σ): 1.296 μs ± 2.004 μs ┊ GC (mean ± σ): 14.31% ± 8.76%\n\n ▃█▆ \n ▂▂▄▆███▇▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▁▂▂▁▁▁▁▁▂▁▁▁▁▂▂▁▁▁▁▂▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂\n 889 ns Histogram: frequency by 
time 2.92 μs (top 1%)\n\n Memory estimate: 7.95 KiB, allocs estimate: 2.\n\n# GOOD: A is a constant value in the benchmarking context\njulia> @benchmark [i*i for i in $A]\nBenchmarkTools.Trial: 10000 samples with 121 evaluations.\n Range (min … max): 742.455 ns … 11.846 μs ┊ GC (min … max): 0.00% … 88.05%\n Time (median): 909.959 ns ┊ GC (median): 0.00%\n Time (mean ± σ): 1.135 μs ± 1.366 μs ┊ GC (mean ± σ): 16.94% ± 12.58%\n\n ▇█▅▂ ▁\n ████▇▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▅▆██\n 742 ns Histogram: log(frequency) by time 10.3 μs (top 1%)\n\n Memory estimate: 7.94 KiB, allocs estimate: 1.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"(Note that \"KiB\" is the SI prefix for a kibibyte: 1024 bytes.)","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Keep in mind that you can mutate external state from within a benchmark:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> A = zeros(3);\n\n # each evaluation will modify A\njulia> b = @benchmarkable fill!($A, rand());\n\njulia> run(b, samples = 1);\n\njulia> A\n3-element Vector{Float64}:\n 0.4615582142515109\n 0.4615582142515109\n 0.4615582142515109\n\njulia> run(b, samples = 1);\n\njulia> A\n3-element Vector{Float64}:\n 0.06373849439691504\n 0.06373849439691504\n 0.06373849439691504","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Normally, you can't use locally scoped variables in @benchmark or @benchmarkable, since all benchmarks are defined at the top-level scope by design. However, you can work around this by interpolating local variables into the benchmark expression:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"# will throw UndefVar error for `x`\njulia> let x = 1\n @benchmark sin(x)\n end\n\n# will work fine\njulia> let x = 1\n @benchmark sin($x)\n end","category":"page"},{"location":"manual/#Setup-and-teardown-phases","page":"Manual","title":"Setup and teardown phases","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"BenchmarkTools allows you to pass setup and teardown expressions to @benchmark and @benchmarkable. The setup expression is evaluated just before sample execution, while the teardown expression is evaluated just after sample execution. Here's an example where this kind of thing is useful:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> x = rand(100000);\n\n# For each sample, bind a variable `y` to a fresh copy of `x`. As you\n# can see, `y` is accessible within the scope of the core expression.\njulia> b = @benchmarkable sort!(y) setup=(y = copy($x))\nBenchmark(evals=1, seconds=5.0, samples=10000)\n\njulia> run(b)\nBenchmarkTools.Trial: 819 samples with 1 evaluations.\n Range (min … max): 5.983 ms … 6.954 ms ┊ GC (min … max): 0.00% … 0.00%\n Time (median): 6.019 ms ┊ GC (median): 0.00%\n Time (mean ± σ): 6.029 ms ± 46.222 μs ┊ GC (mean ± σ): 0.00% ± 0.00%\n\n ▃▂▂▄█▄▂▃ \n ▂▃▃▄▆▅████████▇▆▆▅▄▄▄▅▆▄▃▄▅▄▃▂▃▃▃▂▂▃▁▂▂▂▁▂▂▂▂▂▂▁▁▁▁▂▂▁▁▁▂▂▁▁▂▁▁▂\n 5.98 ms Histogram: frequency by time 6.18 ms (top 1%)\n\n Memory estimate: 0 bytes, allocs estimate: 0.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"In the above example, we wish to benchmark Julia's in-place sorting method. 
Without a setup phase, we'd have to either allocate a new input vector for each sample (such that the allocation time would pollute our results) or use the same input vector every sample (such that all samples but the first would benchmark the wrong thing - sorting an already sorted vector). The setup phase solves the problem by allowing us to do some work that can be utilized by the core expression, without that work being erroneously included in our performance results.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Note that the setup and teardown phases are executed for each sample, not each evaluation. Thus, the sorting example above wouldn't produce the intended results if evals/sample > 1 (it'd suffer from the same problem of benchmarking against an already sorted vector).","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"If your setup involves several objects, you need to separate the assignments with semicolons, as follows:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> @btime x + y setup = (x=1; y=2) # works\n 1.238 ns (0 allocations: 0 bytes)\n3\n\njulia> @btime x + y setup = (x=1, y=2) # errors\nERROR: UndefVarError: `x` not defined","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"This also explains the error you get if you accidentally put a comma in the setup for a single argument:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> @btime exp(x) setup = (x=1,) # errors\nERROR: UndefVarError: `x` not defined","category":"page"},{"location":"manual/#Understanding-compiler-optimizations","page":"Manual","title":"Understanding compiler optimizations","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"It's possible for LLVM and Julia's compiler to perform optimizations on @benchmarkable expressions. In some cases, these optimizations can elide a computation altogether, resulting in unexpectedly \"fast\" benchmarks. 
For example, the following expression is non-allocating:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> @benchmark (view(a, 1:2, 1:2); 1) setup=(a = rand(3, 3))\nBenchmarkTools.Trial: 10000 samples with 1000 evaluations.\n Range (min … max): 2.885 ns … 14.797 ns ┊ GC (min … max): 0.00% … 0.00%\n Time (median): 2.895 ns ┊ GC (median): 0.00%\n Time (mean ± σ): 3.320 ns ± 0.909 ns ┊ GC (mean ± σ): 0.00% ± 0.00%\n\n █ ▁ ▁ ▁▁▁ ▂▃▃▁\n █▁▁▇█▇▆█▇████████████████▇█▇█▇▇▇▇█▇█▇▅▅▄▁▁▁▁▄▃▁▃▃▁▄▃▁▄▁▃▅▅██████\n 2.88 ns Histogram: log(frequency) by time 5.79 ns (top 1%)\n\n Memory estimate: 0 bytes, allocs estimate: 0.0","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Note, however, that this does not mean that view(a, 1:2, 1:2) is non-allocating:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> @benchmark view(a, 1:2, 1:2) setup=(a = rand(3, 3))\nBenchmarkTools.Trial: 10000 samples with 1000 evaluations.\n Range (min … max): 3.175 ns … 18.314 ns ┊ GC (min … max): 0.00% … 0.00%\n Time (median): 3.176 ns ┊ GC (median): 0.00%\n Time (mean ± σ): 3.262 ns ± 0.882 ns ┊ GC (mean ± σ): 0.00% ± 0.00%\n\n █ \n █▁▂▁▁▁▂▁▂▁▂▁▁▂▁▁▂▂▂▂▂▂▁▁▂▁▁▂▁▁▁▂▂▁▁▁▂▁▂▂▁▂▁▁▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂\n 3.18 ns Histogram: frequency by time 4.78 ns (top 1%)\n\n Memory estimate: 0 bytes, allocs estimate: 0.8","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"The key point here is that these two benchmarks measure different things, even though their code is similar. In the first example, Julia was able to optimize away view(a, 1:2, 1:2) because it could prove that the value wasn't being returned and a wasn't being mutated. In the second example, the optimization is not performed because view(a, 1:2, 1:2) is a return value of the benchmark expression.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"BenchmarkTools will faithfully report the performance of the exact code that you provide to it, including any compiler optimizations that might happen to elide the code completely. It's up to you to design benchmarks which actually exercise the code you intend to exercise. ","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"A common place julia's optimizer may cause a benchmark to not measure what a user thought it was measuring is simple operations where all values are known at compile time. Suppose you wanted to measure the time it takes to add together two integers:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> a = 1; b = 2\n2\n\njulia> @btime $a + $b\n 0.024 ns (0 allocations: 0 bytes)\n3","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"in this case julia was able to use the properties of +(::Int, ::Int) to know that it could safely replace $a + $b with 3 at compile time. 
We can stop the optimizer from doing this by referencing and dereferencing the interpolated variables ","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> @btime $(Ref(a))[] + $(Ref(b))[]\n 1.277 ns (0 allocations: 0 bytes)\n3","category":"page"},{"location":"manual/#Handling-benchmark-results","page":"Manual","title":"Handling benchmark results","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"BenchmarkTools provides four types related to benchmark results:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Trial: stores all samples collected during a benchmark trial, as well as the trial's parameters\nTrialEstimate: a single estimate used to summarize a Trial\nTrialRatio: a comparison between two TrialEstimate\nTrialJudgement: a classification of the fields of a TrialRatio as invariant, regression, or improvement","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"This section provides a limited number of examples demonstrating these types. For a thorough list of supported functionality, see the reference document.","category":"page"},{"location":"manual/#Trial-and-TrialEstimate","page":"Manual","title":"Trial and TrialEstimate","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"Running a benchmark produces an instance of the Trial type:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> t = @benchmark eigen(rand(10, 10))\nBenchmarkTools.Trial: 10000 samples with 1 evaluations.\n Range (min … max): 26.549 μs … 1.503 ms ┊ GC (min … max): 0.00% … 93.21%\n Time (median): 30.818 μs ┊ GC (median): 0.00%\n Time (mean ± σ): 31.777 μs ± 25.161 μs ┊ GC (mean ± σ): 1.31% ± 1.63%\n\n ▂▃▅▆█▇▇▆▆▄▄▃▁▁ \n ▁▁▁▁▁▁▂▃▄▆████████████████▆▆▅▅▄▄▃▃▃▂▂▂▂▂▂▁▂▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\n 26.5 μs Histogram: frequency by time 41.3 μs (top 1%)\n\n Memory estimate: 16.36 KiB, allocs estimate: 19.\n\njulia> dump(t) # here's what's actually stored in a Trial\nBenchmarkTools.Trial\n params: BenchmarkTools.Parameters\n seconds: Float64 5.0\n samples: Int64 10000\n evals: Int64 1\n overhead: Float64 0.0\n gctrial: Bool true\n gcsample: Bool false\n time_tolerance: Float64 0.05\n memory_tolerance: Float64 0.01\n times: Array{Float64}((10000,)) [26549.0, 26960.0, 27030.0, 27171.0, 27211.0, 27261.0, 27270.0, 27311.0, 27311.0, 27321.0 … 55383.0, 55934.0, 58649.0, 62847.0, 68547.0, 75761.0, 247081.0, 1.421718e6, 1.488322e6, 1.50329e6]\n gctimes: Array{Float64}((10000,)) [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.366184e6, 1.389518e6, 1.40116e6]\n memory: Int64 16752\n allocs: Int64 19","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"As you can see from the above, a couple of different timing estimates are pretty-printed with the Trial. 
You can calculate these estimates yourself using the minimum, maximum, median, mean, and std functions (Note that median, mean, and std are reexported in BenchmarkTools from Statistics):","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> minimum(t)\nBenchmarkTools.TrialEstimate: \n time: 26.549 μs\n gctime: 0.000 ns (0.00%)\n memory: 16.36 KiB\n allocs: 19\n\njulia> maximum(t)\nBenchmarkTools.TrialEstimate: \n time: 1.503 ms\n gctime: 1.401 ms (93.21%)\n memory: 16.36 KiB\n allocs: 19\n\njulia> median(t)\nBenchmarkTools.TrialEstimate: \n time: 30.818 μs\n gctime: 0.000 ns (0.00%)\n memory: 16.36 KiB\n allocs: 19\n\njulia> mean(t)\nBenchmarkTools.TrialEstimate: \n time: 31.777 μs\n gctime: 415.686 ns (1.31%)\n memory: 16.36 KiB\n allocs: 19\n\njulia> std(t)\nBenchmarkTools.TrialEstimate: \n time: 25.161 μs\n gctime: 23.999 μs (95.38%)\n memory: 16.36 KiB\n allocs: 19","category":"page"},{"location":"manual/#Which-estimator-should-I-use?","page":"Manual","title":"Which estimator should I use?","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"Time distributions are always right-skewed for the benchmarks we've tested. This phenomena can be justified by considering that the machine noise affecting the benchmarking process is, in some sense, inherently positive - there aren't really sources of noise that would regularly cause your machine to execute a series of instructions faster than the theoretical \"ideal\" time prescribed by your hardware. Following this characterization of benchmark noise, we can describe the behavior of our estimators:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"The minimum is a robust estimator for the location parameter of the time distribution, and should not be considered an outlier\nThe median, as a robust measure of central tendency, should be relatively unaffected by outliers\nThe mean, as a non-robust measure of central tendency, will usually be positively skewed by outliers\nThe maximum should be considered a primarily noise-driven outlier, and can change drastically between benchmark trials.","category":"page"},{"location":"manual/#TrialRatio-and-TrialJudgement","page":"Manual","title":"TrialRatio and TrialJudgement","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"BenchmarkTools supplies a ratio function for comparing two values:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> ratio(3, 2)\n1.5\n\njulia> ratio(1, 0)\nInf\n\njulia> ratio(0, 1)\n0.0\n\n# a == b is special-cased to 1.0 to prevent NaNs in this case\njulia> ratio(0, 0)\n1.0","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Calling the ratio function on two TrialEstimate instances compares their fields:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> using BenchmarkTools\n\njulia> b = @benchmarkable eigen(rand(10, 10));\n\njulia> tune!(b);\n\njulia> m1 = median(run(b))\nBenchmarkTools.TrialEstimate:\n time: 38.638 μs\n gctime: 0.000 ns (0.00%)\n memory: 9.30 KiB\n allocs: 28\n\njulia> m2 = median(run(b))\nBenchmarkTools.TrialEstimate:\n time: 38.723 μs\n gctime: 0.000 ns (0.00%)\n memory: 9.30 KiB\n allocs: 28\n\njulia> ratio(m1, m2)\nBenchmarkTools.TrialRatio:\n time: 0.997792009916587\n gctime: 1.0\n memory: 1.0\n allocs: 1.0","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Use the 
judge function to decide if the estimate passed as first argument represents a regression versus the second estimate:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> m1 = median(@benchmark eigen(rand(10, 10)))\nBenchmarkTools.TrialEstimate:\n time: 38.745 μs\n gctime: 0.000 ns (0.00%)\n memory: 9.30 KiB\n allocs: 28\n\njulia> m2 = median(@benchmark eigen(rand(10, 10)))\nBenchmarkTools.TrialEstimate:\n time: 38.611 μs\n gctime: 0.000 ns (0.00%)\n memory: 9.30 KiB\n allocs: 28\n\n# percent change falls within noise tolerance for all fields\njulia> judge(m1, m2)\nBenchmarkTools.TrialJudgement:\n time: +0.35% => invariant (5.00% tolerance)\n memory: +0.00% => invariant (1.00% tolerance)\n\n# changing time_tolerance causes it to be marked as a regression\njulia> judge(m1, m2; time_tolerance = 0.0001)\nBenchmarkTools.TrialJudgement:\n time: +0.35% => regression (0.01% tolerance)\n memory: +0.00% => invariant (1.00% tolerance)\n\n# switch m1 & m2; from this perspective, the difference is an improvement\njulia> judge(m2, m1; time_tolerance = 0.0001)\nBenchmarkTools.TrialJudgement:\n time: -0.35% => improvement (0.01% tolerance)\n memory: +0.00% => invariant (1.00% tolerance)\n\n# you can pass in TrialRatios as well\njulia> judge(ratio(m1, m2)) == judge(m1, m2)\ntrue","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Note that changes in GC time and allocation count aren't classified by judge. This is because GC time and allocation count, while sometimes useful for answering why a regression occurred, are not generally useful for answering if a regression occurred. Instead, it's usually only differences in time and memory usage that determine whether or not a code change is an improvement or a regression. For example, in the unlikely event that a code change decreased time and memory usage, but increased GC time and allocation count, most people would consider that code change to be an improvement. The opposite is also true: an increase in time and memory usage would be considered a regression no matter how much GC time or allocation count decreased.","category":"page"},{"location":"manual/#The-BenchmarkGroup-type","page":"Manual","title":"The BenchmarkGroup type","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"In the real world, one often deals with whole suites of benchmarks rather than just individual benchmarks. The BenchmarkGroup type serves as the \"organizational unit\" of such suites, and can be used to store and structure benchmark definitions, raw Trial data, estimation results, and even other BenchmarkGroup instances.","category":"page"},{"location":"manual/#Defining-benchmark-suites","page":"Manual","title":"Defining benchmark suites","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"A BenchmarkGroup stores a Dict that maps benchmark IDs to values, as well as descriptive \"tags\" that can be used to filter the group by topic. To get started, let's demonstrate how one might use the BenchmarkGroup type to define a simple benchmark suite:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"# Define a parent BenchmarkGroup to contain our suite\nsuite = BenchmarkGroup()\n\n# Add some child groups to our benchmark suite. The most relevant BenchmarkGroup constructor\n# for this case is BenchmarkGroup(tags::Vector). 
These tags are useful for\n# filtering benchmarks by topic, which we'll cover in a later section.\nsuite[\"utf8\"] = BenchmarkGroup([\"string\", \"unicode\"])\nsuite[\"trig\"] = BenchmarkGroup([\"math\", \"triangles\"])\n\n# Add some benchmarks to the \"utf8\" group\nteststr = join(rand('a':'d', 10^4));\nsuite[\"utf8\"][\"replace\"] = @benchmarkable replace($teststr, \"a\" => \"b\")\nsuite[\"utf8\"][\"join\"] = @benchmarkable join($teststr, $teststr)\n\n# Add some benchmarks to the \"trig\" group\nfor f in (sin, cos, tan)\n for x in (0.0, pi)\n suite[\"trig\"][string(f), x] = @benchmarkable $(f)($x)\n end\nend","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Let's look at our newly defined suite in the REPL:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> suite\n2-element BenchmarkTools.BenchmarkGroup:\n tags: []\n \"utf8\" => 2-element BenchmarkTools.BenchmarkGroup:\n\t tags: [\"string\", \"unicode\"]\n\t \"join\" => Benchmark(evals=1, seconds=5.0, samples=10000)\n\t \"replace\" => Benchmark(evals=1, seconds=5.0, samples=10000)\n \"trig\" => 6-element BenchmarkTools.BenchmarkGroup:\n\t tags: [\"math\", \"triangles\"]\n\t (\"cos\", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)\n\t (\"sin\", π = 3.1415926535897...) => Benchmark(evals=1, seconds=5.0, samples=10000)\n\t (\"tan\", π = 3.1415926535897...) => Benchmark(evals=1, seconds=5.0, samples=10000)\n\t (\"cos\", π = 3.1415926535897...) => Benchmark(evals=1, seconds=5.0, samples=10000)\n\t (\"sin\", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)\n\t (\"tan\", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"As you might imagine, BenchmarkGroup supports a subset of Julia's Associative interface. A full list of these supported functions can be found in the reference document.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"One can also create a nested BenchmarkGroup simply by indexing the keys:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"suite2 = BenchmarkGroup()\n\nsuite2[\"my\"][\"nested\"][\"benchmark\"] = @benchmarkable sum(randn(32))","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"which will result in a hierarchical benchmark without us needing to create the BenchmarkGroup at each level ourselves.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Note that keys are automatically created upon access, even if a key does not exist. Thus, if you wish to empty the unused keys, you can use clear_empty!(suite) to do so.","category":"page"},{"location":"manual/#Tuning-and-running-a-BenchmarkGroup","page":"Manual","title":"Tuning and running a BenchmarkGroup","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"Similarly to individual benchmarks, you can tune! 
and run whole BenchmarkGroup instances (following from the previous section):","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"# execute `tune!` on every benchmark in `suite`\njulia> tune!(suite);\n\n# run with a time limit of ~1 second per benchmark\njulia> results = run(suite, verbose = true, seconds = 1)\n(1/2) benchmarking \"utf8\"...\n (1/2) benchmarking \"join\"...\n done (took 1.15406904 seconds)\n (2/2) benchmarking \"replace\"...\n done (took 0.47660775 seconds)\ndone (took 1.697970114 seconds)\n(2/2) benchmarking \"trig\"...\n (1/6) benchmarking (\"tan\",π = 3.1415926535897...)...\n done (took 0.371586549 seconds)\n (2/6) benchmarking (\"cos\",0.0)...\n done (took 0.284178292 seconds)\n (3/6) benchmarking (\"cos\",π = 3.1415926535897...)...\n done (took 0.338527685 seconds)\n (4/6) benchmarking (\"sin\",π = 3.1415926535897...)...\n done (took 0.345329397 seconds)\n (5/6) benchmarking (\"sin\",0.0)...\n done (took 0.309887335 seconds)\n (6/6) benchmarking (\"tan\",0.0)...\n done (took 0.320894744 seconds)\ndone (took 2.022673065 seconds)\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"utf8\" => BenchmarkGroup([\"string\", \"unicode\"])\n \"trig\" => BenchmarkGroup([\"math\", \"triangles\"])","category":"page"},{"location":"manual/#Working-with-trial-data-in-a-BenchmarkGroup","page":"Manual","title":"Working with trial data in a BenchmarkGroup","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"Following from the previous section, we see that running our benchmark suite returns a BenchmarkGroup that stores Trial data instead of benchmarks:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> results[\"utf8\"]\nBenchmarkTools.BenchmarkGroup:\n tags: [\"string\", \"unicode\"]\n \"join\" => Trial(133.84 ms) # summary(::Trial) displays the minimum time estimate\n \"replace\" => Trial(202.3 μs)\n\njulia> results[\"trig\"]\nBenchmarkTools.BenchmarkGroup:\n tags: [\"math\", \"triangles\"]\n (\"tan\",π = 3.1415926535897...) => Trial(28.0 ns)\n (\"cos\",0.0) => Trial(6.0 ns)\n (\"cos\",π = 3.1415926535897...) => Trial(22.0 ns)\n (\"sin\",π = 3.1415926535897...) => Trial(21.0 ns)\n (\"sin\",0.0) => Trial(6.0 ns)\n (\"tan\",0.0) => Trial(6.0 ns)","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Most of the functions on result-related types (Trial, TrialEstimate, TrialRatio, and TrialJudgement) work on BenchmarkGroups as well. 
Usually, these functions simply map onto the groups' values:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> m1 = median(results[\"utf8\"]) # == median(results[\"utf8\"])\nBenchmarkTools.BenchmarkGroup:\n tags: [\"string\", \"unicode\"]\n \"join\" => TrialEstimate(143.68 ms)\n \"replace\" => TrialEstimate(203.24 μs)\n\njulia> m2 = median(run(suite[\"utf8\"]))\nBenchmarkTools.BenchmarkGroup:\n tags: [\"string\", \"unicode\"]\n \"join\" => TrialEstimate(144.79 ms)\n \"replace\" => TrialEstimate(202.49 μs)\n\njulia> judge(m1, m2; time_tolerance = 0.001) # use 0.1 % time tolerance\nBenchmarkTools.BenchmarkGroup:\n tags: [\"string\", \"unicode\"]\n \"join\" => TrialJudgement(-0.76% => improvement)\n \"replace\" => TrialJudgement(+0.37% => regression)","category":"page"},{"location":"manual/#Indexing-into-a-BenchmarkGroup-using-@tagged","page":"Manual","title":"Indexing into a BenchmarkGroup using @tagged","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"Sometimes, especially in large benchmark suites, you'd like to filter benchmarks by topic without necessarily worrying about the key-value structure of the suite. For example, you might want to run all string-related benchmarks, even though they might be spread out among many different groups or subgroups. To solve this problem, the BenchmarkGroup type incorporates a tagging system.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Consider the following BenchmarkGroup, which contains several nested child groups that are all individually tagged:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> g = BenchmarkGroup([], # no tags in the parent\n \"c\" => BenchmarkGroup([\"5\", \"6\", \"7\"]), # tagged \"5\", \"6\", \"7\"\n \"b\" => BenchmarkGroup([\"3\", \"4\", \"5\"]), # tagged \"3\", \"4\", \"5\"\n \"a\" => BenchmarkGroup([\"1\", \"2\", \"3\"], # contains tags and child groups\n \"d\" => BenchmarkGroup([\"8\"], 1 => 1),\n \"e\" => BenchmarkGroup([\"9\"], 2 => 2)));\njulia> g\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"c\" => BenchmarkTools.BenchmarkGroup:\n\t tags: [\"5\", \"6\", \"7\"]\n \"b\" => BenchmarkTools.BenchmarkGroup:\n\t tags: [\"3\", \"4\", \"5\"]\n \"a\" => BenchmarkTools.BenchmarkGroup:\n\t tags: [\"1\", \"2\", \"3\"]\n\t \"e\" => BenchmarkTools.BenchmarkGroup:\n\t\t tags: [\"9\"]\n\t\t 2 => 2\n\t \"d\" => BenchmarkTools.BenchmarkGroup:\n\t\t tags: [\"8\"]\n\t\t 1 => 1","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"We can filter this group by tag using the @tagged macro. This macro takes in a special predicate, and returns an object that can be used to index into a BenchmarkGroup. For example, we can select all groups marked \"3\" or \"7\" and not \"1\":","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> g[@tagged (\"3\" || \"7\") && !(\"1\")]\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"c\" => BenchmarkGroup([\"5\", \"6\", \"7\"])\n \"b\" => BenchmarkGroup([\"3\", \"4\", \"5\"])","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"As you can see, the allowable syntax for the @tagged predicate includes !, (), ||, &&, in addition to the tags themselves. The @tagged macro replaces each tag in the predicate expression with a check to see if the group has the given tag, returning true if so and false otherwise. 
A group g is considered to have a given tag t if:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"t is attached explicitly to g by construction (e.g. g = BenchmarkGroup([t]))\nt is a key that points to g in g's parent group (e.g. BenchmarkGroup([], t => g))\nt is a tag of one of g's parent groups (all the way up to the root group)","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"To demonstrate the last two points:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"# also could've used `@tagged \"1\"`, `@tagged \"a\"`, `@tagged \"e\" || \"d\"`\njulia> g[@tagged \"8\" || \"9\"]\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"a\" => BenchmarkTools.BenchmarkGroup:\n\t tags: [\"1\", \"2\", \"3\"]\n\t \"e\" => BenchmarkTools.BenchmarkGroup:\n\t\t tags: [\"9\"]\n\t\t 2 => 2\n\t \"d\" => BenchmarkTools.BenchmarkGroup:\n\t\t tags: [\"8\"]\n\t\t 1 => 1\n\njulia> g[@tagged \"d\"]\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"a\" => BenchmarkTools.BenchmarkGroup:\n\t tags: [\"1\", \"2\", \"3\"]\n\t \"d\" => BenchmarkTools.BenchmarkGroup:\n\t\t tags: [\"8\"]\n\t\t 1 => 1\n\njulia> g[@tagged \"9\"]\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"a\" => BenchmarkTools.BenchmarkGroup:\n\t tags: [\"1\", \"2\", \"3\"]\n\t \"e\" => BenchmarkTools.BenchmarkGroup:\n\t\t tags: [\"9\"]\n\t\t 2 => 2","category":"page"},{"location":"manual/#Indexing-into-a-BenchmarkGroup-using-another-BenchmarkGroup","page":"Manual","title":"Indexing into a BenchmarkGroup using another BenchmarkGroup","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"It's sometimes useful to create BenchmarkGroup where the keys are drawn from one BenchmarkGroup, but the values are drawn from another. You can accomplish this by indexing into the latter BenchmarkGroup with the former:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> g # leaf values are integers\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"c\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"1\" => 1\n\t \"2\" => 2\n\t \"3\" => 3\n \"b\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"1\" => 1\n\t \"2\" => 2\n\t \"3\" => 3\n \"a\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"1\" => 1\n\t \"2\" => 2\n\t \"3\" => 3\n \"d\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"1\" => 1\n\t \"2\" => 2\n\t \"3\" => 3\n\njulia> x # note that leaf values are characters\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"c\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"2\" => '2'\n \"a\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"1\" => '1'\n\t \"3\" => '3'\n \"d\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"1\" => '1'\n\t \"2\" => '2'\n\t \"3\" => '3'\n\njulia> g[x] # index into `g` with the keys of `x`\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"c\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"2\" => 2\n \"a\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"1\" => 1\n\t \"3\" => 3\n \"d\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"1\" => 1\n\t \"2\" => 2\n\t \"3\" => 3","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"An example scenario where this would be useful: You have a suite of benchmarks, and a corresponding group of TrialJudgements, and you want to rerun the benchmarks in your suite that are considered regressions in the judgement group. 
You can easily do this with the following code:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"run(suite[regressions(judgements)])","category":"page"},{"location":"manual/#Indexing-into-a-BenchmarkGroup-using-a-Vector","page":"Manual","title":"Indexing into a BenchmarkGroup using a Vector","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"You may have noticed that nested BenchmarkGroup instances form a tree-like structure, where the root node is the parent group, intermediate nodes are child groups, and the leaves take values like trial data and benchmark definitions.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Since these trees can be arbitrarily asymmetric, it can be cumbersome to write certain BenchmarkGroup transformations using only the indexing facilities previously discussed.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"To solve this problem, BenchmarkTools allows you to uniquely index group nodes using a Vector of the node's parents' keys. For example:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> g = BenchmarkGroup([], 1 => BenchmarkGroup([], \"a\" => BenchmarkGroup([], :b => 1234)));\n\njulia> g\nBenchmarkTools.BenchmarkGroup:\n tags: []\n 1 => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"a\" => BenchmarkTools.BenchmarkGroup:\n\t\t tags: []\n\t\t :b => 1234\n\njulia> g[[1]] # == g[1]\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"a\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t :b => 1234\njulia> g[[1, \"a\"]] # == g[1][\"a\"]\nBenchmarkTools.BenchmarkGroup:\n tags: []\n :b => 1234\njulia> g[[1, \"a\", :b]] # == g[1][\"a\"][:b]\n1234","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Keep in mind that this indexing scheme also works with setindex!:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> g[[1, \"a\", :b]] = \"hello\"\n\"hello\"\n\njulia> g\nBenchmarkTools.BenchmarkGroup:\n tags: []\n 1 => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"a\" => BenchmarkTools.BenchmarkGroup:\n\t\t tags: []\n\t\t :b => \"hello\"","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Assigning into a BenchmarkGroup with a Vector creates sub-groups as necessary:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> g[[2, \"a\", :b]] = \"hello again\"\n\"hello again\"\n\njulia> g\n2-element BenchmarkTools.BenchmarkGroup:\n tags: []\n 2 => 1-element BenchmarkTools.BenchmarkGroup:\n tags: []\n \"a\" => 1-element BenchmarkTools.BenchmarkGroup:\n tags: []\n :b => \"hello again\"\n 1 => 1-element BenchmarkTools.BenchmarkGroup:\n tags: []\n \"a\" => 1-element BenchmarkTools.BenchmarkGroup:\n tags: []\n :b => \"hello\"","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"You can use the leaves function to construct an iterator over a group's leaf index/value pairs:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> g = BenchmarkGroup([\"1\"],\n \"2\" => BenchmarkGroup([\"3\"], 1 => 1),\n 4 => BenchmarkGroup([\"3\"], 5 => 6),\n 7 => 8,\n 9 => BenchmarkGroup([\"2\"],\n 10 => BenchmarkGroup([\"3\"]),\n 11 => BenchmarkGroup()));\n\njulia> collect(leaves(g))\n3-element Array{Any,1}:\n ([7],8)\n ([4,5],6)\n 
([\"2\",1],1)","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Note that terminal child group nodes are not considered \"leaves\" by the leaves function.","category":"page"},{"location":"manual/#Caching-Parameters","page":"Manual","title":"Caching Parameters","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"A common workflow used in BenchmarkTools is the following:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Start a Julia session\nExecute a benchmark suite using an old version of your package julia old_results = run(suite, verbose = true)\nSave the results somehow (e.g. in a JSON file) julia BenchmarkTools.save(\"old_results.json\", old_results)\nStart a new Julia session\nExecute a benchmark suite using a new version of your package\nresults = run(suite, verbose = true)\nCompare the new results with the results saved in step 3 to determine regression status julia old_results = BenchmarkTools.load(\"old_results.json\") BenchmarkTools.judge(minimum(results), minimum(old_results))","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"There are a couple of problems with this workflow, and all of which revolve around parameter tuning (which would occur during steps 2 and 5):","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Consistency: Given enough time, successive calls to tune! will usually yield reasonably consistent values for the \"evaluations per sample\" parameter, even in spite of noise. However, some benchmarks are highly sensitive to slight changes in this parameter. Thus, it would be best to have some guarantee that all experiments are configured equally (i.e., a guarantee that step 2 will use the exact same parameters as step 5).\nTurnaround time: For most benchmarks, tune! needs to perform many evaluations to determine the proper parameters for any given benchmark - often more evaluations than are performed when running a trial. In fact, the majority of total benchmarking time is usually spent tuning parameters, rather than actually running trials.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"BenchmarkTools solves these problems by allowing you to pre-tune your benchmark suite, save the \"evaluations per sample\" parameters, and load them on demand:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"# untuned example suite\njulia> suite\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"utf8\" => BenchmarkGroup([\"string\", \"unicode\"])\n \"trig\" => BenchmarkGroup([\"math\", \"triangles\"])\n\n# tune the suite to configure benchmark parameters\njulia> tune!(suite);\n\n# save the suite's parameters using a thin wrapper\n# over JSON (this wrapper maintains compatibility\n# across BenchmarkTools versions)\njulia> BenchmarkTools.save(\"params.json\", params(suite));","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Now, instead of tuning suite every time we load the benchmarks in a new Julia session, we can simply load the parameters in the JSON file using the loadparams! function. 
The [1] on the load call gets the first value that was serialized into the JSON file, which in this case is the parameters.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"# syntax is loadparams!(group, paramsgroup, fields...)\njulia> loadparams!(suite, BenchmarkTools.load(\"params.json\")[1], :evals, :samples);","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Caching parameters in this manner leads to a far shorter turnaround time, and more importantly, much more consistent results.","category":"page"},{"location":"manual/#Visualizing-benchmark-results","page":"Manual","title":"Visualizing benchmark results","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"For comparing two or more benchmarks against one another, you can manually specify the range of the histogram using an IOContext to set :histmin and :histmax:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> io = IOContext(stdout, :histmin=>0.5, :histmax=>8, :logbins=>true)\nIOContext(Base.TTY(RawFD(13) open, 0 bytes waiting))\n\njulia> b = @benchmark x^3 setup=(x = rand()); show(io, MIME(\"text/plain\"), b)\nBenchmarkTools.Trial: 10000 samples with 1000 evaluations.\n Range (min … max): 1.239 ns … 31.433 ns ┊ GC (min … max): 0.00% … 0.00%\n Time (median): 1.244 ns ┊ GC (median): 0.00%\n Time (mean ± σ): 1.266 ns ± 0.611 ns ┊ GC (mean ± σ): 0.00% ± 0.00%\n\n █\n ▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂\n 0.5 ns Histogram: log(frequency) by time 8 ns <\n\n Memory estimate: 0 bytes, allocs estimate: 0.\njulia> b = @benchmark x^3.0 setup=(x = rand()); show(io, MIME(\"text/plain\"), b)\nBenchmarkTools.Trial: 10000 samples with 1000 evaluations.\n Range (min … max): 5.636 ns … 38.756 ns ┊ GC (min … max): 0.00% … 0.00%\n Time (median): 5.662 ns ┊ GC (median): 0.00%\n Time (mean ± σ): 5.767 ns ± 1.384 ns ┊ GC (mean ± σ): 0.00% ± 0.00%\n\n █▆ ▂ ▁\n ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁███▄▄▃█▁▁▁▁▁▁▁▁▁▁▁▁ █\n 0.5 ns Histogram: log(frequency) by time 8 ns <\n\n Memory estimate: 0 bytes, allocs estimate: 0.\n","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Set :logbins to true or false to ensure that all use the same vertical scaling (log frequency or frequency).","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"The Trial object can be visualized using the BenchmarkPlots package:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"using BenchmarkPlots, StatsPlots\nb = @benchmarkable lu(rand(10,10))\nt = run(b)\n\nplot(t)","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"This will show the timing results of the trial as a violin plot. 
You can use all the keyword arguments from Plots.jl, for instance st=:box or yaxis=:log10.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"If a BenchmarkGroup contains (only) Trials, its results can be visualized simply by","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"using BenchmarkPlots, StatsPlots\nt = run(g)\nplot(t)","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"This will display each Trial as a violin plot.","category":"page"},{"location":"manual/#Miscellaneous-tips-and-info","page":"Manual","title":"Miscellaneous tips and info","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"BenchmarkTools restricts the minimum measurable benchmark execution time to one picosecond.\nIf you use rand or something similar to generate the values that are used in your benchmarks, you should seed the RNG (or provide a seeded RNG) so that the values are consistent between trials/samples/evaluations.\nBenchmarkTools attempts to be robust against machine noise occurring between samples, but BenchmarkTools can't do very much about machine noise occurring between trials. To cut down on the latter kind of noise, it is advised that you dedicate CPUs and memory to the benchmarking Julia process by using a shielding tool such as cset.\nOn some machines, for some versions of BLAS and Julia, the number of BLAS worker threads can exceed the number of available cores. This can occasionally result in scheduling issues and inconsistent performance for BLAS-heavy benchmarks. To fix this issue, you can use BLAS.set_num_threads(i::Int) in the Julia REPL to ensure that the number of BLAS threads is equal to or less than the number of available cores.\n@benchmark is evaluated in global scope, even if called from local scope.","category":"page"},{"location":"reference/#References","page":"Reference","title":"References","text":"","category":"section"},{"location":"reference/","page":"Reference","title":"Reference","text":"Modules = [BenchmarkTools]\nPrivate = false","category":"page"},{"location":"reference/#BenchmarkTools.clear_empty!-Tuple{BenchmarkGroup}","page":"Reference","title":"BenchmarkTools.clear_empty!","text":"clear_empty!(group::BenchmarkGroup)\n\nRecursively remove any empty subgroups from group.\n\nUse this to prune a BenchmarkGroup after accessing the incorrect fields, such as g=BenchmarkGroup(); g[1], without storing anything to g[1], which will create an empty subgroup g[1].\n\n\n\n\n\n","category":"method"},{"location":"reference/#BenchmarkTools.tune!","page":"Reference","title":"BenchmarkTools.tune!","text":"tune!(b::Benchmark, p::Parameters = b.params; verbose::Bool = false, pad = \"\", kwargs...)\n\nTune a Benchmark instance.\n\nIf the number of evals in the parameters p has been set manually, this function does nothing.\n\n\n\n\n\n","category":"function"},{"location":"reference/#BenchmarkTools.tune!-Tuple{BenchmarkGroup}","page":"Reference","title":"BenchmarkTools.tune!","text":"tune!(group::BenchmarkGroup; verbose::Bool = false, pad = \"\", kwargs...)\n\nTune a BenchmarkGroup instance. For most benchmarks, tune! needs to perform many evaluations to determine the proper parameters for any given benchmark - often more evaluations than are performed when running a trial. 
In fact, the majority of total benchmarking time is usually spent tuning parameters, rather than actually running trials.\n\n\n\n\n\n","category":"method"},{"location":"reference/#BenchmarkTools.@ballocated-Tuple","page":"Reference","title":"BenchmarkTools.@ballocated","text":"@ballocated expression [other parameters...]\n\nSimilar to the @allocated macro included with Julia, this returns the number of bytes allocated when executing a given expression. It uses the @benchmark macro, however, and accepts all of the same additional parameters as @benchmark. The returned allocations correspond to the trial with the minimum elapsed time measured during the benchmark.\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@ballocations-Tuple","page":"Reference","title":"BenchmarkTools.@ballocations","text":"@ballocations expression [other parameters...]\n\nSimilar to the @allocations macro included with Julia, this macro evaluates an expression, discarding the resulting value, and returns the total number of allocations made during its execution.\n\nUnlike @allocations, it uses the @benchmark macro from the BenchmarkTools package, and accepts all of the same additional parameters as @benchmark. The returned number of allocations corresponds to the trial with the minimum elapsed time measured during the benchmark.\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@belapsed-Tuple","page":"Reference","title":"BenchmarkTools.@belapsed","text":"@belapsed expression [other parameters...]\n\nSimilar to the @elapsed macro included with Julia, this returns the elapsed time (in seconds) to execute a given expression. It uses the @benchmark macro, however, and accepts all of the same additional parameters as @benchmark. The returned time is the minimum elapsed time measured during the benchmark.\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@benchmark-Tuple","page":"Reference","title":"BenchmarkTools.@benchmark","text":"@benchmark [setup=]\n\nRun benchmark on a given expression.\n\nExample\n\nThe simplest usage of this macro is to put it in front of what you want to benchmark.\n\njulia> @benchmark sin(1)\nBenchmarkTools.Trial:\n memory estimate: 0 bytes\n allocs estimate: 0\n --------------\n minimum time: 13.610 ns (0.00% GC)\n median time: 13.622 ns (0.00% GC)\n mean time: 13.638 ns (0.00% GC)\n maximum time: 21.084 ns (0.00% GC)\n --------------\n samples: 10000\n evals/sample: 998\n\nYou can interpolate values into @benchmark expressions:\n\n# rand(1000) is executed for each evaluation\njulia> @benchmark sum(rand(1000))\nBenchmarkTools.Trial:\n memory estimate: 7.94 KiB\n allocs estimate: 1\n --------------\n minimum time: 1.566 μs (0.00% GC)\n median time: 2.135 μs (0.00% GC)\n mean time: 3.071 μs (25.06% GC)\n maximum time: 296.818 μs (95.91% GC)\n --------------\n samples: 10000\n evals/sample: 10\n\n# rand(1000) is evaluated at definition time, and the resulting\n# value is interpolated into the benchmark expression\njulia> @benchmark sum($(rand(1000)))\nBenchmarkTools.Trial:\n memory estimate: 0 bytes\n allocs estimate: 0\n --------------\n minimum time: 101.627 ns (0.00% GC)\n median time: 101.909 ns (0.00% GC)\n mean time: 103.834 ns (0.00% GC)\n maximum time: 276.033 ns (0.00% GC)\n --------------\n samples: 10000\n evals/sample: 935\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@benchmarkable-Tuple","page":"Reference","title":"BenchmarkTools.@benchmarkable","text":"@benchmarkable [setup=]\n\nCreate a 
Benchmark instance for the given expression. @benchmarkable has similar syntax with @benchmark. See also @benchmark.\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@benchmarkset-Tuple{Any, Any}","page":"Reference","title":"BenchmarkTools.@benchmarkset","text":"@benchmarkset \"title\" begin ... end\n\nCreate a benchmark set, or multiple benchmark sets if a for loop is provided.\n\nExamples\n\n@benchmarkset \"suite\" for k in 1:5\n @case \"case $k\" rand($k, $k)\nend\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@bprofile-Tuple","page":"Reference","title":"BenchmarkTools.@bprofile","text":"@bprofile expression [other parameters...]\n\nRun @benchmark while profiling. This is similar to\n\n@profile @benchmark expression [other parameters...]\n\nbut the profiling is applied only to the main execution (after compilation and tuning). The profile buffer is cleared prior to execution.\n\nView the profile results with Profile.print(...). See the profiling section of the Julia manual for more information.\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@btime-Tuple","page":"Reference","title":"BenchmarkTools.@btime","text":"@btime expression [other parameters...]\n\nSimilar to the @time macro included with Julia, this executes an expression, printing the time it took to execute and the memory allocated before returning the value of the expression.\n\nUnlike @time, it uses the @benchmark macro, and accepts all of the same additional parameters as @benchmark. The printed time is the minimum elapsed time measured during the benchmark.\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@btimed-Tuple","page":"Reference","title":"BenchmarkTools.@btimed","text":"@btimed expression [other parameters...]\n\nSimilar to the @timed macro included with Julia, this macro executes an expression and returns a NamedTuple containing the value of the expression, the minimum elapsed time in seconds, the total bytes allocated, the number of allocations, and the garbage collection time in seconds during the benchmark.\n\nUnlike @timed, it uses the @benchmark macro from the BenchmarkTools package for more detailed and consistent performance measurements. The elapsed time reported is the minimum time measured during the benchmark. It accepts all additional parameters supported by @benchmark.\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@case-Tuple{Any, Vararg{Any}}","page":"Reference","title":"BenchmarkTools.@case","text":"@case title [setup=]\n\nMark an expression as a benchmark case. Must be used inside @benchmarkset.\n\n\n\n\n\n","category":"macro"},{"location":"reference/","page":"Reference","title":"Reference","text":"Base.run\nBenchmarkTools.save\nBenchmarkTools.load","category":"page"},{"location":"reference/#Base.run","page":"Reference","title":"Base.run","text":"run(b::Benchmark[, p::Parameters = b.params]; kwargs...)\n\nRun the benchmark defined by @benchmarkable.\n\n\n\n\n\nrun(group::BenchmarkGroup[, args...]; verbose::Bool = false, pad = \"\", kwargs...)\n\nRun the benchmark group, with benchmark parameters set to group's by default.\n\n\n\n\n\n","category":"function"},{"location":"reference/#BenchmarkTools.save","page":"Reference","title":"BenchmarkTools.save","text":"BenchmarkTools.save(filename, args...)\n\nSave serialized benchmarking objects (e.g. 
results or parameters) to a JSON file.\n\n\n\n\n\n","category":"function"},{"location":"reference/#BenchmarkTools.load","page":"Reference","title":"BenchmarkTools.load","text":"BenchmarkTools.load(filename)\n\nLoad serialized benchmarking objects (e.g. results or parameters) from a JSON file.\n\n\n\n\n\n","category":"function"},{"location":"#BenchmarkTools","page":"Home","title":"BenchmarkTools","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"BenchmarkTools makes performance tracking of Julia code easy by supplying a framework for writing and running groups of benchmarks as well as comparing benchmark results.","category":"page"},{"location":"","page":"Home","title":"Home","text":"This package is used to write and run the benchmarks found in BaseBenchmarks.jl.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The CI infrastructure for automated performance testing of the Julia language is not in this package, but can be found in Nanosoldier.jl.","category":"page"},{"location":"#Quick-Start","page":"Home","title":"Quick Start","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The primary macro provided by BenchmarkTools is @benchmark:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> using BenchmarkTools\n\n# The `setup` expression is run once per sample, and is not included in the\n# timing results. Note that each sample can require multiple evaluations\n# benchmark kernel evaluations. See the BenchmarkTools manual for details.\njulia> @benchmark sort(data) setup=(data=rand(10))\nBenchmarkTools.Trial:\n 10000 samples with 968 evaulations took a median time of 90.902 ns (0.00% GC)\n Time (mean ± σ): 94.936 ns ± 47.797 ns (GC: 2.78% ± 5.03%)\n Range (min … max): 77.655 ns … 954.823 ns (GC: 0.00% … 87.94%)\n\n ▁▃▅▆▇█▇▆▅▂▁ \n ▂▂▃▃▄▅▆▇███████████▇▆▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▁▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂\n 77.7 ns Histogram: frequency by time 137 ns\n\n Memory estimate: 160 bytes, allocs estimate: 1.","category":"page"},{"location":"","page":"Home","title":"Home","text":"For quick sanity checks, one can use the @btime macro, which is a convenience wrapper around @benchmark whose output is analogous to Julia's built-in @time macro:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> @btime sin(x) setup=(x=rand())\n 4.361 ns (0 allocations: 0 bytes)\n0.49587200950472454","category":"page"},{"location":"","page":"Home","title":"Home","text":"If you're interested in profiling a fast-running command, you can use @bprofile sin(x) setup=(x=rand()) and then your favorite tools for displaying the results (Profile.print or a graphical viewer).","category":"page"},{"location":"","page":"Home","title":"Home","text":"If the expression you want to benchmark depends on external variables, you should use $ to \"interpolate\" them into the benchmark expression to avoid the problems of benchmarking with globals. Essentially, any interpolated variable $x or expression $(...) 
is \"pre-computed\" before benchmarking begins:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> A = rand(3,3);\n\njulia> @btime inv($A); # we interpolate the global variable A with $A\n 1.191 μs (10 allocations: 2.31 KiB)\n\njulia> @btime inv($(rand(3,3))); # interpolation: the rand(3,3) call occurs before benchmarking\n 1.192 μs (10 allocations: 2.31 KiB)\n\njulia> @btime inv(rand(3,3)); # the rand(3,3) call is included in the benchmark time\n 1.295 μs (11 allocations: 2.47 KiB)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Sometimes, interpolating variables into very simple expressions can give the compiler more information than you intended, causing it to \"cheat\" the benchmark by hoisting the calculation out of the benchmark code","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> a = 1; b = 2\n2\n\njulia> @btime $a + $b\n 0.024 ns (0 allocations: 0 bytes)\n3","category":"page"},{"location":"","page":"Home","title":"Home","text":"As a rule of thumb, if a benchmark reports that it took less than a nanosecond to perform, this hoisting probably occurred. You can avoid this by referencing and dereferencing the interpolated variables ","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> @btime $(Ref(a))[] + $(Ref(b))[]\n 1.277 ns (0 allocations: 0 bytes)\n3","category":"page"},{"location":"","page":"Home","title":"Home","text":"As described in the Manual, the BenchmarkTools package supports many other features, both for additional output and for more fine-grained control over the benchmarking process.","category":"page"},{"location":"internals/#Internals","page":"Internals","title":"Internals","text":"","category":"section"},{"location":"internals/","page":"Internals","title":"Internals","text":"Modules = [BenchmarkTools]\nPublic = false\nFilter = f -> f !== Base.run","category":"page"},{"location":"internals/#Base.isempty-Tuple{BenchmarkGroup}","page":"Internals","title":"Base.isempty","text":"isempty(group::BenchmarkGroup)\n\nReturn true if group is empty. This will first run clear_empty! on group to recursively remove any empty subgroups.\n\n\n\n\n\n","category":"method"},{"location":"internals/#BenchmarkTools._withprogress-Tuple{Any, AbstractString, BenchmarkGroup}","page":"Internals","title":"BenchmarkTools._withprogress","text":"_withprogress(\n name::AbstractString,\n group::BenchmarkGroup;\n kwargs...,\n) do progressid, nleaves, ndone\n ...\nend\n\nExecute do block with following arguments:\n\nprogressid: logging ID to be used for @logmsg.\nnleaves: total number of benchmarks counted at the root benchmark group.\nndone: number of completed benchmarks\n\nThey are either extracted from kwargs (for sub-groups) or newly created (for root benchmark group).\n\n\n\n\n\n","category":"method"},{"location":"internals/#BenchmarkTools.load-Tuple{AbstractString, Vararg{Any}}","page":"Internals","title":"BenchmarkTools.load","text":"BenchmarkTools.load(filename)\n\nLoad serialized benchmarking objects (e.g. 
results or parameters) from a JSON file.\n\n\n\n\n\n","category":"method"},{"location":"internals/#BenchmarkTools.quasiquote!-Tuple{Any, Vararg{Any}}","page":"Internals","title":"BenchmarkTools.quasiquote!","text":"quasiquote!(expr::Expr, vars::Vector{Symbol}, vals::Vector{Expr})\n\nReplace every interpolated value in expr with a placeholder variable and store the resulting variable / value pairings in vars and vals.\n\n\n\n\n\n","category":"method"},{"location":"internals/#BenchmarkTools.save-Tuple{AbstractString, Vararg{Any}}","page":"Internals","title":"BenchmarkTools.save","text":"BenchmarkTools.save(filename, args...)\n\nSave serialized benchmarking objects (e.g. results or parameters) to a JSON file.\n\n\n\n\n\n","category":"method"},{"location":"linuxtips/#Reproducible-benchmarking-in-Linux-based-environments","page":"Linux-based environments","title":"Reproducible benchmarking in Linux-based environments","text":"","category":"section"},{"location":"linuxtips/#Introduction","page":"Linux-based environments","title":"Introduction","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"This document is all about identifying and avoiding potential reproducibility pitfalls when executing performance tests in a Linux-based environment.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"When I started working on performance regression testing for the Julia language, I was surprised that I couldn't find an up-to-date and noob-friendly checklist that succinctly consolidated the performance wisdom scattered across various forums and papers. My hope is that this document provides a starting point for researchers who are new to performance testing on Linux, and who might be trying to figure out why theoretically identical benchmark trials generate significantly different results.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"To the uninitiated, tracking down and eliminating \"OS jitter\" can sometimes feel more like an art than a science. You'll quickly find that setting up a proper environment for rigorous performance testing requires scouring the internet and academic literature for esoteric references to scheduler quirks and kernel flags. Some of these parameters might drastically affect the outcome of your particular benchmark suite, while others may demand inordinate amounts of experimentation just to prove that they don't affect your benchmarks at all.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"This document's goal is not to improve the performance of your application, help you simulate a realistic production environment, or provide in-depth explanations for various kernel mechanisms. It is currently a bit light on NUMA-specific details, but alas, I don't have access to a NUMA-enabled machine to play with. 
I'm sure that knowledgable readers will find opportunities for corrections and additions, in which case I'd be grateful if you filed an issue or opened a pull request in this repository.","category":"page"},{"location":"linuxtips/#Processor-shielding-and-process-affinity","page":"Linux-based environments","title":"Processor shielding and process affinity","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Processor shielding is a technique that invokes Linux's cpuset pseudo-filesystem to set up exclusive processors and memory nodes that are protected from Linux's scheduler. The easiest way to create and utilize a processor shield is with cset, a convenient Python wrapper over the cpuset interface. On Ubuntu, cset can be installed by running the following:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ sudo apt-get install cpuset","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"It's worth reading the extensive cset tutorial available on RTwiki. As a short example, here's how one might shield processors 1 and 3 from uninvited threads (including most kernel threads, specified by -k on):","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ sudo cset shield -c 1,3 -k on\ncset: --> activating shielding:\ncset: moving 67 tasks from root into system cpuset...\n[==================================================]%\ncset: kthread shield activated, moving 91 tasks into system cpuset...\n[==================================================]%\ncset: **> 34 tasks are not movable, impossible to move\ncset: \"system\" cpuset of CPUSPEC(0,2) with 124 tasks running\ncset: \"user\" cpuset of CPUSPEC(1,3) with 0 tasks running","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"After setting up a shield, you can execute processes within it via the -e flag (note that arguments to the process must be provided after the -- separator):","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ sudo cset shield -e echo -- \"hello from within the shield\"\ncset: --> last message, executed args into cpuset \"/user\", new pid is: 27782\nhello from within the shield\n➜ sudo cset shield -e julia -- benchmark.jl\ncset: --> last message, executed args into cpuset \"/user\", new pid is: 27792\nrunning benchmarks...","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"For slightly lower-level control, you can use cset's other subcommands, proc and set. The actual cpuset kernel interface offers even more options, notably memory hardwalling and scheduling settings.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"To maximize consistency between trials, you should make sure that individual threads executed within the shield always use the exact same processor/memory node configuration. This can be accomplished using hierarchical cpusets to pin processes to child cpusets created under the shielded cpuset. 
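As a hedged sketch, assuming the \"user\" shield created earlier and using user/pinned purely as an illustrative set name (see cset set --help and cset proc --help for the authoritative syntax), you could create a child cpuset on one shielded processor and always launch the benchmarking process inside it:\n\n➜ sudo cset set --cpu=1 user/pinned\n➜ sudo cset proc --set=user/pinned --exec julia -- benchmark.jl\n\n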
Other utilities for managing process affinity, like taskset, numactl, or tuna, aren't as useful as cset because they don't protect dedicated resources from the scheduler.","category":"page"},{"location":"linuxtips/#Virtual-memory-settings","page":"Linux-based environments","title":"Virtual memory settings","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"The official Linux documentation lists a plethora of virtual memory settings for configuring Linux's swapping, paging, and caching behavior. I encourage the reader to independently investigate the vm.nr_hugepages, vm.vfs_cache_pressure, vm.zone_reclaim_mode, and vm.min_free_kbytes properties, but won't discuss these in-depth because they are not likely to have a large impact in the majority of cases. Instead, I'll focus on two properties which are easier to experiment with and a bit less subtle in their effects: swappiness and address space layout randomization.","category":"page"},{"location":"linuxtips/#Swappiness","page":"Linux-based environments","title":"Swappiness","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Most Linux distributions are configured to swap aggressively by default, which can heavily skew performance results by increasing the likelihood of swapping during benchmark execution. Luckily, it's easy to tame the kernel's propensity to swap by lowering the swappiness setting, controlled via the vm.swappiness parameter:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ sudo sysctl vm.swappiness=10","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"In my experience, lowering vm.swappiness to around 10 or so is sufficient to overcome swap-related noise on most memory-bound benchmarks.","category":"page"},{"location":"linuxtips/#Address-space-layout-randomization-(ASLR)","page":"Linux-based environments","title":"Address space layout randomization (ASLR)","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Address space layout randomization (ASLR) is a security feature that makes it harder for malicious programs to exploit buffer overflows. In theory, ASLR could significantly impact reproducibility for benchmarks that are highly susceptible to variations in memory layout. 
Disabling ASLR should be done at your own risk - it is a security feature, after all.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"ASLR can be disabled globally by setting randomize_va_space to 0:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ sudo sysctl kernel.randomize_va_space=0","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"If you don't wish to disable ASLR globally, you can simply start up an ASLR-disabled shell by running:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ setarch $(uname -m) -R /bin/sh","category":"page"},{"location":"linuxtips/#CPU-frequency-scaling-and-boosting","page":"Linux-based environments","title":"CPU frequency scaling and boosting","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Most modern CPUs support dynamic frequency scaling, which is the ability to adjust their clock rate in order to manage power usage and temperature. On Linux, frequency scaling behavior is determined by heuristics dubbed \"governors\", each of which prioritizes different patterns of resource utilization. This feature can interfere with performance results if rescaling occurs during benchmarking or between trials, but luckily we can keep the effective clock rate static by enabling the performance governor on all processors:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ echo \"performance\" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"You can check that this command worked by making sure that cat /proc/cpuinfo | grep 'cpu MHz' spits out the same values as cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Many CPUs also support discretionary performance \"boosting\", which is similar to dynamic frequency scaling and can have the same negative impacts on benchmark reproducibility. To disable CPU boosting, you can run the following:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost","category":"page"},{"location":"linuxtips/#Hyperthreading","page":"Linux-based environments","title":"Hyperthreading","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Hyperthreading, more generally known as simultaneous multithreading (SMT), allows multiple software threads to \"simultaneously\" run on \"independent\" hardware threads on a single CPU core. The downside is that these threads can't always actually execute concurrently in practice, as they contend for shared CPU resources. Frustratingly, Linux exposes these threads to the operating system as extra logical processors, making techniques like shielding difficult to reason about - how do you know that your shielded \"processor\" isn't actually sharing a physical core with an unshielded \"processor\"? 
Unless your use case demands that you run tests in a hyperthreaded environment, you should consider disabling hyperthreading to make it easier to manage processor resources consistently.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"The first step to disabling hyperthreading is to check whether it's actually enabled on your machine. To do so, you can use lscpu:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ lscpu\nArchitecture: x86_64\nCPU op-mode(s): 32-bit, 64-bit\nByte Order: Little Endian\nCPU(s): 8 \nOn-line CPU(s) list: 0-7\nThread(s) per core: 2 \nCore(s) per socket: 4 \nSocket(s): 1\nNUMA node(s): 1\nVendor ID: GenuineIntel\nCPU family: 6\nModel: 60\nStepping: 3\nCPU MHz: 3501.000\nBogoMIPS: 6999.40\nVirtualization: VT-x\nL1d cache: 32K\nL1i cache: 32K\nL2 cache: 256K\nL3 cache: 8192K\nNUMA node0 CPU(s): 0-7","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"In the above output, the CPU(s) field tells us there are 8 logical processors. The other fields allow us to do a more granular breakdown: 1 socket times 4 cores per socket gives us 4 physical cores, times 2 threads per core gives us 8 logical processors. Since there are more logical processors than physical cores, we know hyperthreading is enabled.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Before we start disabling processors, we need to know which ones share a physical core:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list\n0,4\n1,5\n2,6\n3,7\n0,4\n1,5\n2,6\n3,7","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Each row above is in the format i,j, and can be read logical processor i shares a physical core with logical processor j. We can disable hyperthreading by taking excess sibling processors offline, leaving only one logical processor per physical core. In our example, we can accomplish this by disabling processors 4, 5, 6, and 7:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ echo 0 | sudo tee /sys/devices/system/cpu/cpu4/online\n0\n➜ echo 0 | sudo tee /sys/devices/system/cpu/cpu5/online\n0\n➜ echo 0 | sudo tee /sys/devices/system/cpu/cpu6/online\n0\n➜ echo 0 | sudo tee /sys/devices/system/cpu/cpu7/online\n0","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Now, we can verify that hyperthreading is disabled by checking each processor's thread_siblings_list again:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list\n0\n1\n2\n3","category":"page"},{"location":"linuxtips/#Interrupt-requests-and-SMP-affinity","page":"Linux-based environments","title":"Interrupt requests and SMP affinity","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"The kernel will periodically send interrupt requests (IRQs) to your processors. 
As the name implies, IRQs ask a processor to pause the currently running task in order to perform the requested task. There are many different kinds of IRQs, and the degree to which a specific kind of IRQ interferes with a given benchmark depends on the frequency and duration of the IRQ compared to the benchmark's workload.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"The good news is that most kinds of IRQs allow you to set an SMP affinity, which tells the kernel which processor an IRQ should be sent to. By properly configuring SMP affinities, we can send IRQs to the unshielded processors in our benchmarking environment, thus protecting the shielded processors from undesirable interruptions.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"You can use Linux's proc pseudo-filesystem to get a list of interrupts that have occurred on your system since your last reboot:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ cat /proc/interrupts\n CPU0 CPU1\n 0: 19 0 IR-IO-APIC-edge timer\n 8: 1 0 IR-IO-APIC-edge rtc0\n 9: 0 0 IR-IO-APIC-fasteoi acpi\n 16: 27 0 IR-IO-APIC-fasteoi ehci_hcd:usb1\n 22: 12 0 IR-IO-APIC-fasteoi ehci_hcd:usb2\n ⋮\n 53: 18021763 122330 IR-PCI-MSI-edge eth0-TxRx-7\nNMI: 15661 13628 Non-maskable interrupts\nLOC: 140221744 85225898 Local timer interrupts\nSPU: 0 0 Spurious interrupts\nPMI: 15661 13628 Performance monitoring interrupts\nIWI: 23570041 3729274 IRQ work interrupts\nRTR: 7 0 APIC ICR read retries\nRES: 3153272 4187108 Rescheduling interrupts\nCAL: 3401 10460 Function call interrupts\nTLB: 4434976 3071723 TLB shootdowns\nTRM: 0 0 Thermal event interrupts\nTHR: 0 0 Threshold APIC interrupts\nMCE: 0 0 Machine check exceptions\nMCP: 61112 61112 Machine check polls\nERR: 0\nMIS: 0","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Some interrupts, like non-maskable interrupts (NMI), can't be redirected, but you can change the SMP affinities of the rest by writing processor indices to /proc/irq/n/smp_affinity_list, where n is the IRQ number. Here's an example that sets IRQ 22's SMP affinity to processors 0, 1, and 2:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ echo 0-2 | sudo tee /proc/irq/22/smp_affinity_list","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"The optimal way to configure SMP affinities depends a lot on your benchmarks and benchmarking process. For example, if you're running a lot of network-bound benchmarks, it can sometimes be more beneficial to evenly balance ethernet driver interrupts (usually named something like eth0-*) than to restrict them to specific processors.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"A smoke test for determining the impact of IRQs on benchmark results is to see what happens when you turn on/off an IRQ load balancer like irqbalance. 
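On a systemd-based distribution where the balancer runs as the irqbalance service (an assumption; older init systems may use service irqbalance stop instead), that smoke test could look like:\n\n➜ sudo systemctl stop irqbalance\n\nRun your benchmarks, then restart the balancer:\n\n➜ sudo systemctl start irqbalance\n\n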
If this has a noticeable effect on your results, it might be worth playing around with SMP affinities to figure out which IRQs should be directed away from your shielded processors.","category":"page"},{"location":"linuxtips/#Performance-monitoring-interrupts-(PMIs)-and-perf","page":"Linux-based environments","title":"Performance monitoring interrupts (PMIs) and perf","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Performance monitoring interrupts (PMIs) are sent by the kernel's perf subsystem, which is used to set and manage hardware performance counters monitored by other parts of the kernel. Unless perf is a dependency of your benchmarking process, it may be useful to lower perf's sample rate so that PMIs don't interfere with your experiments. One way to do this is to set the kernel.perf_cpu_time_max_percent parameter to 1:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ sudo sysctl kernel.perf_cpu_time_max_percent=1","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"This tells the kernel to inform perf that it should lower its sample rate such that sampling consumes less than 1% of CPU time. After changing this parameter, you may see messages in the system log like:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"[ 3835.065463] perf samples too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"These messages are nothing to be concerned about - it's simply the kernel reporting that it's lowering perf's max sample rate in order to respect the perf_cpu_time_max_percent property we just set.","category":"page"},{"location":"linuxtips/#Additional-resources","page":"Linux-based environments","title":"Additional resources","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"While not highly navigable and a bit overwhelming for newcomers, the most authoritative resource for kernel information is the official Linux documentation hosted at the Linux Kernel Archives.\nAkkan et al.'s 2012 paper on developing a noiseless Linux environment explores the optimal configurations for isolating resources from timer interrupts and the scheduler, as well as the benefits of tickless kernels. The paper makes use of Linux's cgroups, which are similar to the cpusets discussed in this document.\nDe et al.'s 2009 paper on reducing OS jitter in multithreaded systems is similar to Akkan et al.'s paper, but focuses on minimizing jitter for applications that make use of hyperthreading/SMT. Their experimental approach is different as well, relying heavily on analysis of simulated jitter \"traces\" attained by clever benchmarking.\nFor a solid overview of the Linux performance testing ecosystem, check out Brendan Gregg's talk on Linux performance tools. Note that this talk is more focused on debugging system performance problems as they arise in a large distributed environment, rather than application benchmarking or experimental reproducibility.\nThe RHEL6 Performance Tuning Guide is useful for introducing yourself to various kernel constructs that can cause performance problems. 
You can also check out the RHEL7 version of the same guide if you want something more recent, but I find the RHEL6 version more readable.","category":"page"}] +[{"location":"manual/#Manual","page":"Manual","title":"Manual","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"BenchmarkTools was created to facilitate the following tasks:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Organize collections of benchmarks into manageable benchmark suites\nConfigure, save, and reload benchmark parameters for convenience, accuracy, and consistency\nExecute benchmarks in a manner that yields reasonable and consistent performance predictions\nAnalyze and compare results to determine whether a code change caused regressions or improvements","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Before we get too far, let's define some of the terminology used in this document:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"\"evaluation\": a single execution of a benchmark expression.\n\"sample\": a single time/memory measurement obtained by running multiple evaluations.\n\"trial\": an experiment in which multiple samples are gathered (or the result of such an experiment).\n\"benchmark parameters\": the configuration settings that determine how a benchmark trial is performed","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"The reasoning behind our definition of \"sample\" may not be obvious to all readers. If the time to execute a benchmark is smaller than the resolution of your timing method, then a single evaluation of the benchmark will generally not produce a valid sample. In that case, one must approximate a valid sample by recording the total time t it takes to record n evaluations, and estimating the sample's time per evaluation as t/n. For example, if a sample takes 1 second for 1 million evaluations, the approximate time per evaluation for that sample is 1 microsecond. It's not obvious what the right number of evaluations per sample should be for any given benchmark, so BenchmarkTools provides a mechanism (the tune! method) to automatically figure it out for you.","category":"page"},{"location":"manual/#Benchmarking-basics","page":"Manual","title":"Benchmarking basics","text":"","category":"section"},{"location":"manual/#Defining-and-executing-benchmarks","page":"Manual","title":"Defining and executing benchmarks","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"To quickly benchmark a Julia expression, use @benchmark:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> @benchmark sin(1)\nBenchmarkTools.Trial: 10000 samples with 1000 evaluations.\n Range (min … max): 1.442 ns … 53.028 ns ┊ GC (min … max): 0.00% … 0.00%\n Time (median): 1.453 ns ┊ GC (median): 0.00%\n Time (mean ± σ): 1.462 ns ± 0.566 ns ┊ GC (mean ± σ): 0.00% ± 0.00%\n\n █ \n ▂▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃\n 1.44 ns Histogram: frequency by time 1.46 ns (top 1%)\n\n Memory estimate: 0 bytes, allocs estimate: 0.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"The @benchmark macro is essentially shorthand for defining a benchmark, auto-tuning the benchmark's configuration parameters, and running the benchmark. These three steps can be done explicitly using @benchmarkable, tune! 
and run:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> b = @benchmarkable sin(1); # define the benchmark with default parameters\n\n# find the right evals/sample and number of samples to take for this benchmark\njulia> tune!(b);\n\njulia> run(b)\nBenchmarkTools.Trial: 10000 samples with 1000 evaluations.\n Range (min … max): 1.442 ns … 4.308 ns ┊ GC (min … max): 0.00% … 0.00%\n Time (median): 1.453 ns ┊ GC (median): 0.00%\n Time (mean ± σ): 1.456 ns ± 0.056 ns ┊ GC (mean ± σ): 0.00% ± 0.00%\n\n █ \n ▂▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▃\n 1.44 ns Histogram: frequency by time 1.46 ns (top 1%)\n\n Memory estimate: 0 bytes, allocs estimate: 0.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Alternatively, you can use the @btime, @btimed, @belapsed, @ballocated, or @ballocations macros. These take exactly the same arguments as @benchmark, but behave like the @time, @timed, @elapsed, @allocated, or @allocations macros included with Julia.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> @btime sin(1)\n 13.612 ns (0 allocations: 0 bytes)\n0.8414709848078965\n\njulia> @belapsed sin(1)\n1.3614228456913828e-8\n\njulia> @btimed sin(1)\n(value = 0.8414709848078965, time = 9.16e-10, bytes = 0, alloc = 0, gctime = 0.0)\n\njulia> @ballocated rand(4, 4)\n208\n\njulia> @ballocations rand(4, 4)\n2","category":"page"},{"location":"manual/#Benchmark-Parameters","page":"Manual","title":"Benchmark Parameters","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"You can pass the following keyword arguments to @benchmark, @benchmarkable, and run to configure the execution process:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"samples: The number of samples to take. Execution will end if this many samples have been collected. Defaults to BenchmarkTools.DEFAULT_PARAMETERS.samples = 10000.\nseconds: The number of seconds budgeted for the benchmarking process. The trial will terminate if this time is exceeded (regardless of samples), but at least one sample will always be taken. In practice, actual runtime can overshoot the budget by the duration of a sample. Defaults to BenchmarkTools.DEFAULT_PARAMETERS.seconds = 5.\nevals: The number of evaluations per sample. For best results, this should be kept consistent between trials. A good guess for this value can be automatically set on a benchmark via tune!, but using tune! can be less consistent than setting evals manually (which bypasses tuning). Defaults to BenchmarkTools.DEFAULT_PARAMETERS.evals = 1. If the function you study mutates its input, it is probably a good idea to set evals=1 manually.\noverhead: The estimated loop overhead per evaluation in nanoseconds, which is automatically subtracted from every sample time measurement. The default value is BenchmarkTools.DEFAULT_PARAMETERS.overhead = 0. BenchmarkTools.estimate_overhead can be called to determine this value empirically (which can then be set as the default value, if you want).\ngctrial: If true, run gc() before executing this benchmark's trial. Defaults to BenchmarkTools.DEFAULT_PARAMETERS.gctrial = true.\ngcsample: If true, run gc() before each sample. Defaults to BenchmarkTools.DEFAULT_PARAMETERS.gcsample = false.\ntime_tolerance: The noise tolerance for the benchmark's time estimate, as a percentage. This is utilized after benchmark execution, when analyzing results. 
Defaults to BenchmarkTools.DEFAULT_PARAMETERS.time_tolerance = 0.05.\nmemory_tolerance: The noise tolerance for the benchmark's memory estimate, as a percentage. This is utilized after benchmark execution, when analyzing results. Defaults to BenchmarkTools.DEFAULT_PARAMETERS.memory_tolerance = 0.01.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"To change the default values of the above fields, one can mutate the fields of BenchmarkTools.DEFAULT_PARAMETERS, for example:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"# change default for `seconds` to 2.5\nBenchmarkTools.DEFAULT_PARAMETERS.seconds = 2.50\n# change default for `time_tolerance` to 0.20\nBenchmarkTools.DEFAULT_PARAMETERS.time_tolerance = 0.20","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Here's an example that demonstrates how to pass these parameters to benchmark definitions:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"b = @benchmarkable sin(1) seconds=1 time_tolerance=0.01\nrun(b) # equivalent to run(b, seconds = 1, time_tolerance = 0.01)","category":"page"},{"location":"manual/#Interpolating-values-into-benchmark-expressions","page":"Manual","title":"Interpolating values into benchmark expressions","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"You can interpolate values into @benchmark and @benchmarkable expressions:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"# rand(1000) is executed for each evaluation\njulia> @benchmark sum(rand(1000))\nBenchmarkTools.Trial: 10000 samples with 10 evaluations.\n Range (min … max): 1.153 μs … 142.253 μs ┊ GC (min … max): 0.00% … 96.43%\n Time (median): 1.363 μs ┊ GC (median): 0.00%\n Time (mean ± σ): 1.786 μs ± 4.612 μs ┊ GC (mean ± σ): 9.58% ± 3.70%\n\n ▄▆██▇▇▆▄▃▂▁ ▁▁▂▂▂▂▂▂▂▁▂▁ \n ████████████████▆▆▇▅▆▇▆▆▆▇▆▇▆▆▅▄▄▄▅▃▄▇██████████████▇▇▇▇▆▆▇▆▆▅▅▅▅\n 1.15 μs Histogram: log(frequency) by time 3.8 μs (top 1%)\n\n Memory estimate: 7.94 KiB, allocs estimate: 1.\n\n# rand(1000) is evaluated at definition time, and the resulting\n# value is interpolated into the benchmark expression\njulia> @benchmark sum($(rand(1000)))\nBenchmarkTools.Trial: 10000 samples with 963 evaluations.\n Range (min … max): 84.477 ns … 241.602 ns ┊ GC (min … max): 0.00% … 0.00%\n Time (median): 84.497 ns ┊ GC (median): 0.00%\n Time (mean ± σ): 85.125 ns ± 5.262 ns ┊ GC (mean ± σ): 0.00% ± 0.00%\n\n █ \n █▅▇▅▄███▇▇▆▆▆▄▄▅▅▄▄▅▄▄▅▄▄▄▄▁▃▄▁▁▃▃▃▄▃▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▃▃▁▁▁▃▁▁▁▁▆\n 84.5 ns Histogram: log(frequency) by time 109 ns (top 1%)\n\n Memory estimate: 0 bytes, allocs estimate: 0.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"A good rule of thumb is that external variables should be explicitly interpolated into the benchmark expression:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> A = rand(1000);\n\n# BAD: A is a global variable in the benchmarking context\njulia> @benchmark [i*i for i in A]\nBenchmarkTools.Trial: 10000 samples with 54 evaluations.\n Range (min … max): 889.241 ns … 29.584 μs ┊ GC (min … max): 0.00% … 93.33%\n Time (median): 1.073 μs ┊ GC (median): 0.00%\n Time (mean ± σ): 1.296 μs ± 2.004 μs ┊ GC (mean ± σ): 14.31% ± 8.76%\n\n ▃█▆ \n ▂▂▄▆███▇▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▁▂▂▁▁▁▁▁▂▁▁▁▁▂▂▁▁▁▁▂▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂\n 889 ns Histogram: frequency by time 2.92 μs (top 1%)\n\n Memory estimate: 7.95 KiB, allocs 
estimate: 2.\n\n# GOOD: A is a constant value in the benchmarking context\njulia> @benchmark [i*i for i in $A]\nBenchmarkTools.Trial: 10000 samples with 121 evaluations.\n Range (min … max): 742.455 ns … 11.846 μs ┊ GC (min … max): 0.00% … 88.05%\n Time (median): 909.959 ns ┊ GC (median): 0.00%\n Time (mean ± σ): 1.135 μs ± 1.366 μs ┊ GC (mean ± σ): 16.94% ± 12.58%\n\n ▇█▅▂ ▁\n ████▇▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▅▆██\n 742 ns Histogram: log(frequency) by time 10.3 μs (top 1%)\n\n Memory estimate: 7.94 KiB, allocs estimate: 1.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"(Note that \"KiB\" is the SI prefix for a kibibyte: 1024 bytes.)","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Keep in mind that you can mutate external state from within a benchmark:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> A = zeros(3);\n\n # each evaluation will modify A\njulia> b = @benchmarkable fill!($A, rand());\n\njulia> run(b, samples = 1);\n\njulia> A\n3-element Vector{Float64}:\n 0.4615582142515109\n 0.4615582142515109\n 0.4615582142515109\n\njulia> run(b, samples = 1);\n\njulia> A\n3-element Vector{Float64}:\n 0.06373849439691504\n 0.06373849439691504\n 0.06373849439691504","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Normally, you can't use locally scoped variables in @benchmark or @benchmarkable, since all benchmarks are defined at the top-level scope by design. However, you can work around this by interpolating local variables into the benchmark expression:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"# will throw UndefVar error for `x`\njulia> let x = 1\n @benchmark sin(x)\n end\n\n# will work fine\njulia> let x = 1\n @benchmark sin($x)\n end","category":"page"},{"location":"manual/#Setup-and-teardown-phases","page":"Manual","title":"Setup and teardown phases","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"BenchmarkTools allows you to pass setup and teardown expressions to @benchmark and @benchmarkable. The setup expression is evaluated just before sample execution, while the teardown expression is evaluated just after sample execution. Here's an example where this kind of thing is useful:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> x = rand(100000);\n\n# For each sample, bind a variable `y` to a fresh copy of `x`. As you\n# can see, `y` is accessible within the scope of the core expression.\njulia> b = @benchmarkable sort!(y) setup=(y = copy($x))\nBenchmark(evals=1, seconds=5.0, samples=10000)\n\njulia> run(b)\nBenchmarkTools.Trial: 819 samples with 1 evaluations.\n Range (min … max): 5.983 ms … 6.954 ms ┊ GC (min … max): 0.00% … 0.00%\n Time (median): 6.019 ms ┊ GC (median): 0.00%\n Time (mean ± σ): 6.029 ms ± 46.222 μs ┊ GC (mean ± σ): 0.00% ± 0.00%\n\n ▃▂▂▄█▄▂▃ \n ▂▃▃▄▆▅████████▇▆▆▅▄▄▄▅▆▄▃▄▅▄▃▂▃▃▃▂▂▃▁▂▂▂▁▂▂▂▂▂▂▁▁▁▁▂▂▁▁▁▂▂▁▁▂▁▁▂\n 5.98 ms Histogram: frequency by time 6.18 ms (top 1%)\n\n Memory estimate: 0 bytes, allocs estimate: 0.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"In the above example, we wish to benchmark Julia's in-place sorting method. 
Without a setup phase, we'd have to either allocate a new input vector for each sample (such that the allocation time would pollute our results) or use the same input vector every sample (such that all samples but the first would benchmark the wrong thing - sorting an already sorted vector). The setup phase solves the problem by allowing us to do some work that can be utilized by the core expression, without that work being erroneously included in our performance results.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Note that the setup and teardown phases are executed for each sample, not each evaluation. Thus, the sorting example above wouldn't produce the intended results if evals/sample > 1 (it'd suffer from the same problem of benchmarking against an already sorted vector).","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"If your setup involves several objects, you need to separate the assignments with semicolons, as follows:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> @btime x + y setup = (x=1; y=2) # works\n 1.238 ns (0 allocations: 0 bytes)\n3\n\njulia> @btime x + y setup = (x=1, y=2) # errors\nERROR: UndefVarError: `x` not defined","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"This also explains the error you get if you accidentally put a comma in the setup for a single argument:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> @btime exp(x) setup = (x=1,) # errors\nERROR: UndefVarError: `x` not defined","category":"page"},{"location":"manual/#Understanding-compiler-optimizations","page":"Manual","title":"Understanding compiler optimizations","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"It's possible for LLVM and Julia's compiler to perform optimizations on @benchmarkable expressions. In some cases, these optimizations can elide a computation altogether, resulting in unexpectedly \"fast\" benchmarks. 
For example, the following expression is non-allocating:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> @benchmark (view(a, 1:2, 1:2); 1) setup=(a = rand(3, 3))\nBenchmarkTools.Trial: 10000 samples with 1000 evaluations.\n Range (min … max): 2.885 ns … 14.797 ns ┊ GC (min … max): 0.00% … 0.00%\n Time (median): 2.895 ns ┊ GC (median): 0.00%\n Time (mean ± σ): 3.320 ns ± 0.909 ns ┊ GC (mean ± σ): 0.00% ± 0.00%\n\n █ ▁ ▁ ▁▁▁ ▂▃▃▁\n █▁▁▇█▇▆█▇████████████████▇█▇█▇▇▇▇█▇█▇▅▅▄▁▁▁▁▄▃▁▃▃▁▄▃▁▄▁▃▅▅██████\n 2.88 ns Histogram: log(frequency) by time 5.79 ns (top 1%)\n\n Memory estimate: 0 bytes, allocs estimate: 0.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Note, however, that this does not mean that view(a, 1:2, 1:2) is non-allocating:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> @benchmark view(a, 1:2, 1:2) setup=(a = rand(3, 3))\nBenchmarkTools.Trial: 10000 samples with 1000 evaluations.\n Range (min … max): 3.175 ns … 18.314 ns ┊ GC (min … max): 0.00% … 0.00%\n Time (median): 3.176 ns ┊ GC (median): 0.00%\n Time (mean ± σ): 3.262 ns ± 0.882 ns ┊ GC (mean ± σ): 0.00% ± 0.00%\n\n █ \n █▁▂▁▁▁▂▁▂▁▂▁▁▂▁▁▂▂▂▂▂▂▁▁▂▁▁▂▁▁▁▂▂▁▁▁▂▁▂▂▁▂▁▁▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂\n 3.18 ns Histogram: frequency by time 4.78 ns (top 1%)\n\n Memory estimate: 0 bytes, allocs estimate: 0.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"The key point here is that these two benchmarks measure different things, even though their code is similar. In the first example, Julia was able to optimize away view(a, 1:2, 1:2) because it could prove that the value wasn't being returned and a wasn't being mutated. In the second example, the optimization is not performed because view(a, 1:2, 1:2) is a return value of the benchmark expression.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"BenchmarkTools will faithfully report the performance of the exact code that you provide to it, including any compiler optimizations that might happen to elide the code completely. It's up to you to design benchmarks which actually exercise the code you intend to exercise. ","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"A common case in which Julia's optimizer can cause a benchmark to measure something other than what you intended is a simple operation where all values are known at compile time. Suppose you wanted to measure the time it takes to add together two integers:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> a = 1; b = 2\n2\n\njulia> @btime $a + $b\n 0.024 ns (0 allocations: 0 bytes)\n3","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"In this case, Julia was able to use the properties of +(::Int, ::Int) to know that it could safely replace $a + $b with 3 at compile time. 
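One way to convince yourself that this kind of folding happened (a quick sanity check using Julia's built-in reflection, independent of BenchmarkTools) is to look at the optimized code of an equivalent zero-argument closure:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> code_typed(() -> 1 + 2)  # the optimized body collapses to a constant: return 3","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"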
We can stop the optimizer from doing this by referencing and dereferencing the interpolated variables ","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> @btime $(Ref(a))[] + $(Ref(b))[]\n 1.277 ns (0 allocations: 0 bytes)\n3","category":"page"},{"location":"manual/#Handling-benchmark-results","page":"Manual","title":"Handling benchmark results","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"BenchmarkTools provides four types related to benchmark results:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Trial: stores all samples collected during a benchmark trial, as well as the trial's parameters\nTrialEstimate: a single estimate used to summarize a Trial\nTrialRatio: a comparison between two TrialEstimate\nTrialJudgement: a classification of the fields of a TrialRatio as invariant, regression, or improvement","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"This section provides a limited number of examples demonstrating these types. For a thorough list of supported functionality, see the reference document.","category":"page"},{"location":"manual/#Trial-and-TrialEstimate","page":"Manual","title":"Trial and TrialEstimate","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"Running a benchmark produces an instance of the Trial type:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> t = @benchmark eigen(rand(10, 10))\nBenchmarkTools.Trial: 10000 samples with 1 evaluations.\n Range (min … max): 26.549 μs … 1.503 ms ┊ GC (min … max): 0.00% … 93.21%\n Time (median): 30.818 μs ┊ GC (median): 0.00%\n Time (mean ± σ): 31.777 μs ± 25.161 μs ┊ GC (mean ± σ): 1.31% ± 1.63%\n\n ▂▃▅▆█▇▇▆▆▄▄▃▁▁ \n ▁▁▁▁▁▁▂▃▄▆████████████████▆▆▅▅▄▄▃▃▃▂▂▂▂▂▂▁▂▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\n 26.5 μs Histogram: frequency by time 41.3 μs (top 1%)\n\n Memory estimate: 16.36 KiB, allocs estimate: 19.\n\njulia> dump(t) # here's what's actually stored in a Trial\nBenchmarkTools.Trial\n params: BenchmarkTools.Parameters\n seconds: Float64 5.0\n samples: Int64 10000\n evals: Int64 1\n overhead: Float64 0.0\n gctrial: Bool true\n gcsample: Bool false\n time_tolerance: Float64 0.05\n memory_tolerance: Float64 0.01\n times: Array{Float64}((10000,)) [26549.0, 26960.0, 27030.0, 27171.0, 27211.0, 27261.0, 27270.0, 27311.0, 27311.0, 27321.0 … 55383.0, 55934.0, 58649.0, 62847.0, 68547.0, 75761.0, 247081.0, 1.421718e6, 1.488322e6, 1.50329e6]\n gctimes: Array{Float64}((10000,)) [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.366184e6, 1.389518e6, 1.40116e6]\n memory: Int64 16752\n allocs: Int64 19","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"As you can see from the above, a couple of different timing estimates are pretty-printed with the Trial. 
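The raw measurements are also stored on the Trial itself: as the dump above shows, the per-sample times and GC times live in the times and gctimes fields (both in nanoseconds), so you can work with them directly. For example, using the trial above:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> length(t.times)\n10000\n\njulia> minimum(t.times)  # nanoseconds\n26549.0","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"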
You can calculate these estimates yourself using the minimum, maximum, median, mean, and std functions (Note that median, mean, and std are reexported in BenchmarkTools from Statistics):","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> minimum(t)\nBenchmarkTools.TrialEstimate: \n time: 26.549 μs\n gctime: 0.000 ns (0.00%)\n memory: 16.36 KiB\n allocs: 19\n\njulia> maximum(t)\nBenchmarkTools.TrialEstimate: \n time: 1.503 ms\n gctime: 1.401 ms (93.21%)\n memory: 16.36 KiB\n allocs: 19\n\njulia> median(t)\nBenchmarkTools.TrialEstimate: \n time: 30.818 μs\n gctime: 0.000 ns (0.00%)\n memory: 16.36 KiB\n allocs: 19\n\njulia> mean(t)\nBenchmarkTools.TrialEstimate: \n time: 31.777 μs\n gctime: 415.686 ns (1.31%)\n memory: 16.36 KiB\n allocs: 19\n\njulia> std(t)\nBenchmarkTools.TrialEstimate: \n time: 25.161 μs\n gctime: 23.999 μs (95.38%)\n memory: 16.36 KiB\n allocs: 19","category":"page"},{"location":"manual/#Which-estimator-should-I-use?","page":"Manual","title":"Which estimator should I use?","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"Time distributions are always right-skewed for the benchmarks we've tested. This phenomena can be justified by considering that the machine noise affecting the benchmarking process is, in some sense, inherently positive - there aren't really sources of noise that would regularly cause your machine to execute a series of instructions faster than the theoretical \"ideal\" time prescribed by your hardware. Following this characterization of benchmark noise, we can describe the behavior of our estimators:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"The minimum is a robust estimator for the location parameter of the time distribution, and should not be considered an outlier\nThe median, as a robust measure of central tendency, should be relatively unaffected by outliers\nThe mean, as a non-robust measure of central tendency, will usually be positively skewed by outliers\nThe maximum should be considered a primarily noise-driven outlier, and can change drastically between benchmark trials.","category":"page"},{"location":"manual/#TrialRatio-and-TrialJudgement","page":"Manual","title":"TrialRatio and TrialJudgement","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"BenchmarkTools supplies a ratio function for comparing two values:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> ratio(3, 2)\n1.5\n\njulia> ratio(1, 0)\nInf\n\njulia> ratio(0, 1)\n0.0\n\n# a == b is special-cased to 1.0 to prevent NaNs in this case\njulia> ratio(0, 0)\n1.0","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Calling the ratio function on two TrialEstimate instances compares their fields:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> using BenchmarkTools\n\njulia> b = @benchmarkable eigen(rand(10, 10));\n\njulia> tune!(b);\n\njulia> m1 = median(run(b))\nBenchmarkTools.TrialEstimate:\n time: 38.638 μs\n gctime: 0.000 ns (0.00%)\n memory: 9.30 KiB\n allocs: 28\n\njulia> m2 = median(run(b))\nBenchmarkTools.TrialEstimate:\n time: 38.723 μs\n gctime: 0.000 ns (0.00%)\n memory: 9.30 KiB\n allocs: 28\n\njulia> ratio(m1, m2)\nBenchmarkTools.TrialRatio:\n time: 0.997792009916587\n gctime: 1.0\n memory: 1.0\n allocs: 1.0","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Use the 
judge function to decide if the estimate passed as first argument represents a regression versus the second estimate:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> m1 = median(@benchmark eigen(rand(10, 10)))\nBenchmarkTools.TrialEstimate:\n time: 38.745 μs\n gctime: 0.000 ns (0.00%)\n memory: 9.30 KiB\n allocs: 28\n\njulia> m2 = median(@benchmark eigen(rand(10, 10)))\nBenchmarkTools.TrialEstimate:\n time: 38.611 μs\n gctime: 0.000 ns (0.00%)\n memory: 9.30 KiB\n allocs: 28\n\n# percent change falls within noise tolerance for all fields\njulia> judge(m1, m2)\nBenchmarkTools.TrialJudgement:\n time: +0.35% => invariant (5.00% tolerance)\n memory: +0.00% => invariant (1.00% tolerance)\n\n# changing time_tolerance causes it to be marked as a regression\njulia> judge(m1, m2; time_tolerance = 0.0001)\nBenchmarkTools.TrialJudgement:\n time: +0.35% => regression (0.01% tolerance)\n memory: +0.00% => invariant (1.00% tolerance)\n\n# switch m1 & m2; from this perspective, the difference is an improvement\njulia> judge(m2, m1; time_tolerance = 0.0001)\nBenchmarkTools.TrialJudgement:\n time: -0.35% => improvement (0.01% tolerance)\n memory: +0.00% => invariant (1.00% tolerance)\n\n# you can pass in TrialRatios as well\njulia> judge(ratio(m1, m2)) == judge(m1, m2)\ntrue","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Note that changes in GC time and allocation count aren't classified by judge. This is because GC time and allocation count, while sometimes useful for answering why a regression occurred, are not generally useful for answering if a regression occurred. Instead, it's usually only differences in time and memory usage that determine whether or not a code change is an improvement or a regression. For example, in the unlikely event that a code change decreased time and memory usage, but increased GC time and allocation count, most people would consider that code change to be an improvement. The opposite is also true: an increase in time and memory usage would be considered a regression no matter how much GC time or allocation count decreased.","category":"page"},{"location":"manual/#The-BenchmarkGroup-type","page":"Manual","title":"The BenchmarkGroup type","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"In the real world, one often deals with whole suites of benchmarks rather than just individual benchmarks. The BenchmarkGroup type serves as the \"organizational unit\" of such suites, and can be used to store and structure benchmark definitions, raw Trial data, estimation results, and even other BenchmarkGroup instances.","category":"page"},{"location":"manual/#Defining-benchmark-suites","page":"Manual","title":"Defining benchmark suites","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"A BenchmarkGroup stores a Dict that maps benchmark IDs to values, as well as descriptive \"tags\" that can be used to filter the group by topic. To get started, let's demonstrate how one might use the BenchmarkGroup type to define a simple benchmark suite:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"# Define a parent BenchmarkGroup to contain our suite\nsuite = BenchmarkGroup()\n\n# Add some child groups to our benchmark suite. The most relevant BenchmarkGroup constructor\n# for this case is BenchmarkGroup(tags::Vector). 
These tags are useful for\n# filtering benchmarks by topic, which we'll cover in a later section.\nsuite[\"utf8\"] = BenchmarkGroup([\"string\", \"unicode\"])\nsuite[\"trig\"] = BenchmarkGroup([\"math\", \"triangles\"])\n\n# Add some benchmarks to the \"utf8\" group\nteststr = join(rand('a':'d', 10^4));\nsuite[\"utf8\"][\"replace\"] = @benchmarkable replace($teststr, \"a\" => \"b\")\nsuite[\"utf8\"][\"join\"] = @benchmarkable join($teststr, $teststr)\n\n# Add some benchmarks to the \"trig\" group\nfor f in (sin, cos, tan)\n for x in (0.0, pi)\n suite[\"trig\"][string(f), x] = @benchmarkable $(f)($x)\n end\nend","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Let's look at our newly defined suite in the REPL:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> suite\n2-element BenchmarkTools.BenchmarkGroup:\n tags: []\n \"utf8\" => 2-element BenchmarkTools.BenchmarkGroup:\n\t tags: [\"string\", \"unicode\"]\n\t \"join\" => Benchmark(evals=1, seconds=5.0, samples=10000)\n\t \"replace\" => Benchmark(evals=1, seconds=5.0, samples=10000)\n \"trig\" => 6-element BenchmarkTools.BenchmarkGroup:\n\t tags: [\"math\", \"triangles\"]\n\t (\"cos\", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)\n\t (\"sin\", π = 3.1415926535897...) => Benchmark(evals=1, seconds=5.0, samples=10000)\n\t (\"tan\", π = 3.1415926535897...) => Benchmark(evals=1, seconds=5.0, samples=10000)\n\t (\"cos\", π = 3.1415926535897...) => Benchmark(evals=1, seconds=5.0, samples=10000)\n\t (\"sin\", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)\n\t (\"tan\", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"As you might imagine, BenchmarkGroup supports a subset of Julia's Associative interface. A full list of these supported functions can be found in the reference document.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"One can also create a nested BenchmarkGroup simply by indexing the keys:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"suite2 = BenchmarkGroup()\n\nsuite2[\"my\"][\"nested\"][\"benchmark\"] = @benchmarkable sum(randn(32))","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"which will result in a hierarchical benchmark without us needing to create the BenchmarkGroup at each level ourselves.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Note that keys are automatically created upon access, even if a key does not exist. Thus, if you wish to empty the unused keys, you can use clear_empty!(suite) to do so.","category":"page"},{"location":"manual/#Tuning-and-running-a-BenchmarkGroup","page":"Manual","title":"Tuning and running a BenchmarkGroup","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"Similarly to individual benchmarks, you can tune! 
and run whole BenchmarkGroup instances (following from the previous section):","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"# execute `tune!` on every benchmark in `suite`\njulia> tune!(suite);\n\n# run with a time limit of ~1 second per benchmark\njulia> results = run(suite, verbose = true, seconds = 1)\n(1/2) benchmarking \"utf8\"...\n (1/2) benchmarking \"join\"...\n done (took 1.15406904 seconds)\n (2/2) benchmarking \"replace\"...\n done (took 0.47660775 seconds)\ndone (took 1.697970114 seconds)\n(2/2) benchmarking \"trig\"...\n (1/6) benchmarking (\"tan\",π = 3.1415926535897...)...\n done (took 0.371586549 seconds)\n (2/6) benchmarking (\"cos\",0.0)...\n done (took 0.284178292 seconds)\n (3/6) benchmarking (\"cos\",π = 3.1415926535897...)...\n done (took 0.338527685 seconds)\n (4/6) benchmarking (\"sin\",π = 3.1415926535897...)...\n done (took 0.345329397 seconds)\n (5/6) benchmarking (\"sin\",0.0)...\n done (took 0.309887335 seconds)\n (6/6) benchmarking (\"tan\",0.0)...\n done (took 0.320894744 seconds)\ndone (took 2.022673065 seconds)\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"utf8\" => BenchmarkGroup([\"string\", \"unicode\"])\n \"trig\" => BenchmarkGroup([\"math\", \"triangles\"])","category":"page"},{"location":"manual/#Working-with-trial-data-in-a-BenchmarkGroup","page":"Manual","title":"Working with trial data in a BenchmarkGroup","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"Following from the previous section, we see that running our benchmark suite returns a BenchmarkGroup that stores Trial data instead of benchmarks:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> results[\"utf8\"]\nBenchmarkTools.BenchmarkGroup:\n tags: [\"string\", \"unicode\"]\n \"join\" => Trial(133.84 ms) # summary(::Trial) displays the minimum time estimate\n \"replace\" => Trial(202.3 μs)\n\njulia> results[\"trig\"]\nBenchmarkTools.BenchmarkGroup:\n tags: [\"math\", \"triangles\"]\n (\"tan\",π = 3.1415926535897...) => Trial(28.0 ns)\n (\"cos\",0.0) => Trial(6.0 ns)\n (\"cos\",π = 3.1415926535897...) => Trial(22.0 ns)\n (\"sin\",π = 3.1415926535897...) => Trial(21.0 ns)\n (\"sin\",0.0) => Trial(6.0 ns)\n (\"tan\",0.0) => Trial(6.0 ns)","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Most of the functions on result-related types (Trial, TrialEstimate, TrialRatio, and TrialJudgement) work on BenchmarkGroups as well. 
Usually, these functions simply map onto the groups' values:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> m1 = median(results[\"utf8\"]) # == median(results[\"utf8\"])\nBenchmarkTools.BenchmarkGroup:\n tags: [\"string\", \"unicode\"]\n \"join\" => TrialEstimate(143.68 ms)\n \"replace\" => TrialEstimate(203.24 μs)\n\njulia> m2 = median(run(suite[\"utf8\"]))\nBenchmarkTools.BenchmarkGroup:\n tags: [\"string\", \"unicode\"]\n \"join\" => TrialEstimate(144.79 ms)\n \"replace\" => TrialEstimate(202.49 μs)\n\njulia> judge(m1, m2; time_tolerance = 0.001) # use 0.1 % time tolerance\nBenchmarkTools.BenchmarkGroup:\n tags: [\"string\", \"unicode\"]\n \"join\" => TrialJudgement(-0.76% => improvement)\n \"replace\" => TrialJudgement(+0.37% => regression)","category":"page"},{"location":"manual/#Indexing-into-a-BenchmarkGroup-using-@tagged","page":"Manual","title":"Indexing into a BenchmarkGroup using @tagged","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"Sometimes, especially in large benchmark suites, you'd like to filter benchmarks by topic without necessarily worrying about the key-value structure of the suite. For example, you might want to run all string-related benchmarks, even though they might be spread out among many different groups or subgroups. To solve this problem, the BenchmarkGroup type incorporates a tagging system.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Consider the following BenchmarkGroup, which contains several nested child groups that are all individually tagged:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> g = BenchmarkGroup([], # no tags in the parent\n \"c\" => BenchmarkGroup([\"5\", \"6\", \"7\"]), # tagged \"5\", \"6\", \"7\"\n \"b\" => BenchmarkGroup([\"3\", \"4\", \"5\"]), # tagged \"3\", \"4\", \"5\"\n \"a\" => BenchmarkGroup([\"1\", \"2\", \"3\"], # contains tags and child groups\n \"d\" => BenchmarkGroup([\"8\"], 1 => 1),\n \"e\" => BenchmarkGroup([\"9\"], 2 => 2)));\njulia> g\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"c\" => BenchmarkTools.BenchmarkGroup:\n\t tags: [\"5\", \"6\", \"7\"]\n \"b\" => BenchmarkTools.BenchmarkGroup:\n\t tags: [\"3\", \"4\", \"5\"]\n \"a\" => BenchmarkTools.BenchmarkGroup:\n\t tags: [\"1\", \"2\", \"3\"]\n\t \"e\" => BenchmarkTools.BenchmarkGroup:\n\t\t tags: [\"9\"]\n\t\t 2 => 2\n\t \"d\" => BenchmarkTools.BenchmarkGroup:\n\t\t tags: [\"8\"]\n\t\t 1 => 1","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"We can filter this group by tag using the @tagged macro. This macro takes in a special predicate, and returns an object that can be used to index into a BenchmarkGroup. For example, we can select all groups marked \"3\" or \"7\" and not \"1\":","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> g[@tagged (\"3\" || \"7\") && !(\"1\")]\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"c\" => BenchmarkGroup([\"5\", \"6\", \"7\"])\n \"b\" => BenchmarkGroup([\"3\", \"4\", \"5\"])","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"As you can see, the allowable syntax for the @tagged predicate includes !, (), ||, &&, in addition to the tags themselves. The @tagged macro replaces each tag in the predicate expression with a check to see if the group has the given tag, returning true if so and false otherwise. 
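For instance, with the group g defined above, g[@tagged \"5\" && !(\"3\")] would keep only \"c\", since \"b\" and \"a\" both carry the tag \"3\". 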
A group g is considered to have a given tag t if:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"t is attached explicitly to g by construction (e.g. g = BenchmarkGroup([t]))\nt is a key that points to g in g's parent group (e.g. BenchmarkGroup([], t => g))\nt is a tag of one of g's parent groups (all the way up to the root group)","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"To demonstrate the last two points:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"# also could've used `@tagged \"1\"`, `@tagged \"a\"`, `@tagged \"e\" || \"d\"`\njulia> g[@tagged \"8\" || \"9\"]\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"a\" => BenchmarkTools.BenchmarkGroup:\n\t tags: [\"1\", \"2\", \"3\"]\n\t \"e\" => BenchmarkTools.BenchmarkGroup:\n\t\t tags: [\"9\"]\n\t\t 2 => 2\n\t \"d\" => BenchmarkTools.BenchmarkGroup:\n\t\t tags: [\"8\"]\n\t\t 1 => 1\n\njulia> g[@tagged \"d\"]\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"a\" => BenchmarkTools.BenchmarkGroup:\n\t tags: [\"1\", \"2\", \"3\"]\n\t \"d\" => BenchmarkTools.BenchmarkGroup:\n\t\t tags: [\"8\"]\n\t\t 1 => 1\n\njulia> g[@tagged \"9\"]\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"a\" => BenchmarkTools.BenchmarkGroup:\n\t tags: [\"1\", \"2\", \"3\"]\n\t \"e\" => BenchmarkTools.BenchmarkGroup:\n\t\t tags: [\"9\"]\n\t\t 2 => 2","category":"page"},{"location":"manual/#Indexing-into-a-BenchmarkGroup-using-another-BenchmarkGroup","page":"Manual","title":"Indexing into a BenchmarkGroup using another BenchmarkGroup","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"It's sometimes useful to create BenchmarkGroup where the keys are drawn from one BenchmarkGroup, but the values are drawn from another. You can accomplish this by indexing into the latter BenchmarkGroup with the former:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> g # leaf values are integers\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"c\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"1\" => 1\n\t \"2\" => 2\n\t \"3\" => 3\n \"b\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"1\" => 1\n\t \"2\" => 2\n\t \"3\" => 3\n \"a\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"1\" => 1\n\t \"2\" => 2\n\t \"3\" => 3\n \"d\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"1\" => 1\n\t \"2\" => 2\n\t \"3\" => 3\n\njulia> x # note that leaf values are characters\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"c\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"2\" => '2'\n \"a\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"1\" => '1'\n\t \"3\" => '3'\n \"d\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"1\" => '1'\n\t \"2\" => '2'\n\t \"3\" => '3'\n\njulia> g[x] # index into `g` with the keys of `x`\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"c\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"2\" => 2\n \"a\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"1\" => 1\n\t \"3\" => 3\n \"d\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"1\" => 1\n\t \"2\" => 2\n\t \"3\" => 3","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"An example scenario where this would be useful: You have a suite of benchmarks, and a corresponding group of TrialJudgements, and you want to rerun the benchmarks in your suite that are considered regressions in the judgement group. 
You can easily do this with the following code:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"run(suite[regressions(judgements)])","category":"page"},{"location":"manual/#Indexing-into-a-BenchmarkGroup-using-a-Vector","page":"Manual","title":"Indexing into a BenchmarkGroup using a Vector","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"You may have noticed that nested BenchmarkGroup instances form a tree-like structure, where the root node is the parent group, intermediate nodes are child groups, and the leaves take values like trial data and benchmark definitions.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Since these trees can be arbitrarily asymmetric, it can be cumbersome to write certain BenchmarkGroup transformations using only the indexing facilities previously discussed.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"To solve this problem, BenchmarkTools allows you to uniquely index group nodes using a Vector of the node's parents' keys. For example:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> g = BenchmarkGroup([], 1 => BenchmarkGroup([], \"a\" => BenchmarkGroup([], :b => 1234)));\n\njulia> g\nBenchmarkTools.BenchmarkGroup:\n tags: []\n 1 => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"a\" => BenchmarkTools.BenchmarkGroup:\n\t\t tags: []\n\t\t :b => 1234\n\njulia> g[[1]] # == g[1]\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"a\" => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t :b => 1234\njulia> g[[1, \"a\"]] # == g[1][\"a\"]\nBenchmarkTools.BenchmarkGroup:\n tags: []\n :b => 1234\njulia> g[[1, \"a\", :b]] # == g[1][\"a\"][:b]\n1234","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Keep in mind that this indexing scheme also works with setindex!:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> g[[1, \"a\", :b]] = \"hello\"\n\"hello\"\n\njulia> g\nBenchmarkTools.BenchmarkGroup:\n tags: []\n 1 => BenchmarkTools.BenchmarkGroup:\n\t tags: []\n\t \"a\" => BenchmarkTools.BenchmarkGroup:\n\t\t tags: []\n\t\t :b => \"hello\"","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Assigning into a BenchmarkGroup with a Vector creates sub-groups as necessary:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> g[[2, \"a\", :b]] = \"hello again\"\n\"hello again\"\n\njulia> g\n2-element BenchmarkTools.BenchmarkGroup:\n tags: []\n 2 => 1-element BenchmarkTools.BenchmarkGroup:\n tags: []\n \"a\" => 1-element BenchmarkTools.BenchmarkGroup:\n tags: []\n :b => \"hello again\"\n 1 => 1-element BenchmarkTools.BenchmarkGroup:\n tags: []\n \"a\" => 1-element BenchmarkTools.BenchmarkGroup:\n tags: []\n :b => \"hello\"","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"You can use the leaves function to construct an iterator over a group's leaf index/value pairs:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> g = BenchmarkGroup([\"1\"],\n \"2\" => BenchmarkGroup([\"3\"], 1 => 1),\n 4 => BenchmarkGroup([\"3\"], 5 => 6),\n 7 => 8,\n 9 => BenchmarkGroup([\"2\"],\n 10 => BenchmarkGroup([\"3\"]),\n 11 => BenchmarkGroup()));\n\njulia> collect(leaves(g))\n3-element Array{Any,1}:\n ([7],8)\n ([4,5],6)\n 
([\"2\",1],1)","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Note that terminal child group nodes are not considered \"leaves\" by the leaves function.","category":"page"},{"location":"manual/#Caching-Parameters","page":"Manual","title":"Caching Parameters","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"A common workflow used in BenchmarkTools is the following:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Start a Julia session\nExecute a benchmark suite using an old version of your package julia old_results = run(suite, verbose = true)\nSave the results somehow (e.g. in a JSON file) julia BenchmarkTools.save(\"old_results.json\", old_results)\nStart a new Julia session\nExecute a benchmark suite using a new version of your package\nresults = run(suite, verbose = true)\nCompare the new results with the results saved in step 3 to determine regression status julia old_results = BenchmarkTools.load(\"old_results.json\") BenchmarkTools.judge(minimum(results), minimum(old_results))","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"There are a couple of problems with this workflow, and all of which revolve around parameter tuning (which would occur during steps 2 and 5):","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Consistency: Given enough time, successive calls to tune! will usually yield reasonably consistent values for the \"evaluations per sample\" parameter, even in spite of noise. However, some benchmarks are highly sensitive to slight changes in this parameter. Thus, it would be best to have some guarantee that all experiments are configured equally (i.e., a guarantee that step 2 will use the exact same parameters as step 5).\nTurnaround time: For most benchmarks, tune! needs to perform many evaluations to determine the proper parameters for any given benchmark - often more evaluations than are performed when running a trial. In fact, the majority of total benchmarking time is usually spent tuning parameters, rather than actually running trials.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"BenchmarkTools solves these problems by allowing you to pre-tune your benchmark suite, save the \"evaluations per sample\" parameters, and load them on demand:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"# untuned example suite\njulia> suite\nBenchmarkTools.BenchmarkGroup:\n tags: []\n \"utf8\" => BenchmarkGroup([\"string\", \"unicode\"])\n \"trig\" => BenchmarkGroup([\"math\", \"triangles\"])\n\n# tune the suite to configure benchmark parameters\njulia> tune!(suite);\n\n# save the suite's parameters using a thin wrapper\n# over JSON (this wrapper maintains compatibility\n# across BenchmarkTools versions)\njulia> BenchmarkTools.save(\"params.json\", params(suite));","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Now, instead of tuning suite every time we load the benchmarks in a new Julia session, we can simply load the parameters in the JSON file using the loadparams! function. 
The [1] on the load call gets the first value that was serialized into the JSON file, which in this case is the parameters.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"# syntax is loadparams!(group, paramsgroup, fields...)\njulia> loadparams!(suite, BenchmarkTools.load(\"params.json\")[1], :evals, :samples);","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Caching parameters in this manner leads to a far shorter turnaround time, and more importantly, much more consistent results.","category":"page"},{"location":"manual/#Visualizing-benchmark-results","page":"Manual","title":"Visualizing benchmark results","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"For comparing two or more benchmarks against one another, you can manually specify the range of the histogram using an IOContext to set :histmin and :histmax:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"julia> io = IOContext(stdout, :histmin=>0.5, :histmax=>8, :logbins=>true)\nIOContext(Base.TTY(RawFD(13) open, 0 bytes waiting))\n\njulia> b = @benchmark x^3 setup=(x = rand()); show(io, MIME(\"text/plain\"), b)\nBenchmarkTools.Trial: 10000 samples with 1000 evaluations.\n Range (min … max): 1.239 ns … 31.433 ns ┊ GC (min … max): 0.00% … 0.00%\n Time (median): 1.244 ns ┊ GC (median): 0.00%\n Time (mean ± σ): 1.266 ns ± 0.611 ns ┊ GC (mean ± σ): 0.00% ± 0.00%\n\n █\n ▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂\n 0.5 ns Histogram: log(frequency) by time 8 ns <\n\n Memory estimate: 0 bytes, allocs estimate: 0.\njulia> b = @benchmark x^3.0 setup=(x = rand()); show(io, MIME(\"text/plain\"), b)\nBenchmarkTools.Trial: 10000 samples with 1000 evaluations.\n Range (min … max): 5.636 ns … 38.756 ns ┊ GC (min … max): 0.00% … 0.00%\n Time (median): 5.662 ns ┊ GC (median): 0.00%\n Time (mean ± σ): 5.767 ns ± 1.384 ns ┊ GC (mean ± σ): 0.00% ± 0.00%\n\n █▆ ▂ ▁\n ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁███▄▄▃█▁▁▁▁▁▁▁▁▁▁▁▁ █\n 0.5 ns Histogram: log(frequency) by time 8 ns <\n\n Memory estimate: 0 bytes, allocs estimate: 0.\n","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"Set :logbins to true or false to ensure that all use the same vertical scaling (log frequency or frequency).","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"The Trial object can be visualized using the BenchmarkPlots package:","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"using BenchmarkPlots, StatsPlots\nb = @benchmarkable lu(rand(10,10))\nt = run(b)\n\nplot(t)","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"This will show the timing results of the trial as a violin plot. 
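Because the recipe goes through the standard Plots.jl pipeline, you can also save the figure in the usual way, e.g. savefig(\"trial_times.png\") (the file name here is just an illustration). 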
You can use all the keyword arguments from Plots.jl, for instance st=:box or yaxis=:log10.","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"If a BenchmarkGroup contains (only) Trials, its results can be visualized simply by","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"using BenchmarkPlots, StatsPlots\nt = run(g)\nplot(t)","category":"page"},{"location":"manual/","page":"Manual","title":"Manual","text":"This will display each Trial as a violin plot.","category":"page"},{"location":"manual/#Miscellaneous-tips-and-info","page":"Manual","title":"Miscellaneous tips and info","text":"","category":"section"},{"location":"manual/","page":"Manual","title":"Manual","text":"BenchmarkTools restricts the minimum measurable benchmark execution time to one picosecond.\nIf you use rand or something similar to generate the values that are used in your benchmarks, you should seed the RNG (or provide a seeded RNG) so that the values are consistent between trials/samples/evaluations.\nBenchmarkTools attempts to be robust against machine noise occurring between samples, but BenchmarkTools can't do very much about machine noise occurring between trials. To cut down on the latter kind of noise, it is advised that you dedicate CPUs and memory to the benchmarking Julia process by using a shielding tool such as cset.\nOn some machines, for some versions of BLAS and Julia, the number of BLAS worker threads can exceed the number of available cores. This can occasionally result in scheduling issues and inconsistent performance for BLAS-heavy benchmarks. To fix this issue, you can use BLAS.set_num_threads(i::Int) in the Julia REPL to ensure that the number of BLAS threads is equal to or less than the number of available cores.\n@benchmark is evaluated in global scope, even if called from local scope.","category":"page"},{"location":"reference/#References","page":"Reference","title":"References","text":"","category":"section"},{"location":"reference/","page":"Reference","title":"Reference","text":"Modules = [BenchmarkTools]\nPrivate = false","category":"page"},{"location":"reference/#BenchmarkTools.clear_empty!-Tuple{BenchmarkGroup}","page":"Reference","title":"BenchmarkTools.clear_empty!","text":"clear_empty!(group::BenchmarkGroup)\n\nRecursively remove any empty subgroups from group.\n\nUse this to prune a BenchmarkGroup after accessing the incorrect fields, such as g=BenchmarkGroup(); g[1], without storing anything to g[1], which will create an empty subgroup g[1].\n\n\n\n\n\n","category":"method"},{"location":"reference/#BenchmarkTools.judge-Tuple{BenchmarkTools.TrialEstimate, BenchmarkTools.TrialEstimate}","page":"Reference","title":"BenchmarkTools.judge","text":"judge(target::TrialEstimate, baseline::TrialEstimate; [time_tolerance::Float64=0.05])\n\nReport on whether the first estimate target represents a regression or an improvement with respect to the second estimate baseline.\n\n\n\n\n\n","category":"method"},{"location":"reference/#BenchmarkTools.judge-Tuple{BenchmarkTools.TrialRatio}","page":"Reference","title":"BenchmarkTools.judge","text":"judge(r::TrialRatio, [time_tolerance::Float64=0.05])\n\n\n\n\n\n","category":"method"},{"location":"reference/#BenchmarkTools.judge-Tuple{Vararg{BenchmarkGroup}}","page":"Reference","title":"BenchmarkTools.judge","text":"judge(target::BenchmarkGroup, baseline::BenchmarkGroup; 
[time_tolerance::Float64=0.05])\n\n\n\n\n\n","category":"method"},{"location":"reference/#BenchmarkTools.ratio-Tuple{BenchmarkTools.TrialEstimate, BenchmarkTools.TrialEstimate}","page":"Reference","title":"BenchmarkTools.ratio","text":"ratio(target::TrialEstimate, baseline::TrialEstimate)\n\nReturns a ratio of the target estimate to the baseline estimate, as e.g. time(target)/time(baseline).\n\n\n\n\n\n","category":"method"},{"location":"reference/#BenchmarkTools.tune!","page":"Reference","title":"BenchmarkTools.tune!","text":"tune!(b::Benchmark, p::Parameters = b.params; verbose::Bool = false, pad = \"\", kwargs...)\n\nTune a Benchmark instance.\n\nIf the number of evals in the parameters p has been set manually, this function does nothing.\n\n\n\n\n\n","category":"function"},{"location":"reference/#BenchmarkTools.tune!-Tuple{BenchmarkGroup}","page":"Reference","title":"BenchmarkTools.tune!","text":"tune!(group::BenchmarkGroup; verbose::Bool = false, pad = \"\", kwargs...)\n\nTune a BenchmarkGroup instance. For most benchmarks, tune! needs to perform many evaluations to determine the proper parameters for any given benchmark - often more evaluations than are performed when running a trial. In fact, the majority of total benchmarking time is usually spent tuning parameters, rather than actually running trials.\n\n\n\n\n\n","category":"method"},{"location":"reference/#BenchmarkTools.@ballocated-Tuple","page":"Reference","title":"BenchmarkTools.@ballocated","text":"@ballocated expression [other parameters...]\n\nSimilar to the @allocated macro included with Julia, this returns the number of bytes allocated when executing a given expression. It uses the @benchmark macro, however, and accepts all of the same additional parameters as @benchmark. The returned allocations correspond to the trial with the minimum elapsed time measured during the benchmark.\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@ballocations-Tuple","page":"Reference","title":"BenchmarkTools.@ballocations","text":"@ballocations expression [other parameters...]\n\nSimilar to the @allocations macro included with Julia, this macro evaluates an expression, discarding the resulting value, and returns the total number of allocations made during its execution.\n\nUnlike @allocations, it uses the @benchmark macro from the BenchmarkTools package, and accepts all of the same additional parameters as @benchmark. The returned number of allocations corresponds to the trial with the minimum elapsed time measured during the benchmark.\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@belapsed-Tuple","page":"Reference","title":"BenchmarkTools.@belapsed","text":"@belapsed expression [other parameters...]\n\nSimilar to the @elapsed macro included with Julia, this returns the elapsed time (in seconds) to execute a given expression. It uses the @benchmark macro, however, and accepts all of the same additional parameters as @benchmark. 
The returned time is the minimum elapsed time measured during the benchmark.\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@benchmark-Tuple","page":"Reference","title":"BenchmarkTools.@benchmark","text":"@benchmark [setup=]\n\nRun benchmark on a given expression.\n\nExample\n\nThe simplest usage of this macro is to put it in front of what you want to benchmark.\n\njulia> @benchmark sin(1)\nBenchmarkTools.Trial:\n memory estimate: 0 bytes\n allocs estimate: 0\n --------------\n minimum time: 13.610 ns (0.00% GC)\n median time: 13.622 ns (0.00% GC)\n mean time: 13.638 ns (0.00% GC)\n maximum time: 21.084 ns (0.00% GC)\n --------------\n samples: 10000\n evals/sample: 998\n\nYou can interpolate values into @benchmark expressions:\n\n# rand(1000) is executed for each evaluation\njulia> @benchmark sum(rand(1000))\nBenchmarkTools.Trial:\n memory estimate: 7.94 KiB\n allocs estimate: 1\n --------------\n minimum time: 1.566 μs (0.00% GC)\n median time: 2.135 μs (0.00% GC)\n mean time: 3.071 μs (25.06% GC)\n maximum time: 296.818 μs (95.91% GC)\n --------------\n samples: 10000\n evals/sample: 10\n\n# rand(1000) is evaluated at definition time, and the resulting\n# value is interpolated into the benchmark expression\njulia> @benchmark sum($(rand(1000)))\nBenchmarkTools.Trial:\n memory estimate: 0 bytes\n allocs estimate: 0\n --------------\n minimum time: 101.627 ns (0.00% GC)\n median time: 101.909 ns (0.00% GC)\n mean time: 103.834 ns (0.00% GC)\n maximum time: 276.033 ns (0.00% GC)\n --------------\n samples: 10000\n evals/sample: 935\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@benchmarkable-Tuple","page":"Reference","title":"BenchmarkTools.@benchmarkable","text":"@benchmarkable [setup=]\n\nCreate a Benchmark instance for the given expression. @benchmarkable has similar syntax with @benchmark. See also @benchmark.\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@benchmarkset-Tuple{Any, Any}","page":"Reference","title":"BenchmarkTools.@benchmarkset","text":"@benchmarkset \"title\" begin ... end\n\nCreate a benchmark set, or multiple benchmark sets if a for loop is provided.\n\nExamples\n\n@benchmarkset \"suite\" for k in 1:5\n @case \"case $k\" rand($k, $k)\nend\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@bprofile-Tuple","page":"Reference","title":"BenchmarkTools.@bprofile","text":"@bprofile expression [other parameters...]\n\nRun @benchmark while profiling. This is similar to\n\n@profile @benchmark expression [other parameters...]\n\nbut the profiling is applied only to the main execution (after compilation and tuning). The profile buffer is cleared prior to execution.\n\nView the profile results with Profile.print(...). See the profiling section of the Julia manual for more information.\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@btime-Tuple","page":"Reference","title":"BenchmarkTools.@btime","text":"@btime expression [other parameters...]\n\nSimilar to the @time macro included with Julia, this executes an expression, printing the time it took to execute and the memory allocated before returning the value of the expression.\n\nUnlike @time, it uses the @benchmark macro, and accepts all of the same additional parameters as @benchmark. 
The printed time is the minimum elapsed time measured during the benchmark.\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@btimed-Tuple","page":"Reference","title":"BenchmarkTools.@btimed","text":"@btimed expression [other parameters...]\n\nSimilar to the @timed macro included with Julia, this macro executes an expression and returns a NamedTuple containing the value of the expression, the minimum elapsed time in seconds, the total bytes allocated, the number of allocations, and the garbage collection time in seconds during the benchmark.\n\nUnlike @timed, it uses the @benchmark macro from the BenchmarkTools package for more detailed and consistent performance measurements. The elapsed time reported is the minimum time measured during the benchmark. It accepts all additional parameters supported by @benchmark.\n\n\n\n\n\n","category":"macro"},{"location":"reference/#BenchmarkTools.@case-Tuple{Any, Vararg{Any}}","page":"Reference","title":"BenchmarkTools.@case","text":"@case title <expr to benchmark> [setup=<setup expression>]\n\nMark an expression as a benchmark case. Must be used inside @benchmarkset.\n\n\n\n\n\n","category":"macro"},{"location":"reference/","page":"Reference","title":"Reference","text":"Base.run\nBenchmarkTools.save\nBenchmarkTools.load","category":"page"},{"location":"reference/#Base.run","page":"Reference","title":"Base.run","text":"run(b::Benchmark[, p::Parameters = b.params]; kwargs...)\n\nRun the benchmark defined by @benchmarkable.\n\n\n\n\n\nrun(group::BenchmarkGroup[, args...]; verbose::Bool = false, pad = \"\", kwargs...)\n\nRun the benchmark group, with benchmark parameters set to group's by default.\n\n\n\n\n\n","category":"function"},{"location":"reference/#BenchmarkTools.save","page":"Reference","title":"BenchmarkTools.save","text":"BenchmarkTools.save(filename, args...)\n\nSave serialized benchmarking objects (e.g. results or parameters) to a JSON file.\n\n\n\n\n\n","category":"function"},{"location":"reference/#BenchmarkTools.load","page":"Reference","title":"BenchmarkTools.load","text":"BenchmarkTools.load(filename)\n\nLoad serialized benchmarking objects (e.g. results or parameters) from a JSON file.\n\n\n\n\n\n","category":"function"},{"location":"#BenchmarkTools","page":"Home","title":"BenchmarkTools","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"BenchmarkTools makes performance tracking of Julia code easy by supplying a framework for writing and running groups of benchmarks as well as comparing benchmark results.","category":"page"},{"location":"","page":"Home","title":"Home","text":"This package is used to write and run the benchmarks found in BaseBenchmarks.jl.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The CI infrastructure for automated performance testing of the Julia language is not in this package, but can be found in Nanosoldier.jl.","category":"page"},{"location":"#Quick-Start","page":"Home","title":"Quick Start","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The primary macro provided by BenchmarkTools is @benchmark:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> using BenchmarkTools\n\n# The `setup` expression is run once per sample, and is not included in the\n# timing results. Note that each sample can require multiple\n# benchmark kernel evaluations. 
See the BenchmarkTools manual for details.\njulia> @benchmark sort(data) setup=(data=rand(10))\nBenchmarkTools.Trial:\n 10000 samples with 968 evaluations took a median time of 90.902 ns (0.00% GC)\n Time (mean ± σ): 94.936 ns ± 47.797 ns (GC: 2.78% ± 5.03%)\n Range (min … max): 77.655 ns … 954.823 ns (GC: 0.00% … 87.94%)\n\n ▁▃▅▆▇█▇▆▅▂▁ \n ▂▂▃▃▄▅▆▇███████████▇▆▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▁▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂\n 77.7 ns Histogram: frequency by time 137 ns\n\n Memory estimate: 160 bytes, allocs estimate: 1.","category":"page"},{"location":"","page":"Home","title":"Home","text":"For quick sanity checks, one can use the @btime macro, which is a convenience wrapper around @benchmark whose output is analogous to Julia's built-in @time macro:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> @btime sin(x) setup=(x=rand())\n 4.361 ns (0 allocations: 0 bytes)\n0.49587200950472454","category":"page"},{"location":"","page":"Home","title":"Home","text":"If you're interested in profiling a fast-running command, you can use @bprofile sin(x) setup=(x=rand()) and then your favorite tools for displaying the results (Profile.print or a graphical viewer).","category":"page"},{"location":"","page":"Home","title":"Home","text":"If the expression you want to benchmark depends on external variables, you should use $ to \"interpolate\" them into the benchmark expression to avoid the problems of benchmarking with globals. Essentially, any interpolated variable $x or expression $(...) is \"pre-computed\" before benchmarking begins:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> A = rand(3,3);\n\njulia> @btime inv($A); # we interpolate the global variable A with $A\n 1.191 μs (10 allocations: 2.31 KiB)\n\njulia> @btime inv($(rand(3,3))); # interpolation: the rand(3,3) call occurs before benchmarking\n 1.192 μs (10 allocations: 2.31 KiB)\n\njulia> @btime inv(rand(3,3)); # the rand(3,3) call is included in the benchmark time\n 1.295 μs (11 allocations: 2.47 KiB)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Sometimes, interpolating variables into very simple expressions can give the compiler more information than you intended, causing it to \"cheat\" the benchmark by hoisting the calculation out of the benchmark code","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> a = 1; b = 2\n2\n\njulia> @btime $a + $b\n 0.024 ns (0 allocations: 0 bytes)\n3","category":"page"},{"location":"","page":"Home","title":"Home","text":"As a rule of thumb, if a benchmark reports that it took less than a nanosecond to perform, this hoisting probably occurred. 
You can avoid this by referencing and dereferencing the interpolated variables ","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> @btime $(Ref(a))[] + $(Ref(b))[]\n 1.277 ns (0 allocations: 0 bytes)\n3","category":"page"},{"location":"","page":"Home","title":"Home","text":"As described in the Manual, the BenchmarkTools package supports many other features, both for additional output and for more fine-grained control over the benchmarking process.","category":"page"},{"location":"internals/#Internals","page":"Internals","title":"Internals","text":"","category":"section"},{"location":"internals/","page":"Internals","title":"Internals","text":"Modules = [BenchmarkTools]\nPublic = false\nFilter = f -> f !== Base.run","category":"page"},{"location":"internals/#Base.isempty-Tuple{BenchmarkGroup}","page":"Internals","title":"Base.isempty","text":"isempty(group::BenchmarkGroup)\n\nReturn true if group is empty. This will first run clear_empty! on group to recursively remove any empty subgroups.\n\n\n\n\n\n","category":"method"},{"location":"internals/#BenchmarkTools._withprogress-Tuple{Any, AbstractString, BenchmarkGroup}","page":"Internals","title":"BenchmarkTools._withprogress","text":"_withprogress(\n name::AbstractString,\n group::BenchmarkGroup;\n kwargs...,\n) do progressid, nleaves, ndone\n ...\nend\n\nExecute do block with following arguments:\n\nprogressid: logging ID to be used for @logmsg.\nnleaves: total number of benchmarks counted at the root benchmark group.\nndone: number of completed benchmarks\n\nThey are either extracted from kwargs (for sub-groups) or newly created (for root benchmark group).\n\n\n\n\n\n","category":"method"},{"location":"internals/#BenchmarkTools.load-Tuple{AbstractString, Vararg{Any}}","page":"Internals","title":"BenchmarkTools.load","text":"BenchmarkTools.load(filename)\n\nLoad serialized benchmarking objects (e.g. results or parameters) from a JSON file.\n\n\n\n\n\n","category":"method"},{"location":"internals/#BenchmarkTools.quasiquote!-Tuple{Any, Vararg{Any}}","page":"Internals","title":"BenchmarkTools.quasiquote!","text":"quasiquote!(expr::Expr, vars::Vector{Symbol}, vals::Vector{Expr})\n\nReplace every interpolated value in expr with a placeholder variable and store the resulting variable / value pairings in vars and vals.\n\n\n\n\n\n","category":"method"},{"location":"internals/#BenchmarkTools.save-Tuple{AbstractString, Vararg{Any}}","page":"Internals","title":"BenchmarkTools.save","text":"BenchmarkTools.save(filename, args...)\n\nSave serialized benchmarking objects (e.g. 
results or parameters) to a JSON file.\n\n\n\n\n\n","category":"method"},{"location":"linuxtips/#Reproducible-benchmarking-in-Linux-based-environments","page":"Linux-based environments","title":"Reproducible benchmarking in Linux-based environments","text":"","category":"section"},{"location":"linuxtips/#Introduction","page":"Linux-based environments","title":"Introduction","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"This document is all about identifying and avoiding potential reproducibility pitfalls when executing performance tests in a Linux-based environment.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"When I started working on performance regression testing for the Julia language, I was surprised that I couldn't find an up-to-date and noob-friendly checklist that succinctly consolidated the performance wisdom scattered across various forums and papers. My hope is that this document provides a starting point for researchers who are new to performance testing on Linux, and who might be trying to figure out why theoretically identical benchmark trials generate significantly different results.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"To the uninitiated, tracking down and eliminating \"OS jitter\" can sometimes feel more like an art than a science. You'll quickly find that setting up a proper environment for rigorous performance testing requires scouring the internet and academic literature for esoteric references to scheduler quirks and kernel flags. Some of these parameters might drastically affect the outcome of your particular benchmark suite, while others may demand inordinate amounts of experimentation just to prove that they don't affect your benchmarks at all.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"This document's goal is not to improve the performance of your application, help you simulate a realistic production environment, or provide in-depth explanations for various kernel mechanisms. It is currently a bit light on NUMA-specific details, but alas, I don't have access to a NUMA-enabled machine to play with. I'm sure that knowledgeable readers will find opportunities for corrections and additions, in which case I'd be grateful if you filed an issue or opened a pull request in this repository.","category":"page"},{"location":"linuxtips/#Processor-shielding-and-process-affinity","page":"Linux-based environments","title":"Processor shielding and process affinity","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Processor shielding is a technique that invokes Linux's cpuset pseudo-filesystem to set up exclusive processors and memory nodes that are protected from Linux's scheduler. The easiest way to create and utilize a processor shield is with cset, a convenient Python wrapper over the cpuset interface. 
On Ubuntu, cset can be installed by running the following:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ sudo apt-get install cpuset","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"It's worth reading the extensive cset tutorial available on RTwiki. As a short example, here's how one might shield processors 1 and 3 from uninvited threads (including most kernel threads, specified by -k on):","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ sudo cset shield -c 1,3 -k on\ncset: --> activating shielding:\ncset: moving 67 tasks from root into system cpuset...\n[==================================================]%\ncset: kthread shield activated, moving 91 tasks into system cpuset...\n[==================================================]%\ncset: **> 34 tasks are not movable, impossible to move\ncset: \"system\" cpuset of CPUSPEC(0,2) with 124 tasks running\ncset: \"user\" cpuset of CPUSPEC(1,3) with 0 tasks running","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"After setting up a shield, you can execute processes within it via the -e flag (note that arguments to the process must be provided after the -- separator):","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ sudo cset shield -e echo -- \"hello from within the shield\"\ncset: --> last message, executed args into cpuset \"/user\", new pid is: 27782\nhello from within the shield\n➜ sudo cset shield -e julia -- benchmark.jl\ncset: --> last message, executed args into cpuset \"/user\", new pid is: 27792\nrunning benchmarks...","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"For slightly lower-level control, you can use cset's other subcommands, proc and set. The actual cpuset kernel interface offers even more options, notably memory hardwalling and scheduling settings.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"To maximize consistency between trials, you should make sure that individual threads executed within the shield always use the exact same processor/memory node configuration. This can be accomplished using hierarchical cpusets to pin processes to child cpusets created under the shielded cpuset. Other utilities for managing process affinity, like taskset, numactl, or tuna, aren't as useful as cset because they don't protect dedicated resources from the scheduler.","category":"page"},{"location":"linuxtips/#Virtual-memory-settings","page":"Linux-based environments","title":"Virtual memory settings","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"The official Linux documentation lists a plethora of virtual memory settings for configuring Linux's swapping, paging, and caching behavior. I encourage the reader to independently investigate the vm.nr_hugepages, vm.vfs_cache_pressure, vm.zone_reclaim_mode, and vm.min_free_kbytes properties, but won't discuss these in-depth because they are not likely to have a large impact in the majority of cases. 
Instead, I'll focus on two properties which are easier to experiment with and a bit less subtle in their effects: swappiness and address space layout randomization.","category":"page"},{"location":"linuxtips/#Swappiness","page":"Linux-based environments","title":"Swappiness","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Most Linux distributions are configured to swap aggressively by default, which can heavily skew performance results by increasing the likelihood of swapping during benchmark execution. Luckily, it's easy to tame the kernel's propensity to swap by lowering the swappiness setting, controlled via the vm.swappiness parameter:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ sudo sysctl vm.swappiness=10","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"In my experience, lowering vm.swappiness to around 10 or so is sufficient to overcome swap-related noise on most memory-bound benchmarks.","category":"page"},{"location":"linuxtips/#Address-space-layout-randomization-(ASLR)","page":"Linux-based environments","title":"Address space layout randomization (ASLR)","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Address space layout randomization (ASLR) is a security feature that makes it harder for malicious programs to exploit buffer overflows. In theory, ASLR could significantly impact reproducibility for benchmarks that are highly susceptible to variations in memory layout. Disabling ASLR should be done at your own risk - it is a security feature, after all.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"ASLR can be disabled globally by setting randomize_va_space to 0:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ sudo sysctl kernel.randomize_va_space=0","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"If you don't wish to disable ASLR globally, you can simply start up an ASLR-disabled shell by running:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ setarch $(uname -m) -R /bin/sh","category":"page"},{"location":"linuxtips/#CPU-frequency-scaling-and-boosting","page":"Linux-based environments","title":"CPU frequency scaling and boosting","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Most modern CPUs support dynamic frequency scaling, which is the ability to adjust their clock rate in order to manage power usage and temperature. On Linux, frequency scaling behavior is determined by heuristics dubbed \"governors\", each of which prioritizes different patterns of resource utilization. 
This feature can interfere with performance results if rescaling occurs during benchmarking or between trials, but luckily we can keep the effective clock rate static by enabling the performance governor on all processors:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ echo \"performance\" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"You can check that this command worked by making sure that cat /proc/cpuinfo | grep 'cpu MHz' spits out the same values as cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Many CPUs also support discretionary performance \"boosting\", which is similar to dynamic frequency scaling and can have the same negative impacts on benchmark reproducibility. To disable CPU boosting, you can run the following:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost","category":"page"},{"location":"linuxtips/#Hyperthreading","page":"Linux-based environments","title":"Hyperthreading","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Hyperthreading, more generally known as simultaneous multithreading (SMT), allows multiple software threads to \"simultaneously\" run on \"independent\" hardware threads on a single CPU core. The downside is that these threads can't always actually execute concurrently in practice, as they contend for shared CPU resources. Frustratingly, Linux exposes these threads to the operating system as extra logical processors, making techniques like shielding difficult to reason about - how do you know that your shielded \"processor\" isn't actually sharing a physical core with an unshielded \"processor\"? Unless your use case demands that you run tests in a hyperthreaded environment, you should consider disabling hyperthreading to make it easier to manage processor resources consistently.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"The first step to disabling hyperthreading is to check whether it's actually enabled on your machine. To do so, you can use lscpu:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ lscpu\nArchitecture: x86_64\nCPU op-mode(s): 32-bit, 64-bit\nByte Order: Little Endian\nCPU(s): 8 \nOn-line CPU(s) list: 0-7\nThread(s) per core: 2 \nCore(s) per socket: 4 \nSocket(s): 1\nNUMA node(s): 1\nVendor ID: GenuineIntel\nCPU family: 6\nModel: 60\nStepping: 3\nCPU MHz: 3501.000\nBogoMIPS: 6999.40\nVirtualization: VT-x\nL1d cache: 32K\nL1i cache: 32K\nL2 cache: 256K\nL3 cache: 8192K\nNUMA node0 CPU(s): 0-7","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"In the above output, the CPU(s) field tells us there are 8 logical processors. The other fields allow us to do a more granular breakdown: 1 socket times 4 cores per socket gives us 4 physical cores, times 2 threads per core gives us 8 logical processors. 
Since there are more logical processors than physical cores, we know hyperthreading is enabled.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Before we start disabling processors, we need to know which ones share a physical core:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list\n0,4\n1,5\n2,6\n3,7\n0,4\n1,5\n2,6\n3,7","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Each row above is in the format i,j, and can be read logical processor i shares a physical core with logical processor j. We can disable hyperthreading by taking excess sibling processors offline, leaving only one logical processor per physical core. In our example, we can accomplish this by disabling processors 4, 5, 6, and 7:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ echo 0 | sudo tee /sys/devices/system/cpu/cpu4/online\n0\n➜ echo 0 | sudo tee /sys/devices/system/cpu/cpu5/online\n0\n➜ echo 0 | sudo tee /sys/devices/system/cpu/cpu6/online\n0\n➜ echo 0 | sudo tee /sys/devices/system/cpu/cpu7/online\n0","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Now, we can verify that hyperthreading is disabled by checking each processor's thread_siblings_list again:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list\n0\n1\n2\n3","category":"page"},{"location":"linuxtips/#Interrupt-requests-and-SMP-affinity","page":"Linux-based environments","title":"Interrupt requests and SMP affinity","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"The kernel will periodically send interrupt requests (IRQs) to your processors. As the name implies, IRQs ask a processor to pause the currently running task in order to perform the requested task. There are many different kinds of IRQs, and the degree to which a specific kind of IRQ interferes with a given benchmark depends on the frequency and duration of the IRQ compared to the benchmark's workload.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"The good news is that most kinds of IRQs allow you to set an SMP affinity, which tells the kernel which processor an IRQ should be sent to. 
By properly configuring SMP affinities, we can send IRQs to the unshielded processors in our benchmarking environment, thus protecting the shielded processors from undesirable interruptions.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"You can use Linux's proc pseudo-filesystem to get a list of interrupts that have occurred on your system since your last reboot:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ cat /proc/interrupts\n CPU0 CPU1\n 0: 19 0 IR-IO-APIC-edge timer\n 8: 1 0 IR-IO-APIC-edge rtc0\n 9: 0 0 IR-IO-APIC-fasteoi acpi\n 16: 27 0 IR-IO-APIC-fasteoi ehci_hcd:usb1\n 22: 12 0 IR-IO-APIC-fasteoi ehci_hcd:usb2\n ⋮\n 53: 18021763 122330 IR-PCI-MSI-edge eth0-TxRx-7\nNMI: 15661 13628 Non-maskable interrupts\nLOC: 140221744 85225898 Local timer interrupts\nSPU: 0 0 Spurious interrupts\nPMI: 15661 13628 Performance monitoring interrupts\nIWI: 23570041 3729274 IRQ work interrupts\nRTR: 7 0 APIC ICR read retries\nRES: 3153272 4187108 Rescheduling interrupts\nCAL: 3401 10460 Function call interrupts\nTLB: 4434976 3071723 TLB shootdowns\nTRM: 0 0 Thermal event interrupts\nTHR: 0 0 Threshold APIC interrupts\nMCE: 0 0 Machine check exceptions\nMCP: 61112 61112 Machine check polls\nERR: 0\nMIS: 0","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Some interrupts, like non-maskable interrupts (NMI), can't be redirected, but you can change the SMP affinities of the rest by writing processor indices to /proc/irq/n/smp_affinity_list, where n is the IRQ number. Here's an example that sets IRQ 22's SMP affinity to processors 0, 1, and 2:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ echo 0-2 | sudo tee /proc/irq/22/smp_affinity_list","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"The optimal way to configure SMP affinities depends a lot on your benchmarks and benchmarking process. For example, if you're running a lot of network-bound benchmarks, it can sometimes be more beneficial to evenly balance ethernet driver interrupts (usually named something like eth0-*) than to restrict them to specific processors.","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"A smoke test for determining the impact of IRQs on benchmark results is to see what happens when you turn on/off an IRQ load balancer like irqbalance. If this has a noticeable effect on your results, it might be worth playing around with SMP affinities to figure out which IRQs should be directed away from your shielded processors.","category":"page"},{"location":"linuxtips/#Performance-monitoring-interrupts-(PMIs)-and-perf","page":"Linux-based environments","title":"Performance monitoring interrupts (PMIs) and perf","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"Performance monitoring interrupts (PMIs) are sent by the kernel's perf subsystem, which is used to set and manage hardware performance counters monitored by other parts of the kernel. Unless perf is a dependency of your benchmarking process, it may be useful to lower perf's sample rate so that PMIs don't interfere with your experiments. 
One way to do this is to set the kernel.perf_cpu_time_max_percent parameter to 1:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"➜ sudo sysctl kernel.perf_cpu_time_max_percent=1","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"This tells the kernel to inform perf that it should lower its sample rate such that sampling consumes less than 1% of CPU time. After changing this parameter, you may see messages in the system log like:","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"[ 3835.065463] perf samples too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate","category":"page"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"These messages are nothing to be concerned about - it's simply the kernel reporting that it's lowering perf's max sample rate in order to respect the perf_cpu_time_max_percent property we just set.","category":"page"},{"location":"linuxtips/#Additional-resources","page":"Linux-based environments","title":"Additional resources","text":"","category":"section"},{"location":"linuxtips/","page":"Linux-based environments","title":"Linux-based environments","text":"While not highly navigable and a bit overwhelming for newcomers, the most authoritative resource for kernel information is the official Linux documentation hosted at the Linux Kernel Archives.\nAkkan et al.'s 2012 paper on developing a noiseless Linux environment explores the optimal configurations for isolating resources from timer interrupts and the scheduler, as well as the benefits of tickless kernels. The paper makes use of Linux's cgroups, which are similar to the cpusets discussed in this document.\nDe et al.'s 2009 paper on reducing OS jitter in multithreaded systems is similar to Akkan et al.'s paper, but focuses on minimizing jitter for applications that make use of hyperthreading/SMT. Their experimental approach is different as well, relying heavily on analysis of simulated jitter \"traces\" attained by clever benchmarking.\nFor a solid overview of the Linux performance testing ecosystem, check out Brendan Gregg's talk on Linux performance tools. Note that this talk is more focused on debugging system performance problems as they arise in a large distributed environment, rather than application benchmarking or experimental reproducibility.\nThe RHEL6 Performance Tuning Guide is useful for introducing yourself to various kernel constructs that can cause performance problems. You can also check out the RHEL7 version of the same guide if you want something more recent, but I find the RHEL6 version more readable.","category":"page"}] }
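
The one-value convenience macros listed in the reference entries above (@belapsed, @ballocated, @ballocations) can be combined into a quick sanity-check script. The following is a minimal sketch rather than an official package example; the vector x is an arbitrary stand-in for whatever you actually want to measure.

using BenchmarkTools

x = rand(1000)
t = @belapsed sum($x)        # minimum elapsed time, in seconds
b = @ballocated sort($x)     # bytes allocated in the minimum-time trial
n = @ballocations sort($x)   # number of allocations in the minimum-time trial
println((elapsed = t, bytes = b, allocations = n))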
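
The ratio and time_tolerance entries in the reference section above describe how two TrialEstimates are compared. A minimal sketch of that workflow, assuming BenchmarkTools' judge function (which classifies a ratio as a regression, improvement, or invariant result), might look like this; the two sum calls are arbitrary placeholders for a target and a baseline implementation.

using BenchmarkTools

target   = @benchmark sum($(rand(1000)))       # candidate being evaluated
baseline = @benchmark sum($(rand(1000)))       # reference to compare against

r = ratio(minimum(target), minimum(baseline))  # e.g. time(target)/time(baseline)
j = judge(minimum(target), minimum(baseline); time_tolerance = 0.05)

println(r)
println(j)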
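
tune!, run, BenchmarkTools.save, and BenchmarkTools.load are documented separately above; a minimal sketch of how they fit together in a suite-based workflow follows. The group keys and the file name "results.json" are illustrative only.

using BenchmarkTools

suite = BenchmarkGroup()
suite["sort"]   = @benchmarkable sort(x) setup=(x=rand(1000))
suite["matmul"] = @benchmarkable A * A setup=(A=rand(32, 32))

tune!(suite)                          # determine evals/sample for each benchmark
results = run(suite; verbose = true)

BenchmarkTools.save("results.json", results)
loaded = BenchmarkTools.load("results.json")[1]  # load returns a vector of the saved objects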
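
The Linux tips page above is command-line oriented, but the same sysfs files can be inspected from a Julia session before a benchmark run. This is a rough, Linux-only sketch assuming the standard sysfs layout under /sys/devices/system/cpu; it only reports the scaling governor and hyperthread sibling lists discussed above, and prints "unavailable" where a file is missing.

# Inspect the scaling governor and thread-sibling list for each online CPU.
function check_benchmark_environment(base = "/sys/devices/system/cpu")
    for cpu in filter(d -> occursin(r"^cpu\d+$", d), sort(readdir(base)))
        gov = joinpath(base, cpu, "cpufreq", "scaling_governor")
        sib = joinpath(base, cpu, "topology", "thread_siblings_list")
        governor = isfile(gov) ? strip(read(gov, String)) : "unavailable"
        siblings = isfile(sib) ? strip(read(sib, String)) : "unavailable"
        println(cpu, ": governor = ", governor, ", thread siblings = ", siblings)
    end
end

check_benchmark_environment()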