Commit

Merge pull request #806 from pq-code-package/barret_magic
Poly: Hardcode barrett multiplier
mkannwischer authored Feb 25, 2025
2 parents e319849 + 4450d88 commit 50beac5
Showing 1 changed file with 13 additions and 11 deletions.
24 changes: 13 additions & 11 deletions mlkem/poly.c
@@ -79,21 +79,23 @@ __contract__(
   ensures(return_value > -MLKEM_Q_HALF && return_value < MLKEM_Q_HALF)
 )
 {
-  /*
-   * To divide by MLKEM_Q using Barrett multiplication, the "magic number"
-   * multiplier is round_to_nearest(2**26/MLKEM_Q)
+  /* Barrett reduction approximates
+   * ```
+   *     round(a/MLKEM_Q)
+   *   = round(a*(2^N/MLKEM_Q))/2^N)
+   *  ~= round(a*round(2^N/MLKEM_Q)/2^N)
+   * ```
+   * Here, we pick N=26.
    */
-  const int BPOWER = 26;
-  const int32_t barrett_multiplier = ((1 << BPOWER) + MLKEM_Q / 2) / MLKEM_Q;
+  const int32_t magic = 20159; /* check-magic: 20159 == round(2^26 / MLKEM_Q) */

   /*
-   * Compute round_to_nearest(a/MLKEM_Q) using the multiplier
-   * above and shift by BPOWER places.
-   * PORTABILITY: Right-shift on a signed integer is, strictly-speaking,
-   * implementation-defined for negative left argument. Here,
-   * we assume it's sign-preserving "arithmetic" shift right. (C99 6.5.7 (5))
+   * PORTABILITY: Right-shift on a signed integer is
+   * implementation-defined for negative left argument.
+   * Here, we assume it's sign-preserving "arithmetic" shift right.
+   * See (C99 6.5.7 (5))
    */
-  const int32_t t = (barrett_multiplier * a + (1 << (BPOWER - 1))) >> BPOWER;
+  const int32_t t = (magic * a + (1 << 25)) >> 26;

   /*
    * t is in -10 .. +10, so we need 32-bit math to
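
The hardcoded multiplier in this hunk can be sanity-checked outside the library. The following is a minimal sketch, not the project's code: it assumes MLKEM_Q is 3329 and that MLKEM_Q_HALF is defined as (MLKEM_Q + 1) / 2, and the function names are hypothetical. It recomputes the constant with the integer formula the commit removes, then checks every int16_t input against the bound stated in the contract.

```c
#include <assert.h>
#include <stdint.h>

#define MLKEM_Q 3329                     /* ML-KEM modulus (assumed here) */
#define MLKEM_Q_HALF ((MLKEM_Q + 1) / 2) /* 1665; assumed definition */

/* Recompute round(2^26 / MLKEM_Q) with the integer formula the diff removes. */
static int32_t barrett_magic_from_formula(void)
{
  return ((INT32_C(1) << 26) + MLKEM_Q / 2) / MLKEM_Q;
}

/* Signed Barrett reduction using the hardcoded multiplier from the diff. */
static int16_t barrett_reduce_sketch(int16_t a)
{
  const int32_t magic = 20159;
  /* t = round(a / MLKEM_Q); relies on an arithmetic right shift for
   * negative values, as the PORTABILITY note in the diff points out. */
  const int32_t t = (magic * a + (INT32_C(1) << 25)) >> 26;
  return (int16_t)(a - t * MLKEM_Q);
}

/* Count int16_t inputs whose result leaves (-MLKEM_Q_HALF, MLKEM_Q_HALF)
 * or is not congruent to the input modulo MLKEM_Q. */
static int count_barrett_failures(void)
{
  int failures = 0;
  int32_t a;
  for (a = INT16_MIN; a <= INT16_MAX; a++)
  {
    const int32_t r = barrett_reduce_sketch((int16_t)a);
    if (r <= -MLKEM_Q_HALF || r >= MLKEM_Q_HALF || (a - r) % MLKEM_Q != 0)
    {
      failures++;
    }
  }
  return failures;
}

/* Hypothetical self-check tying the two pieces together. */
static void barrett_self_check(void)
{
  assert(barrett_magic_from_formula() == 20159);
  assert(count_barrett_failures() == 0);
}
```

Calling barrett_self_check() from a test harness exercises both the constant and the output range. The largest intermediate value, 20159 * 32767 + 2^25, stays below 2^31, so the 32-bit arithmetic does not overflow.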
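
The PORTABILITY note in the hunk assumes that `>>` on a negative signed integer is an arithmetic shift. As an illustration only, the sketch below shows one way to detect that behaviour at run time and a fallback that computes floor(x / 2^n) without shifting a negative value. Both function names are hypothetical and nothing here is taken from the repository.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if this implementation sign-extends on signed right shift.
 * The result of -1 >> 1 is implementation-defined (C99 6.5.7 (5)). */
static int shift_is_arithmetic(void)
{
  return (-1 >> 1) == -1;
}

/* floor(x / 2^n) for 0 <= n < 31, shifting only non-negative values.
 * For x < 0, -(x + 1) is non-negative and cannot overflow, and
 * floor(x / 2^n) == -floor((-(x + 1)) / 2^n) - 1. */
static int32_t asr_portable(int32_t x, int n)
{
  assert(n >= 0 && n < 31);
  if (x >= 0)
  {
    return x >> n;
  }
  return -((-(x + 1)) >> n) - 1;
}
```

On a platform where shift_is_arithmetic() returns 1, asr_portable(x, 26) and x >> 26 agree for every int32_t x, so the fallback only matters on the unusual targets the comment is warning about.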

18 comments on commit 50beac5

@oqs-bot

Intel Xeon 4th gen (c7i)

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 9634 cycles 9637 cycles 1.00
ML-KEM-512 encaps 11231 cycles 11239 cycles 1.00
ML-KEM-512 decaps 15304 cycles 15319 cycles 1.00
ML-KEM-768 keypair 16437 cycles 16362 cycles 1.00
ML-KEM-768 encaps 17987 cycles 17906 cycles 1.00
ML-KEM-768 decaps 23771 cycles 23682 cycles 1.00
ML-KEM-1024 keypair 22253 cycles 22209 cycles 1.00
ML-KEM-1024 encaps 24067 cycles 24058 cycles 1.00
ML-KEM-1024 decaps 31923 cycles 31886 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.
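
The Ratio column in these tables is consistent with Current divided by Previous, rounded to two decimals. That reading is inferred from the numbers, not from the tool's documentation. A small sketch under that assumption, using integer arithmetic and a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Ratio of current to previous cycle counts, in percent, rounded to the
 * nearest integer. A result of 102 corresponds to a table entry of 1.02. */
static uint64_t ratio_percent(uint64_t current, uint64_t previous)
{
  assert(previous != 0);
  return (current * 100 + previous / 2) / previous;
}
```

For example, ratio_percent(9634, 9637) evaluates to 100, matching the 1.00 shown for ML-KEM-512 keypair on the Intel Xeon 4th gen (c7i) run.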

@oqs-bot

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 29311 cycles 29308 cycles 1.00
ML-KEM-512 encaps 34251 cycles 34168 cycles 1.00
ML-KEM-512 decaps 44426 cycles 44426 cycles 1
ML-KEM-768 keypair 47944 cycles 47920 cycles 1.00
ML-KEM-768 encaps 56239 cycles 56174 cycles 1.00
ML-KEM-768 decaps 67992 cycles 67933 cycles 1.00
ML-KEM-1024 keypair 72091 cycles 72077 cycles 1.00
ML-KEM-1024 encaps 84763 cycles 84744 cycles 1.00
ML-KEM-1024 decaps 101696 cycles 101586 cycles 1.00

@oqs-bot

Arm Cortex-A76 (Raspberry Pi 5) benchmarks

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 29537 cycles 29536 cycles 1.00
ML-KEM-512 encaps 35129 cycles 35129 cycles 1
ML-KEM-512 decaps 45749 cycles 45749 cycles 1
ML-KEM-768 keypair 50468 cycles 50467 cycles 1.00
ML-KEM-768 encaps 55831 cycles 55835 cycles 1.00
ML-KEM-768 decaps 70803 cycles 70808 cycles 1.00
ML-KEM-1024 keypair 73367 cycles 73374 cycles 1.00
ML-KEM-1024 encaps 82310 cycles 82313 cycles 1.00
ML-KEM-1024 decaps 102577 cycles 102575 cycles 1.00

@oqs-bot

AMD EPYC 4th gen (c7a)

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 11651 cycles 11665 cycles 1.00
ML-KEM-512 encaps 13310 cycles 13345 cycles 1.00
ML-KEM-512 decaps 18205 cycles 18182 cycles 1.00
ML-KEM-768 keypair 20143 cycles 20143 cycles 1
ML-KEM-768 encaps 21215 cycles 21195 cycles 1.00
ML-KEM-768 decaps 28436 cycles 28417 cycles 1.00
ML-KEM-1024 keypair 26963 cycles 26982 cycles 1.00
ML-KEM-1024 encaps 29015 cycles 29040 cycles 1.00
ML-KEM-1024 decaps 38516 cycles 38576 cycles 1.00

@oqs-bot

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 17265 cycles 17260 cycles 1.00
ML-KEM-512 encaps 19192 cycles 19046 cycles 1.01
ML-KEM-512 decaps 24569 cycles 24603 cycles 1.00
ML-KEM-768 keypair 29666 cycles 29384 cycles 1.01
ML-KEM-768 encaps 30939 cycles 30621 cycles 1.01
ML-KEM-768 decaps 39133 cycles 38570 cycles 1.01
ML-KEM-1024 keypair 43715 cycles 43674 cycles 1.00
ML-KEM-1024 encaps 45100 cycles 45122 cycles 1.00
ML-KEM-1024 decaps 55595 cycles 55599 cycles 1.00

@oqs-bot

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 16166 cycles 16167 cycles 1.00
ML-KEM-512 encaps 18398 cycles 18385 cycles 1.00
ML-KEM-512 decaps 24950 cycles 24949 cycles 1.00
ML-KEM-768 keypair 27913 cycles 27810 cycles 1.00
ML-KEM-768 encaps 29608 cycles 29525 cycles 1.00
ML-KEM-768 decaps 38960 cycles 38975 cycles 1.00
ML-KEM-1024 keypair 38513 cycles 37730 cycles 1.02
ML-KEM-1024 encaps 40768 cycles 40730 cycles 1.00
ML-KEM-1024 decaps 53259 cycles 53291 cycles 1.00

@oqs-bot

Graviton4

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 18050 cycles 18050 cycles 1
ML-KEM-512 encaps 21418 cycles 21417 cycles 1.00
ML-KEM-512 decaps 28124 cycles 28120 cycles 1.00
ML-KEM-768 keypair 31074 cycles 31073 cycles 1.00
ML-KEM-768 encaps 34152 cycles 34156 cycles 1.00
ML-KEM-768 decaps 43783 cycles 43785 cycles 1.00
ML-KEM-1024 keypair 44910 cycles 44906 cycles 1.00
ML-KEM-1024 encaps 50329 cycles 50336 cycles 1.00
ML-KEM-1024 decaps 63267 cycles 63271 cycles 1.00

@oqs-bot

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 36390 cycles 36405 cycles 1.00
ML-KEM-512 encaps 42933 cycles 42958 cycles 1.00
ML-KEM-512 decaps 55856 cycles 55876 cycles 1.00
ML-KEM-768 keypair 59064 cycles 59065 cycles 1.00
ML-KEM-768 encaps 67576 cycles 67811 cycles 1.00
ML-KEM-768 decaps 84510 cycles 84528 cycles 1.00
ML-KEM-1024 keypair 87392 cycles 87401 cycles 1.00
ML-KEM-1024 encaps 98321 cycles 98318 cycles 1.00
ML-KEM-1024 decaps 119620 cycles 119655 cycles 1.00

@oqs-bot

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 39819 cycles 39913 cycles 1.00
ML-KEM-512 encaps 48167 cycles 48159 cycles 1.00
ML-KEM-512 decaps 62398 cycles 62421 cycles 1.00
ML-KEM-768 keypair 64756 cycles 64641 cycles 1.00
ML-KEM-768 encaps 75800 cycles 75801 cycles 1.00
ML-KEM-768 decaps 94482 cycles 94455 cycles 1.00
ML-KEM-1024 keypair 95934 cycles 96080 cycles 1.00
ML-KEM-1024 encaps 109397 cycles 109578 cycles 1.00
ML-KEM-1024 decaps 132969 cycles 133163 cycles 1.00

@oqs-bot

Graviton2

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 29569 cycles 29540 cycles 1.00
ML-KEM-512 encaps 35065 cycles 35130 cycles 1.00
ML-KEM-512 decaps 45784 cycles 45737 cycles 1.00
ML-KEM-768 keypair 50417 cycles 50473 cycles 1.00
ML-KEM-768 encaps 55932 cycles 55828 cycles 1.00
ML-KEM-768 decaps 70859 cycles 70820 cycles 1.00
ML-KEM-1024 keypair 73382 cycles 73382 cycles 1
ML-KEM-1024 encaps 82321 cycles 82322 cycles 1.00
ML-KEM-1024 decaps 102596 cycles 102594 cycles 1.00

@oqs-bot

Graviton3

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 19156 cycles 19154 cycles 1.00
ML-KEM-512 encaps 22937 cycles 22936 cycles 1.00
ML-KEM-512 decaps 30239 cycles 30236 cycles 1.00
ML-KEM-768 keypair 32832 cycles 32830 cycles 1.00
ML-KEM-768 encaps 36514 cycles 36513 cycles 1.00
ML-KEM-768 decaps 46950 cycles 46950 cycles 1
ML-KEM-1024 keypair 47385 cycles 47388 cycles 1.00
ML-KEM-1024 encaps 53362 cycles 53362 cycles 1
ML-KEM-1024 decaps 67315 cycles 67325 cycles 1.00

@oqs-bot

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 46850 cycles 46809 cycles 1.00
ML-KEM-512 encaps 55434 cycles 55392 cycles 1.00
ML-KEM-512 decaps 71309 cycles 71218 cycles 1.00
ML-KEM-768 keypair 76202 cycles 76403 cycles 1.00
ML-KEM-768 encaps 87519 cycles 87519 cycles 1
ML-KEM-768 decaps 108142 cycles 108215 cycles 1.00
ML-KEM-1024 keypair 112419 cycles 112359 cycles 1.00
ML-KEM-1024 encaps 126474 cycles 126390 cycles 1.00
ML-KEM-1024 decaps 152708 cycles 152685 cycles 1.00

@oqs-bot

Graviton4 (no-opt)

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 35739 cycles 35743 cycles 1.00
ML-KEM-512 encaps 40725 cycles 40754 cycles 1.00
ML-KEM-512 decaps 52087 cycles 52087 cycles 1
ML-KEM-768 keypair 60568 cycles 63194 cycles 0.96
ML-KEM-768 encaps 67490 cycles 67449 cycles 1.00
ML-KEM-768 decaps 81160 cycles 81164 cycles 1.00
ML-KEM-1024 keypair 88817 cycles 88815 cycles 1.00
ML-KEM-1024 encaps 98809 cycles 98809 cycles 1
ML-KEM-1024 decaps 117445 cycles 117446 cycles 1.00

@oqs-bot

Graviton3 (no-opt)

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 39112 cycles 39107 cycles 1.00
ML-KEM-512 encaps 44880 cycles 44879 cycles 1.00
ML-KEM-512 decaps 56751 cycles 56747 cycles 1.00
ML-KEM-768 keypair 64458 cycles 64455 cycles 1.00
ML-KEM-768 encaps 72632 cycles 72601 cycles 1.00
ML-KEM-768 decaps 87895 cycles 87857 cycles 1.00
ML-KEM-1024 keypair 96097 cycles 96102 cycles 1.00
ML-KEM-1024 encaps 106159 cycles 106150 cycles 1.00
ML-KEM-1024 decaps 127059 cycles 127055 cycles 1.00

@oqs-bot

Graviton2 (no-opt)

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 59718 cycles 59620 cycles 1.00
ML-KEM-512 encaps 68346 cycles 68259 cycles 1.00
ML-KEM-512 decaps 87157 cycles 87001 cycles 1.00
ML-KEM-768 keypair 99068 cycles 99327 cycles 1.00
ML-KEM-768 encaps 110818 cycles 110875 cycles 1.00
ML-KEM-768 decaps 135230 cycles 135152 cycles 1.00
ML-KEM-1024 keypair 149117 cycles 149104 cycles 1.00
ML-KEM-1024 encaps 164613 cycles 164582 cycles 1.00
ML-KEM-1024 decaps 195986 cycles 195928 cycles 1.00

@oqs-bot

Arm Cortex-A55 (Snapdragon 888) benchmarks

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 59493 cycles 59582 cycles 1.00
ML-KEM-512 encaps 67135 cycles 67220 cycles 1.00
ML-KEM-512 decaps 86472 cycles 86526 cycles 1.00
ML-KEM-768 keypair 101269 cycles 101475 cycles 1.00
ML-KEM-768 encaps 112419 cycles 112559 cycles 1.00
ML-KEM-768 decaps 139318 cycles 139523 cycles 1.00
ML-KEM-1024 keypair 153571 cycles 153597 cycles 1.00
ML-KEM-1024 encaps 170749 cycles 171003 cycles 1.00
ML-KEM-1024 decaps 206829 cycles 208052 cycles 0.99

@oqs-bot

SpacemiT K1 8 (Banana Pi F3) benchmarks

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 225723 cycles 225723 cycles 1
ML-KEM-512 encaps 272700 cycles 272677 cycles 1.00
ML-KEM-512 decaps 347779 cycles 347782 cycles 1.00
ML-KEM-768 keypair 373204 cycles 373216 cycles 1.00
ML-KEM-768 encaps 435174 cycles 435336 cycles 1.00
ML-KEM-768 decaps 533524 cycles 533660 cycles 1.00
ML-KEM-1024 keypair 555056 cycles 555093 cycles 1.00
ML-KEM-1024 encaps 635975 cycles 636049 cycles 1.00
ML-KEM-1024 decaps 758176 cycles 758785 cycles 1.00

@oqs-bot

Arm Cortex-A72 (Raspberry Pi 4) benchmarks

Benchmark suite Current: 50beac5 Previous: e319849 Ratio
ML-KEM-512 keypair 53117 cycles 53095 cycles 1.00
ML-KEM-512 encaps 61858 cycles 61097 cycles 1.01
ML-KEM-512 decaps 78640 cycles 78648 cycles 1.00
ML-KEM-768 keypair 91091 cycles 90072 cycles 1.01
ML-KEM-768 encaps 99064 cycles 98254 cycles 1.01
ML-KEM-768 decaps 122706 cycles 122610 cycles 1.00
ML-KEM-1024 keypair 135563 cycles 134991 cycles 1.00
ML-KEM-1024 encaps 149163 cycles 148668 cycles 1.00
ML-KEM-1024 decaps 181334 cycles 181324 cycles 1.00
