Commit 12d778c

Updated benchmark results
Signed-off-by: Andrea Zoppi <texzk@email.it>
1 parent 32761bb commit 12d778c

12 files changed, with 32 additions and 46 deletions

README.md

Lines changed: 32 additions & 13 deletions
@@ -108,37 +108,56 @@ A basic benchmark suite is run via the following commands:
108108
```sh
109109
cd PATH_TO_PROJECT_ROOT/builddir
110110
meson test --benchmark
111-
meson compile benchmark-report
111+
meson compile benchmark-report-tda8425
112+
meson compile benchmark-report-ym7128
113+
meson compile benchmark-report-ymf262
112114
```
113115

114116

115-
### OPL3 Benchmark Results
117+
### Benchmark Results
116118

117119
Preliminary benchmarks were run on three very different CPUs:
118120

119121
| System | OS | CPU | SIMD | Notes |
120122
|:-|:-|:-|:-|:-|
121-
| PC | Windows 10 | i7 6700k | x86 SSE4.1 + AVX2 | Home PC |
123+
| PC | Windows 10 | i7 6700k | x86 SSE4.1 + AVX2 | 2016 gaming PC |
122124
| BeagleBone Black | Debian 11 | ARM Cortex-A8 | ARMv7 NEON | Headless |
123125
| Raspberry Pi 5 | Debian 12 | ARM Cortex-A76 | ARMv7 NEON | Headless + Heatsink Fan |
124126

125127
All the systems were updated to their latest software and OS releases.
126128
The compiler was *GCC* for all these machines.
127-
All the scores were played via `aymo_ymf262_play --benchmark --loops 3`, except for the *BBB* which did not loop (too slow!).
128129

129130
All the systems also ran `--cpu-ext dummy`, which mimics the overhead of the test harness itself (mostly the score decoder), so that it can be subtracted from the actual benchmarks.
130-
The reference implementation is *NukedOPL3*, run as `--cpu-ext none`.
131131

132-
Here's a summary of the results:
132+
All the benchmark results are normalized against the plain *C* implementation, run as `--cpu-ext none`.
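
As a rough illustration of the overhead subtraction and normalization described above, here is a minimal sketch. The formula is an assumption about how the figures are derived (harness time removed from both runs, then the SIMD time divided by the plain *C* time). The function names and the timings are hypothetical, not measured data.

```python
from statistics import mean


def normalized_ratio(t_simd, t_plain, t_dummy):
    """Return the overhead-corrected time ratio of a SIMD run against plain C.

    Assumed formula: subtract the harness overhead (dummy run) from both
    timings, then divide the SIMD time by the plain C time.
    """
    net_simd = t_simd - t_dummy    # time spent in the SIMD emulation only
    net_plain = t_plain - t_dummy  # time spent in the plain C emulation only
    if net_plain <= 0:
        raise ValueError("plain C run must take longer than the dummy run")
    return net_simd / net_plain


def speedup(ratio):
    """Speedup is taken here as the reciprocal of the normalized ratio."""
    return 1.0 / ratio


# Hypothetical timings in seconds (three loops each), for illustration only.
dummy = mean([1.0, 1.0, 1.0])     # --cpu-ext dummy
plain = mean([11.0, 11.0, 11.0])  # --cpu-ext none
simd = mean([6.0, 6.0, 6.0])      # some SIMD extension

ratio = normalized_ratio(simd, plain, dummy)
print(ratio, speedup(ratio))
```

With this convention, a ratio below 1 means the SIMD build is faster than plain *C*.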
133133

134-
| CPU | SIMD | Ratio | DevSt | Speedup |
135-
|:-|:-|-:|-:|-:|
136-
| i7 6700k | x86 SSE4.1 | 0.590 | 0.026 | 1.695 |
137-
| i7 6700k | x86 AVX2 | 0.302 | 0.013 | 3.315 |
138-
| ARM Cortex-A8 | ARMv7 NEON | 0.575 | 0.035 | 1.740 |
139-
| ARM Cortex-A76 | ARMv7 NEON | 0.374 | 0.010 | 2.671 |
140134

141-
![Benchmark Results](./doc/benchmarks/benchmark-results.png)
135+
#### TDA8425
136+
137+
A basic *TDA8425* can be emulated with simple DSP techniques (mostly IIR filters), so the implementation can be rather straightforward.
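
To give an idea of what an IIR filter looks like, here is a minimal sketch of a generic one-pole low-pass filter. It is only an illustrative building block, not the filter design used by *AYMO*. The function name, the cutoff frequency, and the sample rate are assumptions chosen for the example.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Filter samples with the recurrence y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    # Smoothing coefficient derived from the cutoff frequency.
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    y = 0.0  # filter state: the previous output sample
    out = []
    for x in samples:
        y += a * (x - y)  # each output depends on the previous output (IIR)
        out.append(y)
    return out


# Step response: the output rises smoothly toward the input level.
step = [1.0] * 2000
response = one_pole_lowpass(step, cutoff_hz=1000.0, sample_rate_hz=48000.0)
print(response[0], response[-1])
```

Each output sample depends on the previous output, so the work is parallelized across channels or filter stages rather than across consecutive samples.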
138+
139+
Surprisingly, the *BBB* shows a much higher speedup than the other SIMD architectures I tested.
140+
Perhaps its CPU core cannot optimize the execution of the plain C implementation by itself, as higher-grade CPUs do.
141+
This suggests that *AYMO* can benefit older embedded systems.
142+
143+
![Benchmark Results](./doc/benchmarks-tda8425.png)
144+
145+
146+
#### YM7128
147+
148+
The *YM7128* is a simple fixed-point delay unit, with lots of parallel computations.
149+
The results are indeed very interesting: all the SIMD architectures under test consistently show a nice speedup.
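
The following minimal sketch shows why a fixed-point delay unit maps well onto SIMD: every tap is an independent multiply-and-shift. It is a generic multi-tap delay line, not the actual *YM7128* signal path. The `Q` format, the tap delays, and the tap gains are hypothetical values chosen for the example.

```python
from collections import deque

Q = 12  # hypothetical number of fractional bits for the tap gains


def multi_tap_delay(samples, tap_delays, tap_gains_q):
    """Sum several delayed copies of an integer signal, each scaled by a Q12 gain."""
    size = max(tap_delays) + 1
    history = deque([0] * size, maxlen=size)
    out = []
    for x in samples:
        history.appendleft(x)  # history[d] is now the sample from d steps ago
        acc = 0
        for delay, gain in zip(tap_delays, tap_gains_q):
            # Each tap is independent of the others, hence SIMD-friendly.
            acc += (history[delay] * gain) >> Q
        out.append(acc)
    return out


# An impulse produces one scaled echo per tap (gains 1.0, 0.5, 0.25 in Q12).
impulse = [1000] + [0] * 7
echo = multi_tap_delay(impulse, tap_delays=[1, 3, 5], tap_gains_q=[4096, 2048, 1024])
print(echo)  # [0, 1000, 0, 500, 0, 250, 0, 0]
```

In a SIMD implementation, the inner loop over the taps would be replaced by a few vector multiplications and a horizontal sum.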
150+
151+
![Benchmark Results](./doc/benchmarks-ym7128.png)
152+
153+
154+
#### YMF262
155+
156+
The reference *OPL3* implementation is *NukedOPL3*.
157+
158+
All the *OPL3* scores were played via `aymo_ymf262_play --benchmark --loops 3`, except on the *BBB*, which did not loop (too slow!).
159+
160+
![YMF262 Benchmark Results](./doc/benchmarks-ymf262.png)
142161

143162

144163
## Integration

doc/benchmarks-tda8425.png

7.99 KB

doc/benchmarks-ym7128.png

7.82 KB

doc/benchmarks-ymf262.png

14.7 KB

doc/benchmarks/BBB_ARM-A8.csv

Lines changed: 0 additions & 11 deletions
This file was deleted.

doc/benchmarks/PC_i7-6700k.csv

Lines changed: 0 additions & 11 deletions
This file was deleted.

doc/benchmarks/RPi5_ARM-A76.csv

Lines changed: 0 additions & 11 deletions
This file was deleted.

doc/benchmarks/bbb-testlog.7z

7.82 KB
Binary file not shown.

doc/benchmarks/benchmark-results.png

-14.7 KB
Binary file not shown.

doc/benchmarks/benchmark-results.xlsx

9.31 KB
Binary file not shown.

doc/benchmarks/pc-testlog.7z

10.6 KB
Binary file not shown.

doc/benchmarks/rpi5-testlog.7z

7.79 KB
Binary file not shown.
