Accurate DFT benchmarking requires careful control of optimizations, CPU architecture, and compiler behavior. Follow these guidelines to ensure reliable performance measurements.

> [!note]
> A robust FFT benchmark suite implementing all these techniques is published at https://github.com/kfrlib/fft-benchmark
- Ensure that the optimized version of each library is used. If the vendor provides prebuilt binaries, use them.
    - For KFR, the official binaries can be found at https://github.com/kfrlib/kfr/releases.
    - To verify that KFR is optimized for maximum performance, call:

        The output must include the `optimized` flag and must not contain the `debug` flag.
- For libraries that support dynamic CPU dispatch, ensure that the best available architecture for your CPU is selected at runtime. Refer to the library documentation to learn how to verify this.
    - For KFR, call:

        ```c++
        cpu_runtime()
        ```

        This function returns the selected architecture, such as `avx2`, `avx512`, or `neon`/`neon64` (for ARM).
- Ensure that no emulation is involved. For example, use native `arm64` binaries for Apple M-series CPUs.

- Exclude plan creation from the time measurements.

- Ensure that the compiler does not optimize away the DFT code. Use the output data in some way so that the computation cannot be eliminated as dead code.

- Perform multiple invocations to obtain reliable results. A total execution time of a few seconds is the minimum for accurate measurements.

- Use the median or minimum of all measured execution times rather than the simple mean, as this better protects against unexpected spikes and benchmarking noise.
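The last four guidelines can be combined in a small timing harness. The following is a minimal sketch, not part of KFR or of the fft-benchmark suite: `median`, `consume`, `collect_timings`, and `example_median_seconds` are hypothetical helper names, and the timed workload is a stand-in loop rather than a real DFT. Any setup (such as plan creation) is assumed to happen before `collect_timings` is called, so it stays outside the timed region.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Median of the collected timings; less sensitive to spikes than the mean.
inline double median(std::vector<double> values)
{
    assert(!values.empty());
    const std::size_t mid = values.size() / 2;
    std::nth_element(values.begin(), values.begin() + mid, values.end());
    const double upper = values[mid];
    if (values.size() % 2 == 1)
        return upper;
    // Even count: average the two middle elements.
    const double lower = *std::max_element(values.begin(), values.begin() + mid);
    return 0.5 * (lower + upper);
}

// Storing a value derived from the output into a volatile object is an
// observable side effect, so the computation that produced it must be kept.
inline void consume(double value)
{
    static volatile double sink;
    sink = value;
}

// Hypothetical helper: calls `run` repeatedly until at least `min_total_seconds`
// of measured time and at least `min_runs` samples have been collected.
template <typename Fn>
std::vector<double> collect_timings(Fn&& run, double min_total_seconds, std::size_t min_runs = 10)
{
    using clock = std::chrono::steady_clock;
    std::vector<double> timings;
    double total = 0.0;
    while (total < min_total_seconds || timings.size() < min_runs)
    {
        const auto start = clock::now();
        run();
        const std::chrono::duration<double> elapsed = clock::now() - start;
        timings.push_back(elapsed.count());
        total += elapsed.count();
    }
    return timings;
}

// Usage with a stand-in workload (NOT a real DFT).
inline double example_median_seconds()
{
    std::vector<double> data(1024, 1.0); // setup stays outside the timed region
    const auto timings = collect_timings(
        [&] {
            double sum = 0.0;
            for (double x : data)
                sum += x * x;
            consume(sum); // use the result so the loop is not optimized away
        },
        0.01);
    return median(timings);
}
```

For a real benchmark, the lambda would execute the DFT on preallocated buffers and pass a value derived from the whole output (for example, a sum) to `consume`, with `min_total_seconds` set to a few seconds.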
## How to apply Sample Rate Conversion to a continuous signal?
[See also a gallery with results of applying various SRC presets](src_gallery.md)

For a continuous signal, the same instance of the `samplerate_converter` class should be used across all subsequent calls, rather than creating a new instance for each fragment, so that its internal state carries over from one fragment to the next. In the case of stereo audio, two instances (one per channel) are required.

The `samplerate_converter` class supports both `push` and `pull` styles of data flow:
- **`push`**: Input data of a fixed size is provided, and all available output data is received.
    **Example**: Processing audio from a microphone, where the sound device sends data in fixed-size chunks.

- **`pull`**: An output buffer of a fixed size is provided, and the necessary amount of input data is processed to generate the required output.
    **Example**: Streaming audio at a different sample rate to a sound device, where a specific number of output samples must be generated to fill the device buffer.

Let’s consider the case of resampling 44.1 kHz to 96 kHz with an output buffer of 512 samples (`pull`).
The corresponding input size is 512 × 44100 / 96000 = 235.2 samples, which is not an integer.

The `samplerate_converter` class processes signals split into buffers of different sizes by maintaining an internal state.

To determine the required input buffer size for the next call to `process`, call `input_size_for_output` with the desired output buffer length. In the 44.1 kHz to 96 kHz scenario, it returns either 236 or 235 samples.

The `process` function accepts two parameters:
- `output`: Output buffer, provided as a `univector` with the desired size (512).
- `input`: Input buffer, provided as a `univector` of at least the size returned by `input_size_for_output`. The resampler consumes the necessary number of samples to generate 512 output samples and returns the number of input samples read. Advance the input position by that amount before the next call.

For the `push` method, call `output_size_for_input` with the size of your input buffer. This function returns the corresponding output buffer size required to receive all pending output.
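The 235/236 alternation in the `pull` scenario follows from plain integer arithmetic. The sketch below is only an illustration of that arithmetic and is not KFR's implementation: `pull_size_model` and `next_input_size` are hypothetical names, and the model assumes that the cumulative input consumed is the cumulative output scaled by 44100/96000 = 147/320, rounded up.

```cpp
#include <cstdint>
#include <numeric>

// Hypothetical model (not part of KFR): tracks how many input samples are
// needed per call so that the running total follows the exact rate ratio.
struct pull_size_model
{
    std::int64_t num;          // input_rate / gcd(input_rate, output_rate)
    std::int64_t den;          // output_rate / gcd(input_rate, output_rate)
    std::int64_t produced = 0; // output samples generated so far
    std::int64_t consumed = 0; // input samples consumed so far

    pull_size_model(std::int64_t input_rate, std::int64_t output_rate)
    {
        const std::int64_t g = std::gcd(input_rate, output_rate);
        num = input_rate / g;
        den = output_rate / g;
    }

    // Input samples needed to produce the next `output_size` output samples.
    std::int64_t next_input_size(std::int64_t output_size)
    {
        produced += output_size;
        // Ceiling of produced * num / den, computed in integers.
        const std::int64_t needed_total = (produced * num + den - 1) / den;
        const std::int64_t size = needed_total - consumed;
        consumed = needed_total;
        return size;
    }
};
```

With `pull_size_model model(44100, 96000)` and repeated calls to `model.next_input_size(512)`, this model yields 236, 235, 235, 235, 235 and then repeats, for a total of 1176 input samples per 2560 output samples (5 × 235.2 = 1176). The exact order in which the real converter returns 235 and 236 may differ, since it depends on its internal state.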
### Example (pull)

```c++
// Initialization
auto src = samplerate_converter<double>(sample_rate_conversion_quality::high, output_samplerate, input_samplerate);
```