The difference in performance primarily stems from hardware-level architectural differences between Apple silicon (M-series) and most Windows-based systems.
The key factor is CPU cache size and memory access speed, not the operating system or the plugin code.
What’s the underlying technical cause?
The Relab 176 needs approximately 47 KB of working memory per stereo 512-sample buffer.
This working memory includes DSP instructions, lookup tables, constants, internal states, and audio buffers necessary to model a complex, fully nonlinear analog circuit in real time.
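For scale, the raw audio in one stereo 512-sample buffer is small. The sketch below works out its size for 32-bit float samples and for 64-bit doubles. The sample formats are our assumption for illustration (hosts commonly pass 32-bit floats, and some plugins process internally in doubles). The 47 KB figure is the one quoted above, and the helper name is hypothetical.

```python
# Illustrative arithmetic only; the sample formats are assumptions.
FRAMES = 512            # samples per channel in one buffer
CHANNELS = 2            # stereo
WORKING_SET_KB = 47     # working-memory figure quoted for the Relab 176

def block_bytes(frames, channels, bytes_per_sample):
    """Bytes needed to store one buffer of audio samples."""
    return frames * channels * bytes_per_sample

for name, width in (("float32", 4), ("float64", 8)):
    kb = block_bytes(FRAMES, CHANNELS, width) / 1024
    print(f"{name}: {kb:.0f} KB of audio, "
          f"{kb / WORKING_SET_KB:.0%} of the {WORKING_SET_KB} KB working set")
```

Either way, a single buffer of raw audio is only 4–8 KB, well under a fifth of the 47 KB total.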
For optimal performance, this working set needs to fit within the processor's L1 data cache, the fastest level of the CPU's memory hierarchy.
- Apple M-series processors (e.g., M1, M2, M4) have 128 KB of L1 data cache per performance core (64 KB per efficiency core), large enough to hold all of the working data.
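- Most current Intel and AMD processors in Windows PCs have 32–48 KB of L1 data cache per core, so part of the plugin's working set spills into slower memory (L2/L3 cache or RAM), reducing performance. Even on 48 KB parts, a 47 KB working set leaves almost no room for the host, the operating system, and other plugins sharing the same core.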
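Why exceeding the cache hurts more than the size of the overflow suggests can be shown with a toy model: repeatedly sweep a 47 KB working set through a fully associative cache with least-recently-used (LRU) replacement and 64-byte lines. This model is a simplification chosen for illustration (real L1 caches are set-associative, use approximate LRU, and have hardware prefetchers), and the function name and parameters are hypothetical.

```python
from collections import OrderedDict

def lru_miss_rate(working_set, cache_size, line=64, passes=4):
    """Miss rate of repeated sequential sweeps over `working_set` bytes
    through a fully associative LRU cache of `cache_size` bytes.
    Simplified model: not how any specific CPU's L1 behaves exactly."""
    capacity = cache_size // line      # lines the cache can hold
    n_lines = working_set // line      # lines touched per sweep
    cache = OrderedDict()              # keys ordered oldest -> newest use

    def touch(tag):
        """Access one cache line; return True on a hit."""
        if tag in cache:
            cache.move_to_end(tag)     # mark as most recently used
            return True
        cache[tag] = None
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used line
        return False

    for tag in range(n_lines):         # warm-up sweep, not counted
        touch(tag)
    misses = sum(not touch(tag) for _ in range(passes) for tag in range(n_lines))
    return misses / (passes * n_lines)

KB = 1024
for size_kb in (32, 48, 64):
    rate = lru_miss_rate(47 * KB, size_kb * KB)
    print(f"{size_kb} KB cache: {rate:.0%} miss rate")
```

In this idealized model the 32 KB cache misses on every access, because each line is evicted just before it is needed again, while the 48 KB and 64 KB caches hit every time after warm-up. The 48 KB result is optimistic because the model ignores the host, the OS, and other plugins competing for the same cache. Measured miss rates on real hardware will differ, but the sharp drop-off once the working set exceeds capacity is the general shape.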
Why does L1 cache size matter for audio processing?
When all processing data fits in the L1 cache, each access typically takes about 3–5 clock cycles. Once data spills out of L1, latency rises to roughly 10–20 cycles for L2, several tens of cycles for L3, and a few hundred cycles for RAM, depending on the architecture.
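Real-time audio must finish each buffer before the next one is due (a 512-sample buffer at 48 kHz allows about 10.7 ms), so these stalls accumulate across the many memory accesses in every buffer, increasing CPU load and the risk of dropouts.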
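The cost of these stalls can be estimated with the standard average-memory-access-time formula, AMAT = hit time + miss rate × miss penalty. The sketch uses assumed round numbers (a 4-cycle L1 hit, a 14-cycle penalty when L2 serves the miss, and a 48 kHz sample rate); they are illustrative, not measurements of any specific CPU or of the Relab 176, and the function names are hypothetical.

```python
# Illustrative only: all latencies and rates are assumed round numbers.
SAMPLE_RATE_HZ = 48_000   # assumed host sample rate
BLOCK_SAMPLES = 512       # buffer size quoted in the text
L1_HIT_CYCLES = 4         # assumed L1 hit latency
L2_PENALTY_CYCLES = 14    # assumed extra cost when an L1 miss is served by L2

def block_deadline_ms(block_samples, sample_rate_hz):
    """Time available to process one buffer before the next one is due."""
    return block_samples / sample_rate_hz * 1000.0

def amat_cycles(hit_time, miss_rate, miss_penalty):
    """Average memory access time for one cache level, in cycles."""
    return hit_time + miss_rate * miss_penalty

print(f"deadline per buffer: {block_deadline_ms(BLOCK_SAMPLES, SAMPLE_RATE_HZ):.2f} ms")
for miss_rate in (0.0, 0.1, 0.3):
    cycles = amat_cycles(L1_HIT_CYCLES, miss_rate, L2_PENALTY_CYCLES)
    print(f"L1 miss rate {miss_rate:.0%}: {cycles:.1f} cycles per access "
          f"({cycles / L1_HIT_CYCLES:.2f}x the all-L1 case)")
```

With these assumptions, each 512-sample buffer must be finished within about 10.67 ms, and a 30% L1 miss rate roughly doubles the average cost of each memory access compared with the all-L1 case.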
Are there other relevant hardware differences?
Yes. Apple's M-series chips are systems-on-chip (SoCs) with unified memory mounted in the same package as the processor, so memory sits physically closer to the CPU. This reduces latency when accessing data outside the L1 cache compared to traditional multi-chip systems.
The benefit is most apparent in DSP-heavy workloads like the 176.
Is this a software limitation or lack of optimization?
Not necessarily. The plugin is already heavily optimized across both platforms.
Key optimizations include:
- Use of Eigen, a performance-oriented math library
- Custom-built numerical solvers beyond standard libraries
- Cache-aware data structures and memory layout
- Sequential memory access to minimize cache misses
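One common cache-aware layout technique is storing data as a struct of arrays rather than an array of structs, so a loop that reads one field across many elements touches fewer cache lines. The sketch below counts cache lines for a hypothetical solver state of 64 nodes with 8 double-precision fields each, assuming 64-byte lines; it illustrates the general technique and is not the 176's actual data layout.

```python
# Hypothetical layout: 64 circuit nodes, each with 8 float64 state fields.
# Names and sizes are illustrative, not taken from the Relab 176.
LINE = 64          # bytes per cache line (assumed)
DOUBLE = 8         # bytes per float64
NODES = 64
FIELDS = 8

def lines_touched(offsets, line=LINE):
    """Count distinct cache lines covered by a set of byte offsets."""
    return len({off // line for off in offsets})

def aos_offset(node, field):
    """Array of structs: all fields of one node are adjacent in memory."""
    return (node * FIELDS + field) * DOUBLE

def soa_offset(node, field):
    """Struct of arrays: one field for all nodes is adjacent in memory."""
    return (field * NODES + node) * DOUBLE

# A sequential loop that reads only field 0 (say, a node voltage) of every node.
aos = lines_touched(aos_offset(n, 0) for n in range(NODES))
soa = lines_touched(soa_offset(n, 0) for n in range(NODES))
print(f"array of structs: {aos} lines, struct of arrays: {soa} lines")
```

Reading one field from every node touches 64 lines in the array-of-structs layout but only 8 in the struct-of-arrays layout, because the values sit next to each other and each 64-byte line holds eight doubles. The advantage reverses when a loop reads every field of each node in turn, which is why layout choices follow the access pattern of the hot loops.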
We're actively working to reduce the memory footprint. Our current target is about 32 KB per buffer, but given the complexity of the model, we can't guarantee that this will be achievable.
The current size reflects what’s needed to maintain fidelity to the hardware’s behavior at this resolution and responsiveness.
Does this apply to other Relab plugins?
Not to the same extent. The Relab 176’s modeling depth and real-time nonlinear circuit emulation make it more sensitive to cache performance than typical effect plugins.
Other Relab products generally have smaller processing footprints and may show little to no difference across platforms.