Quad-to-Stereo Decoding
The Concept
Decoding an Ambisonic mix to four virtual speakers in the horizontal plane — front left, front right, rear left, rear right — and summing those to stereo is a robust alternative to full binaural convolution. The approach produces a wider, more stable image than a direct HOA-to-stereo downmix, while avoiding the phase artifacts that binaural HRTF decoding introduces in the low-mids.

Signal flow: HOA to four virtual speakers, with HF reduction on the rear bus before summing to stereo.
HRTFs convey direction through two main mechanisms:
- Inter-aural gain, phase and time differences: the level offsets and microscopic delays between the left and right ear that create the illusion of externalization.
- Spectral shaping: a frequency-dependent coloration caused by the pinna reflecting and diffracting sound before it reaches the ear canal. For sounds arriving from behind, this produces a characteristic high-frequency rolloff and a set of notches in the 6–10 kHz range (Blauert, 1997).
The auditory system uses both cues, but the spectral shaping is particularly robust — it works even in mono and is the primary cue for elevation and front/back disambiguation.
By applying a high-frequency rolloff to the rear channels before summing to stereo, the rear path provides a spectral cue consistent with "this sound is behind me," without the phase manipulation that causes metallic artifacts in full binaural decoding.
The quad-to-stereo approach typically sounds:
- Less phasey than full binaural: no inter-aural delays or phase filters are applied, so the comb-filter coloration they can cause is avoided
- Wider than a direct stereo downmix: the rear channels push energy to the sides of the stereo field
- More robust across different headphones and even loudspeakers: the spectral cue survives playback systems that would destroy phase-based cues
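The summing stage described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the processing of any particular plugin: the function names, the one-pole filter (a gentle 6 dB/octave stand-in for the rear rolloff), the 4 kHz cutoff and the -4.5 dB rear trim are all assumptions chosen for the example.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """First-order low-pass (6 dB/octave), used here as the rear HF rolloff."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in samples:
        state = (1.0 - a) * x + a * state
        out.append(state)
    return out

def quad_to_stereo(fl, fr, rl, rr, rear_gain_db=-4.5,
                   cutoff_hz=4000.0, sample_rate=48000):
    """Sum four virtual speaker feeds to stereo.

    Front feeds pass through unchanged. Rear feeds are low-passed and
    trimmed (both values are illustrative assumptions), then added to
    the same-side front feed: no delays or cross-feeds are involved.
    """
    g = 10.0 ** (rear_gain_db / 20.0)  # dB to linear gain
    rl_f = one_pole_lowpass(rl, cutoff_hz, sample_rate)
    rr_f = one_pole_lowpass(rr, cutoff_hz, sample_rate)
    left = [f + g * r for f, r in zip(fl, rl_f)]
    right = [f + g * r for f, r in zip(fr, rr_f)]
    return left, right
```

Because only gains and a minimum-phase filter are applied per side, the left and right outputs never receive delayed copies of each other, which is the property the list above relies on.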
The Setup in Reaper

Virtual speaker placement in IEM AllRADecoder:
- Front pair: ±30° to ±45° azimuth, 0° elevation
- Rear pair: ±135° azimuth, 0° elevation
Keep all speakers in the horizontal plane. Elevation in the virtual array introduces height decoding artifacts when summing to stereo.
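To get a feel for how a horizontal layout like this distributes a source, the sketch below computes first-order feeds for the four positions. It is deliberately simplified and is not what AllRADecoder computes: each virtual speaker is treated as a cardioid microphone pointing at its azimuth. The SN3D encoding, the sign convention (positive azimuth to the left), the ±45° front angle and all names are assumptions made for the example.

```python
import math

# Assumed layout: front pair at ±45°, rear pair at ±135°, all at 0° elevation.
# Positive azimuth is taken to be counter-clockwise (to the listener's left).
SPEAKERS_DEG = {"FL": 45.0, "FR": -45.0, "RL": 135.0, "RR": -135.0}

def encode_first_order(azimuth_deg):
    """Horizontal first-order gains (W, X, Y) for a source at 0° elevation, SN3D."""
    az = math.radians(azimuth_deg)
    return 1.0, math.cos(az), math.sin(az)

def virtual_cardioid_feeds(w, x, y):
    """Feed for each virtual speaker, modelled as a cardioid aimed at its azimuth.

    A simplified stand-in for a real decoder: gain is 1 for a source at the
    speaker position and 0 for a source directly opposite it.
    """
    feeds = {}
    for name, az_deg in SPEAKERS_DEG.items():
        az = math.radians(az_deg)
        feeds[name] = 0.5 * (w + math.cos(az) * x + math.sin(az) * y)
    return feeds

if __name__ == "__main__":
    for src in (0.0, 90.0, 180.0):
        feeds = virtual_cardioid_feeds(*encode_first_order(src))
        print(src, {k: round(v, 3) for k, v in feeds.items()})
```

The Z channel is ignored entirely, which mirrors the advice to keep the virtual array in the horizontal plane.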
Processing on the rear bus:
The rear bus is where the character of the spatial impression is shaped. Several types of processing are useful here, and they can be combined:
- Spectral shaping (the core tool): A high-frequency shelf or low-pass is the primary means of creating the front/rear distinction. A gentle shelf starting around 5 kHz preserves some rear-channel air; a steeper low-pass at 3–4 kHz produces a stronger sense of immersion at the cost of high-frequency diffuseness. This mimics the spectral signature of rear-hemisphere HRTFs (Blauert, 1997).
- Level trim: Reducing the rear bus by 3–6 dB relative to the front keeps the center image stable and prevents the rear energy from masking the direct sources.
- Decorrelation: A small amount of decorrelation between RL and RR (via a short allpass or the SPARTA Decorrelator) widens the perceived rear field and prevents the two rear channels from summing to a narrow mono image. Use sparingly; too much decorrelation sounds unnatural and creates mono compatibility issues.
- Diffusion / early reflections: A short, dense reverb tail on the rear bus can increase the sense of envelopment without adding direct-source energy. Keep pre-delay at zero so the rear bus does not arrive before the front.
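The "short allpass" option for decorrelation can be illustrated with a Schroeder allpass applied with a different delay on each rear channel. This is a sketch under assumptions, not the algorithm of the SPARTA Decorrelator: the function names, the delays of 23 and 37 samples and the feedback gain of 0.4 are arbitrary illustrative values.

```python
def schroeder_allpass(samples, delay, gain):
    """Schroeder allpass: y[n] = -g*x[n] + x[n-D] + g*y[n-D].

    The magnitude response is flat; only the phase (and therefore the
    time structure) of the signal changes.
    """
    out = []
    for n, x in enumerate(samples):
        x_d = samples[n - delay] if n >= delay else 0.0
        y_d = out[n - delay] if n >= delay else 0.0
        out.append(-gain * x + x_d + gain * y_d)
    return out

def decorrelate_rear(rl, rr, delay_l=23, delay_r=37, gain=0.4):
    """Apply allpasses with different (assumed) delays to RL and RR.

    Identical rear signals come out with different phase responses, so
    they no longer sum to a single narrow image. A small gain keeps the
    effect subtle.
    """
    return (schroeder_allpass(rl, delay_l, gain),
            schroeder_allpass(rr, delay_r, gain))
```

Keeping the delays well under a millisecond at 48 kHz and the gain low is one way to follow the "use sparingly" advice; larger values smear transients audibly.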
For spectral shaping, any single-band shelf or low-pass works well:
| Platform | Suitable Plugin | Notes |
|---|---|---|
| Linux | LSP Parametric EQ, x42-eq | Per-channel mode for precise rear-only processing |
| macOS | FabFilter Pro-Q 3, built-in Channel EQ | |
| Windows | FabFilter Pro-Q 3, TDR Nova | |
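For readers who want to see what such a shelf does numerically, the following sketch computes high-shelf biquad coefficients using the widely used Audio EQ Cookbook formulas and evaluates the resulting magnitude response. It does not reproduce any of the plugins listed above. The 5 kHz corner follows the text, while the -6 dB shelf gain, the slope of 1.0, the 48 kHz sample rate and the function names are assumptions.

```python
import cmath
import math

def high_shelf_coeffs(f0_hz, gain_db, sample_rate=48000.0, slope=1.0):
    """High-shelf biquad (Audio EQ Cookbook form), normalised so a[0] == 1."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / 2.0 * math.sqrt((A + 1.0 / A) * (1.0 / slope - 1.0) + 2.0)
    k = 2.0 * math.sqrt(A) * alpha
    b0 = A * ((A + 1) + (A - 1) * cos_w0 + k)
    b1 = -2 * A * ((A - 1) + (A + 1) * cos_w0)
    b2 = A * ((A + 1) + (A - 1) * cos_w0 - k)
    a0 = (A + 1) - (A - 1) * cos_w0 + k
    a1 = 2 * ((A - 1) - (A + 1) * cos_w0)
    a2 = (A + 1) - (A - 1) * cos_w0 - k
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]

def magnitude_db(b, a, freq_hz, sample_rate=48000.0):
    """Magnitude of the biquad's frequency response at freq_hz, in dB."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)  # z^-1
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))

if __name__ == "__main__":
    # Assumed rear-bus setting: -6 dB shelf with its corner at 5 kHz.
    b, a = high_shelf_coeffs(5000.0, -6.0)
    for f in (100.0, 1000.0, 5000.0, 10000.0, 20000.0):
        print(f, round(magnitude_db(b, a, f), 2))
```

Printing the response at a handful of frequencies is a quick way to compare a gentle shelf against a steeper setting before committing to one by ear.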
Front/Rear Balance
The level ratio between the front and rear buses determines the perceived depth of the mix.
Too much rear level: the mix feels diffuse and lacks a stable center image.
Too little rear level: the spatial width collapses and the result sounds like a conventional stereo downmix.
A starting point is to set the rear bus 3–6 dB below the front bus and adjust by ear. In a dense electroacoustic mix the rears primarily carry reverb tails, spatial diffuseness, and low-mid weight — not direct sources — so they can often sit lower than expected while still contributing a clear sense of space.
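When adjusting by ear, it can help to check where the rear bus actually sits relative to the front. The helper below is a rough sketch that compares the RMS level of two rendered buses; the function names are hypothetical, and RMS over a whole excerpt is only one of several possible level measures.

```python
import math

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def rear_offset_db(front, rear):
    """Rear bus level relative to the front bus, in dB (negative = rear is lower)."""
    return 20.0 * math.log10(rms(rear) / rms(front))

if __name__ == "__main__":
    # Synthetic check: a rear bus at half the front amplitude.
    front = [math.sin(2.0 * math.pi * 440.0 * n / 48000.0) for n in range(48000)]
    rear = [0.5 * s for s in front]
    print(round(rear_offset_db(front, rear), 2))
```

A reading between roughly -6 and -3 dB corresponds to the starting range suggested above; material where the rears carry mostly reverb tails may end up lower.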
Limitations
This technique is a perceptual approximation, not a geometrically accurate spatial reproduction. Its main limitations:
- No externalization: without phase-based HRTF cues, the sound remains inside the head to some degree. The spectral rear cue creates the impression of depth and width rather than true externalization.
- Front/back collapse on loudspeakers: the spectral cue is weaker on speakers than on headphones, so the front/rear distinction may be less clear outside a headphone context.
- Fixed virtual geometry: the four speaker positions are static with no head-tracking compensation.
For an accurate, externalized binaural image, full HRTF convolution via SPARTA AmbiBIN or IEM BinauralDecoder remains the reference. The quad-to-stereo approach is most appropriate where robustness across playback systems matters more than geometric precision.
References
2019
- Franz Zotter and Matthias Frank.
Ambisonics: A Practical 3D Audio Theory for Recording, Studio Production, Sound Reinforcement, and Virtual Reality.
Springer, 2019.
doi:10.1007/978-3-030-17207-7.
1997
- Jens Blauert.
Spatial Hearing: The Psychophysics of Human Sound Localization.
MIT Press, revised edition, 1997.
1979
- Jont B. Allen and David A. Berkley.
Image method for efficiently simulating small-room acoustics.
Journal of the Acoustical Society of America, 65(4):943–950, 1979.
doi:10.1121/1.382599.
