The Concept

Decoding an Ambisonic mix to four virtual speakers in the horizontal plane — front left, front right, rear left, rear right — and summing those to stereo is a robust alternative to full binaural convolution. The approach produces a wider, more stable image than a direct HOA-to-stereo downmix, while avoiding the phase artifacts that binaural HRTF decoding introduces in the low-mids.

/images/spatial/quad_stereo.png — *Signal flow: HOA to four virtual speakers, with HF reduction on the rear bus before summing to stereo.*

HRTFs encode two main kinds of localization cue:

Interaural level, phase, and time differences: gain offsets and sub-millisecond delays between the left and right ear that place a sound laterally and contribute to the illusion of externalization.
Spectral shaping — a frequency-dependent coloration caused by the pinna reflecting and diffracting sound before it reaches the ear canal. For sounds arriving from behind, this produces a characteristic high-frequency rolloff and a set of notches in the 6–10 kHz range (Blauert, 1997).

The auditory system uses both cues, but spectral shaping is particularly robust: it is a monaural cue that does not depend on differences between the two ears, and it is the primary cue for elevation and front/back disambiguation.

By applying a high-frequency rolloff to the rear channels before summing to stereo, the rear path provides a spectral cue consistent with "this sound is behind me," without the phase manipulation that causes metallic artifacts in full binaural decoding.
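The fold-down described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the processing of any particular plugin: the one-pole low-pass, the 4 kHz cutoff, the -4.5 dB rear trim, and the function names `one_pole_lowpass` and `quad_to_stereo` are all hypothetical choices made for the example.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Gentle 6 dB/octave low-pass, standing in for the rear-bus HF rolloff."""
    # Recursion: y[n] = (1 - a) * x[n] + a * y[n-1]
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in samples:
        state = (1.0 - a) * x + a * state
        out.append(state)
    return out

def quad_to_stereo(fl, fr, rl, rr, sample_rate=48000,
                   rear_cutoff_hz=4000.0, rear_trim_db=-4.5):
    """Fold four virtual-speaker feeds down to stereo (illustrative values)."""
    rear_gain = 10.0 ** (rear_trim_db / 20.0)  # dB to linear gain
    rl_filtered = one_pole_lowpass(rl, rear_cutoff_hz, sample_rate)
    rr_filtered = one_pole_lowpass(rr, rear_cutoff_hz, sample_rate)
    # Rear left feeds only the left output, rear right only the right output.
    left = [f + rear_gain * r for f, r in zip(fl, rl_filtered)]
    right = [f + rear_gain * r for f, r in zip(fr, rr_filtered)]
    return left, right
```

Note that nothing in this path delays or phase-shifts one ear relative to the other: the rear feeds differ from the front feeds only in level and spectrum.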

The quad-to-stereo approach typically sounds:

Less phasey than full binaural: no interaural phase or delay processing is applied
Wider than a direct stereo downmix — the rear channels push energy to the sides of the stereo field
More robust across different headphones and even loudspeakers — the spectral cue survives playback systems that would destroy phase-based cues

The Setup in Reaper

/images/spatial/quad_mix.png — *Signal flow: HOA to four virtual speakers, with HF reduction on the rear bus before summing to stereo.*

Virtual speaker placement in IEM AllRADecoder:

Front pair: ±30°–45° azimuth, 0° elevation
Rear pair: ±135° azimuth, 0° elevation

Keep all speakers in the horizontal plane. Elevation in the virtual array introduces height decoding artifacts when summing to stereo.
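To make the geometry concrete, the following sketch derives four horizontal speaker feeds from a first-order signal by pointing a virtual cardioid at each speaker position. This is a deliberately simpler technique than AllRAD and is not what AllRADecoder computes. It assumes first-order horizontal components W, X, Y with SN3D-style gains and azimuth measured counter-clockwise (positive to the left); the layout dictionary and function names are hypothetical.

```python
import math

# Assumed layout: azimuth in degrees, positive = counter-clockwise (to the left).
SPEAKERS = {"FL": 45.0, "FR": -45.0, "RL": 135.0, "RR": -135.0}

def encode_first_order(azimuth_deg):
    """Horizontal first-order gains (W, X, Y) for a unit-amplitude source."""
    az = math.radians(azimuth_deg)
    return 1.0, math.cos(az), math.sin(az)

def decode_virtual_cardioids(w, x, y):
    """Point one virtual cardioid at each speaker position in SPEAKERS."""
    feeds = {}
    for name, az_deg in SPEAKERS.items():
        az = math.radians(az_deg)
        feeds[name] = 0.5 * (w + x * math.cos(az) + y * math.sin(az))
    return feeds

# A source encoded at the front-left speaker position.
feeds = decode_virtual_cardioids(*encode_first_order(45.0))
```

With these assumptions, a source at +45° lands at full gain in FL, at half gain in FR and RL, and not at all in RR. All four speakers sit at 0° elevation, so the height component plays no part in the decode.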

Processing on the rear bus:

The rear bus is where the character of the spatial impression is shaped. Several types of processing are useful here, and they can be combined:

Spectral shaping (the core tool): A high-frequency shelf or low-pass is the primary means of creating the front/rear distinction. A gentle shelf starting around 5 kHz preserves some rear-channel air; a steeper low-pass at 3–4 kHz produces a stronger sense of immersion at the cost of high-frequency diffuseness. This mimics the spectral signature of rear-hemisphere HRTFs (Blauert, 1997).
Level trim: Reducing the rear bus by 3–6 dB relative to the front keeps the center image stable and prevents the rear energy from masking the direct sources.
Decorrelation: A small amount of decorrelation between RL and RR (via a short allpass or the SPARTA Decorrelator) widens the perceived rear field and prevents the two rear channels from summing to a narrow mono image. Use sparingly — too much decorrelation sounds unnatural and creates mono compatibility issues.
Diffusion / early reflections: A short, dense reverb tail on the rear bus can increase the sense of envelopment without adding direct-source energy. Keep pre-delay at or near zero so the rear bus stays time-aligned with the front.
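For readers who want to see what the rear shelf does numerically, here is a sketch of a high-shelf biquad using the widely published Audio EQ Cookbook formulas (Robert Bristow-Johnson). The 5 kHz corner and -6 dB shelf gain are example values only, and the function names are hypothetical; any EQ plugin listed below does the equivalent job inside Reaper.

```python
import math

def high_shelf_coeffs(f0_hz, gain_db, sample_rate, slope=1.0):
    """High-shelf biquad coefficients (Audio EQ Cookbook), normalized to a0 = 1."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / 2.0 * math.sqrt((A + 1.0 / A) * (1.0 / slope - 1.0) + 2.0)
    k = 2.0 * math.sqrt(A) * alpha
    b0 = A * ((A + 1) + (A - 1) * cos_w0 + k)
    b1 = -2.0 * A * ((A - 1) + (A + 1) * cos_w0)
    b2 = A * ((A + 1) + (A - 1) * cos_w0 - k)
    a0 = (A + 1) - (A - 1) * cos_w0 + k
    a1 = 2.0 * ((A - 1) - (A + 1) * cos_w0)
    a2 = (A + 1) - (A - 1) * cos_w0 - k
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]

def biquad(samples, b, a):
    """Apply the filter in direct form I."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

# Example: rear-bus shelf, 6 dB cut above roughly 5 kHz at 48 kHz sample rate.
b, a = high_shelf_coeffs(5000.0, -6.0, 48000)
```

The shelf leaves low frequencies at unity gain and settles at the requested cut toward the top of the spectrum, which is the gentler of the two options described above. A low-pass would instead keep attenuating as frequency rises.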

For spectral shaping, any single-band shelf or low-pass works well:

Platform	Suitable Plugin	Notes
Linux	LSP Parametric EQ, x42-eq	Per-channel mode for precise rear-only processing
macOS	FabFilter Pro-Q 3, built-in Channel EQ
Windows	FabFilter Pro-Q 3, TDR Nova

Front/Rear Balance

The level ratio between the front and rear buses determines the perceived depth of the mix.

Too much rear level: the mix feels diffuse and lacks a stable center image.
Too little rear level: the spatial width collapses and the result sounds like a conventional stereo downmix.

A starting point is to set the rear bus 3–6 dB below the front bus and adjust by ear. In a dense electroacoustic mix the rears primarily carry reverb tails, spatial diffuseness, and low-mid weight — not direct sources — so they can often sit lower than expected while still contributing a clear sense of space.
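When adjusting by ear, it can help to know where the balance currently sits. The sketch below measures the RMS offset between the two buses and returns the linear gain needed to reach a target offset. It assumes each bus is available as a flat list of float samples with a non-silent front bus; the function names and the -4.5 dB default are hypothetical.

```python
import math

def rms_db(samples):
    """RMS level in dB relative to full scale (1.0); silence maps to -inf."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def rear_trim_for_offset(front, rear, target_offset_db=-4.5):
    """Linear gain that places the rear bus target_offset_db relative to the front."""
    current_offset_db = rms_db(rear) - rms_db(front)
    return 10.0 ** ((target_offset_db - current_offset_db) / 20.0)
```

RMS is only a rough proxy for perceived loudness, so the result is a starting value for the fader rather than a final setting.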

Limitations

This technique is a perceptual approximation, not a geometrically accurate spatial reproduction. Its main limitations:

Limited externalization: without phase-based HRTF cues, the sound remains inside the head to some degree. The spectral rear cue creates the impression of depth and width rather than true externalization.
Front/back collapse on loudspeakers — the spectral cue is weaker on speakers than on headphones, so the front/rear distinction may be less clear outside a headphone context.
Fixed virtual geometry — the four speaker positions are static with no head-tracking compensation.

For an accurate, externalized binaural image, full HRTF convolution via SPARTA AmbiBIN or IEM BinauralDecoder remains the reference. The quad-to-stereo approach is most appropriate where robustness across playback systems matters more than geometric precision.

References

2019

Franz Zotter and Matthias Frank. Ambisonics: A Practical 3D Audio Theory for Recording, Studio Production, Sound Reinforcement, and Virtual Reality. Springer, 2019. doi:10.1007/978-3-030-17207-7.

@book{zotter2019,
    author = "Zotter, Franz and Frank, Matthias",
    title = "Ambisonics: A Practical 3D Audio Theory for Recording, Studio Production, Sound Reinforcement, and Virtual Reality",
    publisher = "Springer",
    year = "2019",
    doi = "10.1007/978-3-030-17207-7"
}

1997

Jens Blauert. Spatial Hearing: The Psychophysics of Human Sound Localization. MIT Press, revised edition, 1997.

@book{blauert1997,
    author = "Blauert, Jens",
    title = "Spatial Hearing: The Psychophysics of Human Sound Localization",
    edition = "revised",
    publisher = "MIT Press",
    year = "1997"
}

1979

Jont B. Allen and David A. Berkley. Image method for efficiently simulating small-room acoustics. Journal of the Acoustical Society of America, 65(4):943–950, 1979. doi:10.1121/1.382599.

@article{allen1979,
    author = "Allen, Jont B. and Berkley, David A.",
    title = "Image method for efficiently simulating small-room acoustics",
    journal = "Journal of the Acoustical Society of America",
    volume = "65",
    number = "4",
    pages = "943--950",
    year = "1979",
    doi = "10.1121/1.382599"
}