Summing in Higher-Order Ambisonics

The Low-Mid Problem in HOA

When summing multiple sources into a Higher-Order Ambisonic (HOA) bus, problems familiar from stereo mixing — mud, low-mid buildup, loss of clarity — manifest differently because they are tied to the spatial energy distribution across the spherical harmonics, not just frequency content. This section covers the main problems and fixes using the IEM Plugin Suite and SPARTA, both of which run on Linux, macOS, and Windows.

All examples assume 7th order in the AmbiX format: ACN channel ordering and SN3D normalization.


In Ambisonics, mud is not only a frequency problem — it is a spatial energy distribution problem. Low-mid buildup (200 Hz–500 Hz) in the \(W\) channel (the omnidirectional pressure component) makes the soundfield feel like a physical pressure inside the listener's head rather than sources positioned in space.

The key is the ratio between omnidirectional and directional energy (Zotter & Frank, 2019):

  • \(W\) carries the total sound pressure

  • \(X\), \(Y\), \(Z\) carry the pressure gradients

If \(W\) is too dominant in the low-mids, the decoder has little directional information to work with and the mix collapses inward.
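The omni-to-directional ratio can be made concrete for the first-order channels. The sketch below is a minimal example. It assumes AmbiX (ACN/SN3D) first-order encoding, angles in radians, and a buffer held as a list of `[W, Y, Z, X]` frames. The function names are hypothetical and do not come from IEM or SPARTA.

In SN3D, a single encoded source has \(X^2 + Y^2 + Z^2 = W^2\), so the ratio is exactly 1. Content that lands only in \(W\) pushes it above 1.

```python
import math

def encode_first_order(sample, azimuth, elevation):
    """Encode one mono sample into first-order AmbiX (ACN order, SN3D).

    Returns [W, Y, Z, X]; angles in radians (hypothetical helper).
    """
    w = sample                                          # ACN 0: omnidirectional
    y = sample * math.sin(azimuth) * math.cos(elevation)  # ACN 1
    z = sample * math.sin(elevation)                      # ACN 2
    x = sample * math.cos(azimuth) * math.cos(elevation)  # ACN 3
    return [w, y, z, x]

def w_to_directional_ratio(frames):
    """Energy in W divided by the summed energy in Y, Z and X."""
    e_w = sum(f[0] ** 2 for f in frames)
    e_dir = sum(f[1] ** 2 + f[2] ** 2 + f[3] ** 2 for f in frames)
    return e_w / e_dir

source = [math.sin(2 * math.pi * k / 32) for k in range(64)]
frames = [encode_first_order(s, 0.6, 0.2) for s in source]
print(round(w_to_directional_ratio(frames), 6))  # 1.0

# Add omni-only content at the same level as the source: W doubles, XYZ unchanged
muddy = [[f[0] + s, f[1], f[2], f[3]] for f, s in zip(frames, source)]
print(round(w_to_directional_ratio(muddy), 6))   # 4.0
```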

In 7th-order HOA (N=7, 64 channels), this is amplified: the higher-order channels provide extreme spatial precision for high frequencies, but feeding broadband low-mid energy into all 64 channels creates spatial aliasing and resonant ringing (Zotter & Frank, 2019).


Single-Bus Solutions

The W-Channel Dip

A targeted cut on the \(W\) channel in the 200 Hz–500 Hz range is the most direct fix for a single HOA master bus. Use the IEM MultiEQ in per-channel mode, which is order-aware and available on all platforms.

Recommended starting point:

  • Frequency: 300 Hz

  • Shape: wide bell or gentle low shelf

  • Amount: −1.5 dB to −2 dB on \(W\) only

This forces the decoder to reconstruct that frequency range more heavily from \(X\) and \(Y\), shifting the perception from pressure to position.
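The dip itself is an ordinary bell filter applied to ACN channel 0 only. The sketch below shows that filter math offline. It uses the RBJ Audio EQ Cookbook peaking filter as a standard, swapped-in design; it does not assume that IEM MultiEQ uses the same formulas internally.

Q = 0.7 as a "wide bell" is an assumption. `dip_w_only` and the other names are hypothetical. The bus is a list of 64 per-channel sample lists.

```python
import cmath
import math

def peaking_biquad(fs, f0, gain_db, q):
    """RBJ cookbook peaking EQ; returns (b, a) normalized so a[0] == 1."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = [1 + alpha * a_lin, -2 * cos_w0, 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * cos_w0, 1 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]

def biquad(samples, b, a):
    """Direct Form I biquad over a list of floats."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def dip_w_only(channels, fs, f0=300.0, gain_db=-2.0, q=0.7):
    """Filter ACN channel 0 (W); copy every directional channel unchanged."""
    b, a = peaking_biquad(fs, f0, gain_db, q)
    return [biquad(channels[0], b, a)] + [list(ch) for ch in channels[1:]]

def gain_db_at(b, a, fs, f):
    """Magnitude response of the biquad at frequency f, in dB."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z1 + b[2] * z1 ** 2) / (a[0] + a[1] * z1 + a[2] * z1 ** 2)
    return 20 * math.log10(abs(h))

b, a = peaking_biquad(48000, 300.0, -2.0, 0.7)
print(round(gain_db_at(b, a, 48000, 300.0), 3))  # -2.0
```

Only channel 0 passes through the filter, so \(X\), \(Y\), \(Z\) and all higher orders keep their original level in the 200–500 Hz range.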


The Z-Axis

Below roughly 200–300 Hz, human elevation perception breaks down entirely. The auditory system relies on pinna-related spectral cues for vertical localization, and these cues only function above a few hundred Hz (Blauert, 1997). Low-frequency energy in the \(Z\) channel therefore contributes nothing perceptually useful, but adds to the general pressure feeling.

Apply a high-pass filter at around 200 Hz to the height-related channels using the IEM MultiEQ, starting with the first-order \(Z\) channel (ACN index 2; ACN index 3 is \(X\)).

Use the IEM EnergyVisualizer to check: significant energy at the poles (top and bottom of the sphere) in the low-mid range indicates the height channels need cleaning.
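The ACN index follows directly from degree \(n\) and order \(m\): \(\mathrm{ACN} = n^2 + n + m\). This is why \(Z\) (\(n = 1\), \(m = 0\)) sits at index 2.

The sketch below also lists the zonal harmonics (\(m = 0\)), which vary only with elevation. Treating them as "the height-related channels" is one possible reading and an assumption here, since the other non-sectoral channels also carry vertical information. The function names are hypothetical.

```python
def acn(n, m):
    """ACN channel index (0-based) for degree n and order m, with -n <= m <= n."""
    if not -n <= m <= n:
        raise ValueError("need -n <= m <= n")
    return n * n + n + m

def zonal_channels(order):
    """ACN indices of the zonal harmonics (m = 0) from 1st order up, excluding W."""
    return [acn(n, 0) for n in range(1, order + 1)]

# First-order channels in ACN order: W = 0, Y = 1, Z = 2, X = 3
print(acn(1, -1), acn(1, 0), acn(1, 1))  # 1 2 3
print(zonal_channels(7))                 # [2, 6, 12, 20, 30, 42, 56]
```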


Sub-Bass: W Only

Below ~100 Hz, wavelengths are long (about 3.4 m at 100 Hz, taking the speed of sound as 343 m/s), and spatial encoding has little perceptual effect. What localization remains at these frequencies relies mainly on the interaural time difference (ITD), not on the fine directional detail that the higher-order channels encode (Blauert, 1997). Encoding sub-bass into the directional channels therefore adds complexity without any audible benefit.

Send sub-bass as a 0th-order signal — into \(W\) only — keeping the low end solid, phase-coherent, and free of unnecessary energy in the directional channels.
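In SN3D the 0th-order encoding gain is 1, so a W-only send is simply the mono sub signal in channel 0 with every other channel silent. The sketch below builds such a bus as a list of per-channel sample lists. The function name is hypothetical; in a DAW the same result comes from routing the sub signal only to channel 1 of the bus.

```python
def sub_to_w_only(sub_samples, order=7):
    """Route a mono sub-bass signal into W (ACN 0) of an AmbiX bus.

    SN3D gives W a gain of 1, so the signal is copied unchanged; every
    directional channel (ACN 1 and up) stays silent.
    """
    n_channels = (order + 1) ** 2
    silent = [0.0] * len(sub_samples)
    return [list(sub_samples)] + [list(silent) for _ in range(n_channels - 1)]

bus = sub_to_w_only([0.5, -0.25, 0.1])
print(len(bus), bus[0], bus[1])  # 64 [0.5, -0.25, 0.1] [0.0, 0.0, 0.0]
```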


Parallel HOA Buses: The Spatial Pyramid

For dense, multi-layer textures, splitting into three parallel Ambisonic buses with different frequency ranges and different effective orders gives independent control over each spatial register — a frequency-dependent spatial crossover (Zotter & Frank, 2019).

[Figure: the spatial pyramid]

Parallel HOA bus structure for three bands.

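| Bus    | Frequency range                                   | Effective order              | Rationale |
|--------|---------------------------------------------------|------------------------------|-----------|
| [SUB]  | Below the 80–120 Hz crossover                     | 0th order (\(W\) only)       | Sub-bass has no perceptual directionality. Keep it as a clean mono anchor. |
| [BODY] | Between the 80–120 Hz and 0.5–1.5 kHz crossovers  | 2nd or 3rd order             | Keeps the body of the mix stable and prevents spatial jitter in the mids. Apply the W-channel dip here. |
| [AIR]  | Above the 0.5–1.5 kHz crossover                   | 7th order (all 64 channels)  | High frequencies benefit from maximum spatial resolution. |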

Order tapering via REAPER Pin Matrix

All encoders output 7th order (64 channels), so each bus is reduced to its effective order in the Pin Matrix. Mute all outputs except channel 1 on [SUB]. Mute outputs 17–64 on [BODY] to limit it to 3rd order (or outputs 10–64 for 2nd order). Leave all 64 active on [AIR].
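The channel counts behind those pin settings follow from \((N+1)^2\) channels per order. The sketch below computes which 1-based outputs to mute for a given bus order and what muting them does to the signal. The function names are hypothetical, and the real routing is done in REAPER's GUI.

```python
def channel_count(order):
    """Channels in a full-sphere ambisonic signal of the given order: (N + 1)^2."""
    return (order + 1) ** 2

def muted_outputs(bus_order, encoder_order=7):
    """1-based (first, last) output range to mute so a bus keeps only bus_order.

    Returns None when nothing needs muting.
    """
    keep = channel_count(bus_order)
    total = channel_count(encoder_order)
    if keep >= total:
        return None
    return (keep + 1, total)

def truncate_order(channels, bus_order):
    """Zero every ACN channel above bus_order, which is what muting those pins does."""
    keep = channel_count(bus_order)
    return [list(ch) if i < keep else [0.0] * len(ch) for i, ch in enumerate(channels)]

for name, order in [("SUB", 0), ("BODY (2nd)", 2), ("BODY (3rd)", 3), ("AIR", 7)]:
    print(name, muted_outputs(order))
# SUB (2, 64)
# BODY (2nd) (10, 64)
# BODY (3rd) (17, 64)
# AIR None
```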

Checking phase alignment at the crossovers

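Use IEM MultiEQ in linear phase mode for all crossover splits. Minimum-phase filters shift the phase of the low band and the high band differently around the cutoff. As a result, two buses that share a boundary can partially cancel when summed, blurring the spatial position of sources whose energy straddles the crossover. Ensure both buses at each boundary use the same linear-phase slope and cutoff frequency.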
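The property the linear-phase split needs is that the low and high bands sum back to a pure delay. The sketch below shows one standard way to get it, a swapped-in FIR technique rather than a description of MultiEQ's internals:

- Design a windowed-sinc lowpass (Hamming window).
- Derive the highpass as a delayed unit impulse minus that lowpass.

The same taps would be applied identically to every channel of a bus. The 255-tap length and the function names are assumptions.

```python
import math

def linear_phase_lowpass(cutoff_hz, fs, num_taps=255):
    """Windowed-sinc FIR lowpass (Hamming window).

    Odd length and symmetric taps give exactly linear phase with a delay
    of num_taps // 2 samples.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    mid = num_taps // 2
    fc = cutoff_hz / fs  # normalized cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [t / dc_gain for t in taps]  # unity gain at DC

def complementary_highpass(lowpass_taps):
    """Highpass as a delayed unit impulse minus the lowpass (same delay, same length)."""
    mid = len(lowpass_taps) // 2
    return [(1.0 if n == mid else 0.0) - t for n, t in enumerate(lowpass_taps)]

# Split at the BODY/AIR boundary and confirm the two bands sum to a pure delay
lp = linear_phase_lowpass(1500.0, 48000)
hp = complementary_highpass(lp)
summed = [x + y for x, y in zip(lp, hp)]
print(max(range(len(summed)), key=lambda n: summed[n]))  # 127
```

Because both bands come from one prototype, they share the same slope, cutoff and delay, which is the matching condition described above. A 255-tap filter at 48 kHz has a transition band several hundred Hz wide, so a split at 80–120 Hz would need a much longer filter.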

To check each crossover, mute the third bus and listen to the two that share a boundary:

  • Mute [AIR], listen to [SUB] + [BODY] at the 80 Hz boundary.

  • Mute [SUB], listen to [BODY] + [AIR] at the 1.5 kHz boundary.

If you hear a dip, hollow resonance, or comb filtering at the crossover frequency, the phases do not align.
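A textbook minimum-phase case shows the dip this listening test is meant to catch. The example uses a 2nd-order Butterworth lowpass and highpass (RBJ cookbook formulas, Q = 1/√2) at the same cutoff. Each is −3 dB at the cutoff, but they are 180° apart there, so their sum cancels. The 1.5 kHz cutoff is taken from the boundary above. The example does not model any specific plugin, and the function names are hypothetical.

```python
import cmath
import math

def rbj_lowpass_highpass(fs, f0, q=1 / math.sqrt(2)):
    """RBJ cookbook 2nd-order lowpass and highpass sharing one denominator."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    c = math.cos(w0)
    a = [1 + alpha, -2 * c, 1 - alpha]
    b_lp = [(1 - c) / 2, 1 - c, (1 - c) / 2]
    b_hp = [(1 + c) / 2, -(1 + c), (1 + c) / 2]
    return b_lp, b_hp, a

def response(b, a, fs, f):
    """Complex frequency response of a biquad at frequency f."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    return (b[0] + b[1] * z1 + b[2] * z1 ** 2) / (a[0] + a[1] * z1 + a[2] * z1 ** 2)

fs, fc = 48000, 1500.0
b_lp, b_hp, a = rbj_lowpass_highpass(fs, fc)
lp = response(b_lp, a, fs, fc)
hp = response(b_hp, a, fs, fc)
print(round(abs(lp + hp), 6))  # 0.0       the bands cancel: a deep notch at fc
print(round(abs(lp - hp), 6))  # 1.414214  inverting one band's polarity removes it
```

For this particular filter pair, inverting one band is the classic fix. Matched linear-phase bands avoid the problem without any polarity tricks.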


Notes

  • Use the IEM EnergyVisualizer to diagnose the mix: an evenly spread energy map with little contrast and no distinct hot spots suggests that \(W\) is dominant.

  • Periodically decode to 1st order as a sanity check. If the mix is still spatially coherent at 1st order, the 7th-order version is likely to be robust.

  • Check for DC offset with a metering plugin before the final render — synthesis environments like SuperCollider or Max/MSP can introduce a small DC component that quietly eats headroom.
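A DC offset is just the mean of the signal, so it can also be checked offline. The sketch below measures it and removes it with a one-pole DC-blocking filter, \(y[n] = x[n] - x[n-1] + r\,y[n-1]\). This is a common textbook filter, not a feature of any particular metering plugin. It assumes the render is available as a list of float samples; r = 0.995 and the function names are assumptions.

```python
import math

def dc_offset(samples):
    """Mean value of the signal, i.e. its DC component."""
    return sum(samples) / len(samples)

def dc_blocker(samples, r=0.995):
    """One-pole DC blocking filter: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    x_prev = y_prev = 0.0
    out = []
    for x in samples:
        y = x - x_prev + r * y_prev
        x_prev, y_prev = x, y
        out.append(y)
    return out

fs = 48000
test_tone = [0.01 + 0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
print(round(dc_offset(test_tone), 4))            # 0.01
cleaned = dc_blocker(test_tone)
print(abs(dc_offset(cleaned[fs // 2:])) < 1e-3)  # True
```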


References

2019

  • Franz Zotter and Matthias Frank. Ambisonics: A Practical 3D Audio Theory for Recording, Studio Production, Sound Reinforcement, and Virtual Reality. Springer, 2019. doi:10.1007/978-3-030-17207-7.

1997

  • Jens Blauert. Spatial Hearing: The Psychophysics of Human Sound Localization. MIT Press, revised edition, 1997.

1979

  • Jont B. Allen and David A. Berkley. Image method for efficiently simulating small-room acoustics. Journal of the Acoustical Society of America, 65(4):943–950, 1979. doi:10.1121/1.382599.