Binaural Spatialization with SC-HOA

The SC-HOA library by Florian Grond is a feature-rich toolbox for working with Ambisonics and binaural synthesis in SuperCollider. Once installed, it is well documented inside the SC help files. Additional information and install instructions are part of the Git repository. This section gives a brief introduction to the solution used for the SPRAWL server.

Installing SC-HOA

The SC-HOA library is shipped as a so-called Quark and can be installed from within SC. Besides a GUI-based way, a single command is enough to install the complete library with all objects and classes in the system's directories:

Quarks.install("https://github.com/florian-grond/SC-HOA")

Make sure to reboot the interpreter after installing the Quark, since the external classes need to be compiled.
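Both steps can also be triggered with code. The following lines are standard sclang commands; the first opens the graphical Quark manager mentioned above, the second recompiles the class library:

// GUI-based alternative to Quarks.install:
Quarks.gui

// recompile the class library (equivalent to rebooting the interpreter):
thisProcess.recompile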

To find out where SC has installed the externals, run:

Platform.userExtensionDir
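Evaluating this line posts the directory to the post window. The exact path depends on the operating system; on Linux, for example, it is typically:

Platform.userExtensionDir
// -> /home/<username>/.local/share/SuperCollider/Extensions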

Network Audio

OSI Model

The OSI Model groups different services, functions and applications of telecommunication systems into seven hierarchically arranged layers:

Layer   Name                 Description
-----   ------------------   -------------------------------------------------------------------
7       Application Layer    end-user layer, HCI layer
6       Presentation Layer   data conversion, syntax
5       Session Layer        connection management, sockets
4       Transport Layer      end-to-end connections (TCP, UDP)
3       Network Layer        packet routing
2       Data Link Layer      data formats (bits to frames, MAC addresses)
1       Physical Layer       bit stream transmission over medium/hardware (Ethernet, WiFi, ...)


Network-based audio systems can operate on different layers, which affects their capabilities and application areas. A comprehensive list can be found in the comparison on Wikipedia.


Layer 1 Solutions

Layer 1 solutions rely only on the hardware of telecommunication systems and use their own routing mechanisms. As a consequence, they usually need dedicated routers and are often used for direct peer-to-peer connections. The most widespread solution is the open AES50 format, which is found in devices by Behringer and Midas.

Layer 2 Solutions

Layer 2 solutions use the standard Ethernet protocol for transmitting data. Standard routers and hardware can thus be used for routing. Among the well-known formats are AVB and AES51, as well as several proprietary solutions.

Layer 3 Solutions

Layer 3 solutions feature an IP header in their packets. Example solutions are Dante, AES67 and RAVENNA.

Layer 4 Solutions

Some solutions are based on Layer 4 protocols like TCP or UDP [1]. UDP is faster than TCP, since it skips handshakes and error correction. Although this makes it prone to packet loss, it is the preferred method for achieving acceptable latency, at the cost of dropouts, depending on the quality of the connection.

Examples for Layer 4 solutions can be found in the free and open source software community, including NetJack2 [2], Zita-njbridge [3] and JackTrip.
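As a brief illustration of this class of tools, a plain point-to-point JackTrip connection between two machines running JACK can be established as follows (the address is a placeholder; further options for channel count and buffering are omitted):

# on the first machine (server role):
$ jacktrip -s

# on the second machine, connecting to the first:
$ jacktrip -c <server-address>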


[1] This needs more references, since it is not unambiguous on which layer these solutions operate.

[2] https://github.com/jackaudio/jackaudio.github.com/wiki/WalkThrough_User_NetJack2

[3] http://kokkinizita.linuxaudio.org/linuxaudio/index.html

Using SSH for Remote Access

SSH (Secure Shell Protocol) is necessary when working with the server, but can also be helpful for configuring the Access Points. For remote machines - like the SPRAWL Server - SSH can be used for command-line operations and command execution.

Connecting to an SSH Server

To connect to a remote machine, it needs to run an SSH server. On the client side, an SSH connection can be established from the terminal without additional installations on Linux and macOS machines and - since Windows 10 - on Windows. Older Windows versions can use PuTTY.

$ ssh username@address

X11 Forwarding

With X11 forwarding, SSH can also be used to run applications with a GUI remotely. Simply add the -X argument to do so:

$ ssh -X username@address

Remote Commands

SSH can also be used to send single commands, without starting a remote session. The following example launches jack_simple_client, which plays a continuous sine tone on the remote machine.

$ ssh -t username@address 'jack_simple_client'

Exercise

Log into the server with SSH.

Concept

This module focuses on fundamental principles of sound synthesis algorithms in C++, covering paradigms like subtractive synthesis, additive synthesis, physical modeling, distortion methods and processed recording. Theory and background of these approaches are covered in the Sound Synthesis Introduction.

The concept is based on Linux audio systems as development and runtime systems (von Coler & Runge, 2017). Using Raspberry Pis, classes can be supplied with an ultra-low-cost computer pool, resolving any compatibility issues of individual systems. In addition, the single-board computers can be integrated into embedded projects for actual hardware instruments. Participants can also install Linux systems on their own hardware for increased performance.

Only a few software libraries are part of the system used in this class, taking care of audio input and output, communication (OSC, MIDI), configuration and audio file processing. This minimal framework allows focusing on the actual implementation of the algorithms on a sample-by-sample level, without relying on extensive higher-level abstractions.


Although the concept of this class has its advantages, there are alternatives with their own benefits. There is a variety of frameworks to consider for implementing sound synthesis paradigms and building digital musical instruments with C/C++. The JUCE framework allows the compilation of 'desktop and mobile applications, including VST, VST3, AU, AUv3, RTAS and AAX audio plug-ins'. It comes with many helpful features and can be used to create DAW-ready software components. Environments like Puredata and SuperCollider come with APIs for programming user extensions. The resulting software components can be easily integrated into existing projects.


References

2017

  • Henrik von Coler and David Runge. Teaching Sound Synthesis in C/C++ on the Raspberry Pi. In Proceedings of the Linux Audio Conference. 2017.

The JACK API

All examples in this class are implemented as JACK clients. Audio input and output are thus based on the JACK API. The JACK framework takes over a lot of the management and offers a quick entry point for programmers. Professional Linux audio systems are usually based on JACK servers, allowing the flexible connection of different software components. Read more in the JACK section of the Computer Music Basics.


The ThroughExample

The ThroughExample is a slightly adapted version of the JACK simple client. It wraps the same functionality into a C++ class, adding multi-channel capabilities.


Main

The file main.cpp creates an instance of the ThroughExample class. No command line arguments are passed and the object is created without any arguments:

ThroughExample *t = new ThroughExample();

Member Variables

jack_client_t   *client;

The pointer to a JACK client is needed for connecting this piece of software to the JACK server.
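The following minimal sketch - not the actual ThroughExample code, with an illustrative client name - shows how such a pointer is typically initialized and registered with a running JACK server:

#include <unistd.h>
#include <iostream>
#include <jack/jack.h>

jack_client_t *client;

// per-block audio callback - the actual processing happens here
int process(jack_nframes_t nframes, void *arg)
{
    return 0;
}

int main()
{
    jack_status_t status;

    // connect this program to the running JACK server
    client = jack_client_open("through_example", JackNullOption, &status);
    if (client == nullptr)
    {
        std::cerr << "Could not connect to JACK server!" << std::endl;
        return 1;
    }

    // register the audio callback and start processing
    jack_set_process_callback(client, process, nullptr);
    jack_activate(client);

    // keep the client alive
    while (true)
    {
        sleep(1);
    }

    return 0;
}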

The MIDI Protocol

The MIDI protocol was released in 1982 as a means for connecting electronic musical instruments. The first synthesizers to feature the new technology were the Prophet-600 and the Jupiter-6. Although limited in resolution from today's point of view, it is still a standard for conventional applications - yet to be replaced by the recently released MIDI 2.0. Besides rare mismatches and some limitations, MIDI devices can be connected without complications. Physically, MIDI was introduced with the still widespread 5-pin DIN connector, shown below. In recent devices, MIDI is usually transmitted via USB.

MIDI jack (5-pin DIN).



Standard MIDI Messages

MIDI transmits binary coded messages at a speed of \(31250\ \mathrm{bit/s}\). Timing and latency are thus not a problem when working with MIDI. However, the resolution of control values can be a limiting factor. Standard MIDI messages consist of three Bytes, namely one status Byte (first bit green) and two data Bytes (first bit red). The first bit declares the Byte either a status Byte (1) or a data Byte (0).

/images/basics/midi-message.png

Standard MIDI message with three Bytes.


Some of the most common messages are listed in the table below. Since one bit is used as the status/data identifier, 7 bits are left for encoding. This results in the typical MIDI resolution of \(2^7 = 128\) values for pitch, velocity or control changes.

Voice Message             Status Byte      Data Byte 1         Data Byte 2
-------------             -----------   -----------------   ------------------
Note Off                        8x      Key number          Note Off velocity
Note On                         9x      Key number          Note On velocity
Polyphonic Key Pressure         Ax      Key number          Amount of pressure
Control Change                  Bx      Controller number   Controller value
Program Change                  Cx      Program number      None
Channel Pressure                Dx      Pressure value      None
Pitch Bend                      Ex      LSB                 MSB

Pitch Bend

If you are stuck with MIDI for some reason but need a higher resolution, the Pitch Bend parameter can help. Each MIDI channel has one Pitch Bend, whose two data Bytes are combined, resulting in a resolution of \(128^2 = 16384\) steps.
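A quick sketch in sclang of how the two data Bytes are combined into one 14-bit value (the Byte values are arbitrary examples):

(
~lsb = 0;   // data Byte 1
~msb = 64;  // data Byte 2
~bend = (~msb << 7) | ~lsb;
~bend.postln; // -> 8192 (center position)
)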


System Exclusive

SysEx messages can be freely defined by manufacturers and are not standardized; they can have any length. They are often used for dumping or loading settings and presets, but can also be used for arbitrary control purposes.


MIDI Note to Hertz

When working with MIDI, a conversion from MIDI pitch to Hertz is often necessary. There are two simple formulas for doing that. They both refer to the MIDI pitch 69, which corresponds to a frequency of 440 Hz:

\begin{equation*} f[\mathrm{Hz}] = 2^{\frac{\mathrm{MIDI}-69}{12}} \cdot 440 \end{equation*}
\begin{equation*} \mathrm{MIDI} = 69 + 12 \log_2 \left( \frac{f}{440\ \mathrm{Hz}} \right) \end{equation*}
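Both formulas translate directly into sclang, which also offers the built-in conversion methods midicps and cpsmidi:

(
// direct implementations of the formulas above:
~midiToHz = { |midi| 440 * (2 ** ((midi - 69) / 12)) };
~hzToMidi = { |freq| 69 + (12 * log2(freq / 440)) };

~midiToHz.value(69).postln;  // -> 440
~hzToMidi.value(440).postln; // -> 69

// built-in conversions:
69.midicps.postln;  // -> 440
440.cpsmidi.postln; // -> 69
)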

Getting Started with SuperCollider

SuperCollider (SC) is a server-client-based tool for sound synthesis and composition. SC was started by James McCartney in 1996 and has been free software since 2002. It can be used on Mac, Linux and Windows systems and comes with a large collection of community-developed extensions. The client-server principle aims at live coding and makes it a powerful tool for distributed and embedded systems, allowing full remote control of synthesis processes.

There are many ways of approaching SuperCollider, depending on the intended use case. Some tutorials focus on sequencing, others on live coding or sound design. This introduction aims at programming remotely controlled synthesis and processing servers, which involves signal routing and OSC capabilities.


Getting SC

Binaries, source code and build or installation instructions can be found at the SC GitHub site. If possible, it is recommended to build the latest version from the repository:

https://supercollider.github.io/downloads

SuperCollider comes with a large bundle of help files and code examples, but the first steps are usually not easy. There are a lot of very helpful additional resources, providing step-by-step introductions.

Code snippets in this example are taken from the accompanying repository: SC Example. You can simply copy and paste them into your editor.


SC Trinity

SuperCollider is based on a client-server paradigm. The server runs the actual audio processing, whereas clients are used to control the server processes via OSC messages. Multiple clients can connect to a running server. The dedicated ScIDE offers convenient features for live coding and project management:

/images/basics/supercollider-components.png

Server, client and ScIDE.


sclang

sclang is the SuperCollider language. It represents the client side when working with SC. It can, for example, be started in a terminal by running:

$ sclang

Just as with other interpreted languages, such as Python, the terminal will then change into sclang mode. At this point, the class library is compiled, making all SC classes executable. Afterwards, SC commands can be entered:

sc3>  postln("Hello World!")

ScIDE

Working with SC in the terminal is rather inconvenient. The SuperCollider IDE (ScIDE) is the environment for live coding in sclang, allowing full control of the SuperCollider language:

/images/basics/scide.png

ScIDE


When booted, the ScIDE automatically launches sclang and is then ready to interpret. Files opened in the IDE can be executed as a whole. Moreover, single blocks, respectively single lines, can be evaluated, which is especially handy in live coding, when exploring possibilities or prototyping. In addition, the IDE features tools for monitoring various server properties.


Some Language Details

Parentheses

Parentheses can help to structure SC code for live programming. Placing the cursor inside a region between parentheses and pressing Control + Enter evaluates the code inside the parentheses. This way of coding is not suited for scripts which are executed as a whole.

(
      post('Hello ');
      postln('World!');
)

Variable Names

Global variables are either single letters - s is reserved for the default server - or start with a tilde: ~varname. They can be declared and used anywhere in a language instance. The first character after the tilde must be lowercase. Local variables, used in functions or code blocks, need to be declared explicitly:

// single-letter-global variable:
x = 1.0;

// tilde-global variables:
~aValue = 1.1;

// local variable:
var foo;

Declare First

All declarations of local variables must happen at the beginning of a function or code block. The following example throws an error:

(
var xValue = 1.0;

xValue.postln;

// declaring a variable after the first statement throws an error:
var yValue = 2.1;
)
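Moving all declarations to the top makes the block valid:

(
var xValue = 1.0;
var yValue = 2.1;

xValue.postln;
yValue.postln;
)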

Evaluating Selections

Some of the examples in the SC section of this class are in the repository, whereas others only exist as snippets on these pages. In general, all these examples can be explored by copy-pasting the code blocks from the pages into the ScIDE. They can then be evaluated block-wise or line-wise, but can not be executed as complete files. This is caused by the problem of synchronous vs asynchronous processes, which is explained later: Synchronous vs Asynchronous

The following features help to run code in the ScIDE step by step:

  • Individual sections of code can be evaluated by selecting them and pressing Control + Enter.

  • Single lines of code can be evaluated by placing the cursor in the line and pressing Shift + Enter.


Functions

Functions in SC are defined inside curly brackets. Arguments are declared at the very beginning. Once created, a function is used by calling the .value() method:

(
~poster = {

    arg a,b;

    var y = a+b;

    y.postln;

};
)

~poster.value(1,1);

Arguments can also be defined inside pipes:

~poster = {

    |a,b|

    a.postln;

};
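Both notations also allow default values, which are used when an argument is omitted in the call:

(
~poster = { |a = 1, b = 2|
    (a + b).postln;
};
)

~poster.value();   // -> 3
~poster.value(10); // -> 12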

Getting Started with Puredata

About

Puredata (PD) is the free and open source counterpart to Max/MSP, also developed and maintained by Miller Puckette. PD is one of the best options for people new to computer music, due to its visible signal flow. It is very helpful for exploring the basics of sound synthesis and programming, but can also be used for advanced applications: https://puredata.info/community/member-downloads/patches

As a graphical programming environment, PD offers simple and flexible means for creating control and GUI software. There are a lot of great tutorials and examples online. This one features almost everything there is to know: http://write.flossmanuals.net/pure-data/


Versions & Download

PD comes in different versions, including customized ones for specific applications (Pd-L2Ork) or Purr Data with an HTML5 GUI. This tutorial relies on the plain version, referred to as vanilla. The latest build can be downloaded for all major operating systems here: https://puredata.info/downloads/pure-data


Help Files

The help files shipped with the PD installation feature a plethora of examples on programming principles and audio signal processing, for beginners and advanced users alike. Many examples and parts of this tutorial are based on this library. It is recommended to explore the contents of the help browser.


Working with PD Files

PD patches are organized in files with the .pd extension. This first patch is the well-known Hello World! example, not using any audio processing:

/images/basics/pd-hello-world.png

It introduces two concepts:

  • A message, containing the string Hello World!.

  • An object, namely print, printing the string to the PD console in the main window.

Edit Mode / Performance Mode

PD can operate in two modes:

  • The Edit Mode allows changing patches (adding, moving and connecting objects).

  • The Performance Mode is used to perform with the patches (changing values, operating GUI elements).

The mode can be changed in the menu or via the shortcut Ctrl+E (Cmd+E on Mac). Only in Performance Mode can the message be clicked, printing its content to the PD console.

Text File Format

All PD patches are stored as text files, declaring objects and connections line by line. This makes version control possible, although small changes in object positions can result in many changes in the text representation. For the above example, the related text version of the PD file looks as follows:

#N canvas 899 583 450 300 12;
#X obj 75 107 print;
#X msg 75 29 Hello World!;
#X connect 1 0 0 0;
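In this representation, the line #X connect 1 0 0 0; connects outlet 0 of object 1 (the message, declared second) to inlet 0 of object 0 (the print object), with objects indexed in their order of declaration.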


Concatenative: Crowd Noise Synthesis

Two master's theses, written in collaboration between the Audio Communication Group and IRCAM, aimed at a parametric synthesis of crowd noises, more precisely of many people speaking simultaneously (Grimaldi, 2016; Knörzer, 2017). Using a concatenative approach, the resulting synthesis system can be used to dynamically change the affective state of the virtual crowd. The resulting algorithm was applied in user studies in virtual acoustic environments.

Recordings

The corpus of speech was gathered in two group sessions, each with five persons, in the anechoic chamber at TU Berlin. For each speaker, the recording was annotated into regions of different valence and arousal and then automatically segmented into syllables.

Features

/images/Sound_Synthesis/concatenative/valence_arousal_1.png

Synthesis

The following example synthesizes a crowd with a valence of -90 and an arousal of 80, which can be categorized as frustrated, annoyed or upset. No virtual acoustic environment is used, and the result is rather direct.


References

2017

  • Vincent Grimaldi, Christoph Böhm, Stefan Weinzierl, and Henrik von Coler. Parametric Synthesis of Crowd Noises in Virtual Acoustic Environments. In Proceedings of the 142nd Audio Engineering Society Convention. Audio Engineering Society, 2017.
  • Christian Knörzer. Concatenative crowd noise synthesis. Master's thesis, TU Berlin, 2017.

2006

  • Diemo Schwarz. Concatenative sound synthesis: The early years. Journal of New Music Research, 35(1):3–22, 2006.
  • Diemo Schwarz, Grégory Beller, Bruno Verbrugghe, and Sam Britton. Real-Time Corpus-Based Concatenative Synthesis with CataRT. In DAFx. 2006.

2000

  • Diemo Schwarz. A System for Data-Driven Concatenative Sound Synthesis. In Proceedings of the COST-G6 Conference on Digital Audio Effects (DAFx-00). Verona, Italy, 2000.

1989

  • C. Hamon, E. Mouline, and F. Charpentier. A diphone synthesis system based on time-domain prosodic modifications of speech. In International Conference on Acoustics, Speech, and Signal Processing, 238–241 vol. 1. May 1989. doi:10.1109/ICASSP.1989.266409.

1986

  • F. Charpentier and M. Stella. Diphone synthesis using an overlap-add technique for speech waveforms concatenation. In ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 11, 2015–2018. April 1986. doi:10.1109/ICASSP.1986.1168657.

Concatenative: Introduction

Concatenative synthesis is an evolution of granular synthesis, first introduced in the context of speech synthesis and processing (Charpentier, 1986; Hamon, 1989).

Concatenative synthesis for musical applications was introduced by Diemo Schwarz. Corpus-based concatenative synthesis (Schwarz, 2000; Schwarz, 2006) segments audio recordings into units and calculates audio features for each unit. During synthesis, unit selection is performed by navigating the multidimensional feature space, and the selected units are concatenated.

/images/Sound_Synthesis/concatenative/concatenative-flow-1.png
Fig.1

(Schwarz, 2006)


/images/Sound_Synthesis/concatenative/concatenative-flow-2.png
Fig.2

(Schwarz, 2006)
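As a toy illustration of the unit selection step - a minimal sketch, not CataRT's actual algorithm, with a made-up two-dimensional feature set - the following program picks the unit whose feature vector is closest to a target point:

#include <array>
#include <cmath>
#include <iostream>
#include <vector>

// one unit: a segment of the corpus, described by audio features
struct Unit
{
    int startSample;                // position in the source file
    std::array<double, 2> features; // e.g. {spectral centroid, RMS}
};

// return the index of the unit closest to the target feature point
std::size_t selectUnit(const std::vector<Unit> &corpus,
                       const std::array<double, 2> &target)
{
    std::size_t best = 0;
    double bestDist = INFINITY;
    for (std::size_t i = 0; i < corpus.size(); i++)
    {
        double d = 0.0;
        for (std::size_t k = 0; k < 2; k++)
        {
            double diff = corpus[i].features[k] - target[k];
            d += diff * diff;
        }
        if (d < bestDist)
        {
            bestDist = d;
            best = i;
        }
    }
    return best;
}

int main()
{
    std::vector<Unit> corpus = {
        {0,    {1200.0, 0.1}},
        {4410, {800.0,  0.4}},
        {8820, {1500.0, 0.2}},
    };

    // target: bright and quiet
    std::cout << selectUnit(corpus, {1400.0, 0.15}) << std::endl; // -> 2

    return 0;
}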


References

2017

  • Vincent Grimaldi, Christoph Böhm, Stefan Weinzierl, and Henrik von Coler. Parametric Synthesis of Crowd Noises in Virtual Acoustic Environments. In Proceedings of the 142nd Audio Engineering Society Convention. Audio Engineering Society, 2017.
  • Christian Knörzer. Concatenative crowd noise synthesis. Master's thesis, TU Berlin, 2017.

2006

  • Diemo Schwarz. Concatenative sound synthesis: The early years. Journal of New Music Research, 35(1):3–22, 2006.
  • Diemo Schwarz, Grégory Beller, Bruno Verbrugghe, and Sam Britton. Real-Time Corpus-Based Concatenative Synthesis with CataRT. In DAFx. 2006.

2000

  • Diemo Schwarz. A System for Data-Driven Concatenative Sound Synthesis. In Proceedings of the COST-G6 Conference on Digital Audio Effects (DAFx-00). Verona, Italy, 2000.

1989

  • C. Hamon, E. Mouline, and F. Charpentier. A diphone synthesis system based on time-domain prosodic modifications of speech. In International Conference on Acoustics, Speech, and Signal Processing, 238–241 vol. 1. May 1989. doi:10.1109/ICASSP.1989.266409.

1986

  • F. Charpentier and M. Stella. Diphone synthesis using an overlap-add technique for speech waveforms concatenation. In ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 11, 2015–2018. April 1986. doi:10.1109/ICASSP.1986.1168657.