Faust: Compiling Plugins
VST
Albert Gräf provided a VST plugin architecture for the Faust programming language:
A basic Ambisonics production workflow can be split into three stages, as shown in Figure 1. The advantage of this procedure is that the production is independent of the output format, since the intermediate format is in the Ambisonics domain. A sound field produced in this way can subsequently be rendered or decoded to any desired loudspeaker setup or to headphones.
In the encoding stage, Ambisonics signals are generated. This can happen via recording with an Ambisonics microphone or through encoding of mono sources with individual angles (azimuth, elevation). Plain Ambisonics encoding does not include distance information - although it can be added through attenuation. All encoded signals have the same number of $N$ Ambisonics channels.
All individual Ambisonics signals can be summed up to create one scene, respectively one sound field.
In the decoding stage, individual output signals can be calculated. This requires either head-related transfer functions or loudspeaker coordinates.
More advanced workflows may feature additional stages for manipulating encoded Ambisonics signals, including directional filtering or rotation of the audio scene.
Additive Synthesis and Spectral Modeling are introduced in detail in the corresponding sections of the Sound Synthesis Introduction. Since sounds are created by combining large numbers of spectral components, such as harmonics or noise bands, spatialization at the synthesis stage is an obvious method. Listeners can thereby be spatially enveloped by a single sound, with spectral components being perceived from all angles. The continuous character, however, blurs the localization.
Spatio-operational spectral synthesis (SOS) (Topper, 2002) is an attempt towards dynamic spatial additive synthesis, implemented in MAX/MSP and RTcmix. Partials are rotated independently within a 2D eight-channel speaker setup. A first experiment moved the first eight partials of a square wave along a circular spatial path at varying rates, as shown in Figure 1.
Figure 2 shows the second experiment with one partial moving against the others.
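The idea of rotating partials independently can be sketched in a few lines of C++. This is not the original MAX/MSP/RTcmix implementation; the panning law (a simple positive-cosine gain per speaker) and the rotation rates are assumptions made for this illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

const int    NUM_SPEAKERS = 8;
const double TWO_PI       = 2.0 * M_PI;

// Amplitude-panning gains for one partial at 'angle' (radians) on a ring
// of 8 equally spaced speakers: each speaker gets a gain proportional
// to max(0, cos(angle - speaker_angle)).
std::vector<double> partial_gains(double angle)
{
    std::vector<double> gains(NUM_SPEAKERS);
    for (int s = 0; s < NUM_SPEAKERS; s++)
    {
        double speaker_angle = TWO_PI * s / NUM_SPEAKERS;
        gains[s] = std::max(0.0, std::cos(angle - speaker_angle));
    }
    return gains;
}

// One multichannel output sample at time t (seconds): each partial of a
// square wave (odd harmonics) rotates at its own (assumed) rate.
std::vector<double> sos_sample(double t, double f0, int num_partials)
{
    std::vector<double> out(NUM_SPEAKERS, 0.0);
    for (int p = 0; p < num_partials; p++)
    {
        int    harmonic = 2 * p + 1; // square wave: partials 1, 3, 5, ...
        double partial  = std::sin(TWO_PI * harmonic * f0 * t) / harmonic;
        double angle    = std::fmod(0.1 * TWO_PI * (p + 1) * t, TWO_PI);

        std::vector<double> g = partial_gains(angle);
        for (int s = 0; s < NUM_SPEAKERS; s++)
            out[s] += g[s] * partial;
    }
    return out;
}
```

Letting one partial rotate against the direction of the others, as in the second experiment, only requires negating that partial's angle increment.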
GLOOO is a system for real-time expressive spatial synthesis with spectral models. A haptic interface allows the dynamic distribution of 100 spectral components, allowing a control over the spread and position of the resulting violin sound. The project is best documented on the corresponding websites:
The Klangmühle was an early electronic device for spatialization, allowing the panning between different channels by moving a crank, which was then mapped to multiple variable resistors.
The Rotationstisch was used by Karlheinz Stockhausen for his work Kontakte (1958-60) (von Blumroeder, 2018). In the studio, the device was used for producing spatial sound movements on a quadraphonic loudspeaker setup. This was realized with four microphones in a quadratic setup, each pointing towards a loudspeaker in the center:
The predominant effect of the Rotationstisch is amplitude panning, using the directivity of the loudspeaker and the wave guide. In addition, the spatialization includes a Doppler shift when the loudspeaker is rotated. The rotation device can be moved manually, which makes it possible to perform the spatial movements and record them on quadraphonic tape:
Stockhausen's 1958-60 composition Kontakte can be considered a milestone of multichannel music. It exists as a tape-only version, as well as a version for tape and live piano and percussion. For the tape part, the Rotationstisch was used to create the spatial movements - not fully captured in this stereo version (electronics only). Listen to 17'00'' for the most prominent rotation movement in four channels:
The Karplus-Strong algorithm is a proto-physical model. The underlying theory is covered in the Karplus-Strong Section of the Sound Synthesis Introduction. Although the resulting sounds are very interesting, the Karplus-Strong algorithm is easy to implement, especially in C/C++. It is based on a single buffer, filled with noise, and a moving average smoothing.
Besides the general framework of all examples in this teaching unit, the karplus_strong_example
needs just a few additional elements, defined in the class's header:
// the buffer length
int l_buff = 600;

// the 'playback position' in the buffer
int buffer_pos = 0;

/// noise buffer
double *noise_buffer;

/// length of moving average filter
int l_smooth = 10;

// feedback gain
double gain = 1.0;
Note that the pitch of the resulting sound is hard-coded in this example, since it is based only on the sampling rate of the system and the buffer length. In contrast to the original Karplus-Strong algorithm, this version uses an arbitrary length for the moving average filter, instead of only two samples. This results in a faster decay of high frequency components.
Since the noise buffer is implemented as a pointer to an array of doubles,
it first needs to be allocated and initialized. This happens in the constructor of the karplus_strong_example
class:
// allocate noise buffer
noise_buffer = new double[l_buff];

for (int i = 0; i < l_buff; i++)
    noise_buffer[i] = 0.0;
Each time the Karplus-Strong algorithm is excited, or plucked, the buffer needs to be filled with a sequence of random noise. At each call of the JACK callback function (process
), it is checked whether a new event has been triggered via MIDI or OSC.
If that is true, the playback position of the buffer is set to 0
and each sample of the noise_buffer
is filled with a random double between -1 and 1:
cout << "Filling buffer!" << endl;

buffer_pos = 0;
for (int i = 0; i < l_buff; i++)
    noise_buffer[i] = 2.0 * ((double) rand() / RAND_MAX) - 1.0;
The sound is generated by directly writing the samples of the noise_buffer
to the JACK output buffer. This is managed in a circular fashion with the buffer_pos
counter. Wrapping the counter at the buffer length makes the process circular. This example sends the same mono signal to both channels of a stereo output.
for (int sampCNT = 0; sampCNT < nframes; sampCNT++)
{
    // write the current buffer sample to all output channels
    for (int chanCNT = 0; chanCNT < nChannels; chanCNT++)
    {
        out[chanCNT][sampCNT] = noise_buffer[buffer_pos];
    }

    // increment buffer position and wrap at the buffer length
    buffer_pos++;
    if (buffer_pos >= l_buff)
        buffer_pos = 0;
}
The above version results in a never-ending oscillation, a white tone. The timbre of this tone changes with every triggering, since a unique random sequence is used each time. With the additional smoothing, the tone will decay and gradually lose its high spectral components. This is done as follows:
// smoothing the buffer
double sum = 0;
for (int smoothCNT = 0; smoothCNT < l_smooth; smoothCNT++)
{
    // wrap the read position at the buffer length
    if (buffer_pos + smoothCNT < l_buff)
        sum += noise_buffer[buffer_pos + smoothCNT];
    else
        sum += noise_buffer[buffer_pos + smoothCNT - l_buff];
}
noise_buffer[buffer_pos] = gain * (sum / l_smooth);
To compile the KarplusStrongExample, run the following command line:
g++ -Wall -L/usr/lib src/yamlman.cpp src/main.cpp src/karplus_strong_example.cpp src/oscman.cpp src/midiman.cpp -ljack -llo -lyaml-cpp -lsndfile -lrtmidi -o karplus_strong
This call of the g++ compiler includes all necessary libraries and creates the binary karplus_strong
.
The binary can be started with the following command line:
./karplus_strong -c config.yml -m "OSC"
This will use the configurations from the YAML file and wait for OSC input. The easiest way of triggering the synth via OSC is to use the Puredata patch from the example's directory.
Exercise I
Make the buffer length and filter length command line or realtime-controllable parameters.
Exercise II
Implement a fractional noise buffer for arbitrary pitches.
Although the MIDI protocol is quite old and has several drawbacks, it is still widely used and is appropriate for many applications. Read the MIDI section in the Computer Music Basics for a deeper introduction.
The development system used in this class relies on the RtMidi framework. This allows the inclusion of any ALSA MIDI device on Linux systems and hence any USB MIDI device. The RtMidi Tutorial gives a thorough introduction to the use of the library.
The Advanced Linux Sound Architecture (ALSA) makes audio and MIDI interfaces accessible to software. As an API, it is part of the Linux kernel. Other frameworks, like JACK or PulseAudio, work on a higher level and rely on ALSA.
After connecting a MIDI device to a USB port, it should be available via ALSA. All ALSA MIDI devices can be listed with the following shell command:
$ amidi -l
The output of this request can look as follows:
Dir  Device    Name
IO   hw:1,0,0  nanoKONTROL MIDI 1
IO   hw:2,0,0  PCR-1 MIDI 1
I    hw:2,0,1  PCR-1 MIDI 2
In this case, two USB MIDI devices are connected. They can be addressed by their ALSA device IDs (hw:1 and hw:2).
The MIDI tester example can be used to print all incoming MIDI messages to the console. This can be helpful for reverse-engineering MIDI devices to figure out their controller numbers.
The MIDI Manager class introduced in this test example is used as a template for the following examples which use MIDI. For receiving messages, RtMidi offers a queued MIDI input mode and a user callback mode. In the latter, each incoming message triggers a callback function. In the queued mode, as used here, incoming messages are collected until they are retrieved by an additional process.
The midiMessage
struct is used to store incoming messages. It holds the three standard MIDI message bytes plus a Boolean for the processing state.
/// struct for holding a MIDI message
typedef struct
{
    int    byte1 = -1;
    int    byte2 = -1;
    double byte3 = -1;
    bool   hasBeenProcessed = false;
} midiMessage;
These are links to two live electronic pieces for synthesizer ensembles, both with an individual approach to spatialization:
The TGrains
UGen is an easy-to-use granular synth. It uses a Hanning window
for each grain and offers control over position, pitch and length of the grains.
The help files offer multiple examples for using this unit generator.
The following example uses a simple pulse train for triggering grains.
For this granular example, a single channel of a sample is loaded into a buffer. The duration in seconds can be queried from the buffer object, once loaded.
~buffer = Buffer.readChannel(s, "/some/wavefile.wav", channels: [0]);
~buffer.duration;
The granular node uses an Impulse
UGen to create a trigger signal for the TGrains
UGen.
This node has several arguments to control the granular process:
The density defines how often a grain is triggered per second.
Every grain can be pitch shifted by a value (1 = default rate).
The grain duration is specified in seconds.
The grain center is defined in seconds.
A gain parameter can be used for amplification.
buffer specifies the index of the buffer to be used.
Once the node has been created with a nil
buffer, the buffer index of the
previously loaded sample can be passed. Depending on the nature of the sample,
this can already result in something audible:
~grains = { | density = 1, pitch = 1, dur = 0.1, center = 0, gain = 1, buffer = nil |

    var trigger = Impulse.kr(density);

    Out.ar(0, gain * TGrains.ar(1, trigger, buffer, pitch, center, dur));

}.play();

~grains.set(\buffer, ~buffer.bufnum);
As with any node, the arguments of the granular process can be set manually. Since the center is specified in seconds, the buffer duration is useful at this point.
~grains.set(\center, 0.2);
~grains.set(\density, 100);
~grains.set(\dur, 0.2);
~grains.set(\pitch, 0.8);
Exercise I
Use the mouse with buses for a fluid control of granular parameters.
Exercise II
Use envelopes for an automatic control of the granular parameters.
The frequency-domain representation gives insight into the composition of time series and hence of musical signals. In the digital domain we are foremost interested in discrete signals and will thus introduce the Discrete Fourier Transform (DFT). This section does not aim at a full introduction of the DFT, but illustrates a few aspects which help to better understand the basics of computer music and sound synthesis.
The DFT $X[k]$ of a discrete signal $x$ with the length $N$ and the sampling frequency $f_s$ is calculated as follows. For every frequency bin $k = 0, \ldots, N-1, \ N \in \mathbb{N}$ of the output, the correlation of the signal with a complex oscillation of the normalized angular frequency $2 \pi \frac{k}{N}$ is calculated:
$$
\begin{eqnarray}
X[k] & = & \sum\limits_{n=0}^{N-1} x[n] \left( \cos \left(2 \pi k \frac{n}{N}\right) - j \sin\left(2 \pi k \frac{n}{N}\right) \right) \\
X[k] & = & \sum\limits_{n=0}^{N-1} x[n] e^{-j 2 \pi k \frac{n}{N}}
\end{eqnarray}
$$

Since the real and imaginary parts of the complex oscillation have a relative phase of $\frac{\pi}{2}$, the correlation delivers information not only on the magnitude of spectral components, but also on their phase. The real part is a cosine, whereas the imaginary part is a sine function. The following plot shows the real and imaginary components of the complex oscillations for the indices $k=1$ and $k=2$:
In the field of musical signal processing, the sine wave (respectively the cosine wave) is the basic element of complex sounds. It can be used to model and synthesize any periodic signal. Hence, the frequency-domain representation of these harmonic functions is fundamental to the understanding of many algorithms for analysis and synthesis. In most visualizations in the spectral domain, a sinusoidal component is shown as a single peak at the oscillation's frequency. However, when viewed closely, this peak is smeared and accompanied by several side lobes. The following example derives these characteristics, based on a $1024$ sample sine wave with a frequency of $f_0 = 100\ \mathrm{Hz}$ at a sampling rate of $f_s = 16\ \mathrm{kHz}$:
For calculating the DFT of sinusoidal signals, it makes sense to express them in the complex notation through Euler's formula:
$$
\sin\left(2 \pi f_0 \frac{n}{f_s}\right) = \frac{1}{2j} \left( e^{j 2 \pi f_0 \frac{n}{f_s} } - e^{-j 2 \pi f_0 \frac{n}{f_s}} \right)
$$

The DFT of the sine wave thus extends to:
$$
X[k] = \sum\limits_{n=0}^{N-1} \frac{1}{2j} \left(e^{j 2 \pi \frac{f_0}{f_s} n} - e^{-j 2 \pi \frac{f_0}{f_s} n} \right) e^{-j 2 \pi k \frac{n}{N}}
$$

Solving:
$$
\begin{eqnarray}
X[k] & = & \frac{1}{2j} \sum\limits_{n=0}^{N-1} e^{- j 2 \pi k \frac{n}{N} + j 2 \pi \frac{f_0}{f_s} n } - e^{- j 2 \pi k \frac{n}{N} - j 2 \pi \frac{f_0}{f_s} n} \\
& = & \frac{1}{2j} \sum\limits_{n=0}^{N-1} e^{- j 2 \pi \left( \frac{k}{N} - \frac{f_0}{f_s} \right) n } - e^{- j 2 \pi \left( \frac{k}{N} + \frac{f_0}{f_s}\right) n} \\
& = & \frac{1}{2j} \sum\limits_{n=0}^{N-1} e^{- j 2 \pi \left( \frac{k}{N} - \frac{f_0}{f_s} \right) n } - \frac{1}{2j} \sum\limits_{n=0}^{N-1} e^{- j 2 \pi \left( \frac{k}{N} + \frac{f_0}{f_s}\right) n}
\end{eqnarray}
$$

Geometric Series
Using the geometric series formula (with acknowledgements to 1)
$$
\sum\limits_{n=0}^{N-1} a^n = \frac{1-a^{N}}{1-a}
$$

the above equation results in:
$$
X[k] = \frac{1}{2j} \frac{1-e^{-j 2 \pi \left( \frac{k}{N} - \frac{f_0}{f_s} \right) N}}{1-e^{-j 2 \pi \left( \frac{k}{N} - \frac{f_0}{f_s} \right)}} - \frac{1}{2j} \frac{1-e^{-j 2 \pi \left( \frac{k}{N} + \frac{f_0}{f_s} \right) N}}{1-e^{-j 2 \pi \left( \frac{k}{N} + \frac{f_0}{f_s} \right)}}
$$

^{1} http://www.eecs.umich.edu/courses/eecs452/overview.html
Factoring out
The above equation features the following term:
$$
E = \frac{1-e^{-j \Lambda N}}{1-e^{-j \Lambda}}
$$

After factoring out the term $\frac{e^{-j \Lambda \frac{N}{2}}}{e^{-j \frac{\Lambda}{2}}}$ we can solve further, using Euler's formula:
$$
\begin{eqnarray}
E & = & \frac{e^{-j \Lambda \frac{N}{2}}} {e^{-j \frac{\Lambda}{2}}} \cdot \frac{e^{j \Lambda \frac{N}{2}} - e^{-j \Lambda \frac{N}{2}}}{e^{j \frac{\Lambda}{2}} - e^{-j \frac{\Lambda}{2}}} \\
& = & e^{-j \Lambda \frac{N+1}{2} } \underbrace{\frac{\sin\left(\Lambda \frac{N}{2} \right)}{\sin\left(\frac{\Lambda}{2}\right)}}_{\text{Dirichlet function}}
\end{eqnarray}
$$

As the plot below shows, the term is characterized by a Dirichlet function, which is related to the sinc function. Inserting

$$
\Lambda(\pm f_0) = 2 \pi \left( \frac{k}{N} \pm \frac{f_0}{f_s} \right)
$$

we get the following result for the spectrum of the sinusoid:
$$
\begin{eqnarray}
X[k] & = & \frac{1}{2j} \left( e^{-j \Lambda(-f_0) \frac{N+1}{2} } \frac{\sin\left(\Lambda(-f_0) \frac{N}{2} \right)}{\sin\left(\frac{\Lambda(-f_0)}{2}\right)} - e^{-j \Lambda(+f_0) \frac{N+1}{2} } \frac{\sin\left(\Lambda(+f_0) \frac{N}{2} \right)}{\sin\left(\frac{\Lambda(+f_0)}{2}\right)} \right)
\end{eqnarray}
$$

According to the shift theorem, the above equation contains two components - one centered at $f_0$, the other centered at $-f_0$, each with a main lobe and an infinite number of side lobes:
Magnitude Plot
The plot below visualizes the result, with two main lobes and the decaying side lobes:
In many cases, the absolute values of DFT spectra will be shown only for the positive frequencies. This representation is used in most examples in the following sections. The sine wave becomes a single peak at its frequency:
For a DFT with $N$ points and a sampling rate $f_s$, the DFT bins $b[k]$ - or support points - are located at the following frequencies:
$$
b[k] = k \frac{f_s}{N}
$$

Only for harmonic signals located exactly at these frequencies can the DFT result be expressed by a Dirac delta function. For $N = 1024$ and $f_s = 16\ \mathrm{kHz}$, the bin spacing is $15.625\ \mathrm{Hz}$, so the $100\ \mathrm{Hz}$ sine from the example above falls between bins $6$ and $7$, which explains the smearing.
This website visualizes the DFT very nicely for a better understanding: https://jackschaedler.github.io/circles-sines-signals/dft_walkthrough.html