RSound: An Adequate Sound Engine for Racket

John Clements <clements@racket-lang.org>

 (require (planet clements/rsound:4:=2))
This collection provides a means to represent, read, write, play, and manipulate sounds. It depends on the clements/portaudio package to provide bindings to the cross-platform ‘PortAudio’ library, which appears to run on Linux, Mac, and Windows.
It represents all sounds internally as stereo 16-bit PCM, with all the attendant advantages (speed, mostly) and disadvantages (clipping).

Does it work on your machine? Try this example:
(require (planet clements/rsound))
 
(play ding)

A note about volume: be careful not to damage your hearing, please. To take a simple example, the sine-wave function generates a sine wave with amplitude 1.0. That translates into the loudest possible sine wave that can be represented. So please set your volume low, and be careful with the headphones. Maybe there should be a parameter that controls the clipping volume. Hmm.

1 A NOTE ABOUT WINDOWS

Windows is a bit of a pain for developers. If you’re having trouble hearing sounds under Windows (high latency, or "Invalid Device" errors), try running diagnose-sound-playing.

procedure

(diagnose-sound-playing)  void?

Tries playing a short tone using all of the available APIs and several plausible sample rates. It tries to offer a helpful message, along with the test.

2 Sound Control

These procedures start and stop playing sounds and loops.

procedure

(play rsound)  void?

  rsound : rsound?
Plays an rsound. Plays concurrently with an already-playing sound, if there is one.

procedure

(stop)  void?

Stops all of the currently playing sounds.

value

ding : rsound?

A one-second "ding" sound. Nice for testing whether sound playing is working.

3 Sound I/O

These procedures read and write rsounds from/to disk.

The RSound library reads and writes WAV files only; this means fewer FFI dependencies (the reading & writing is done in Racket), and works on all platforms.

procedure

(rs-read path)  rsound?

  path : path-string?
Reads a WAV file from the given path, returns it as an rsound.

It currently has lots of restrictions (it insists on 16-bit PCM encoding, for instance), but deals with a number of common bizarre conventions that certain WAV files have (PAD chunks, extra blank bytes at the end of the fmt chunk, etc.), and tries to fail relatively gracefully on files it can’t handle.

Reading in a large sound can result in a very large value (~10 Megabytes per minute); for larger sounds, consider reading in only a part of the file, using rs-read/clip.
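
The ~10 megabytes-per-minute figure follows directly from the stereo 16-bit representation. Here's a back-of-the-envelope sketch, assuming the 44.1 kHz sample rate used elsewhere in this document. Note that sound-bytes is a made-up helper for illustration, not part of RSound:

```racket
#lang racket
;; Hypothetical helper (not part of RSound).
;; Assumed defaults: 44100 frames per second, 2 channels (stereo),
;; and 2 bytes per sample (16-bit PCM).
(define (sound-bytes seconds
                     #:sample-rate [sample-rate 44100]
                     #:channels [channels 2]
                     #:bytes-per-sample [bytes-per-sample 2])
  (* seconds sample-rate channels bytes-per-sample))

(sound-bytes 60)        ; one minute: 10584000 bytes
(sound-bytes (* 60 60)) ; one hour: 635040000 bytes
```

So an hour-long file would need something over 600 megabytes, which is where rs-read/clip starts to look attractive.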

procedure

(rs-read/clip path start finish)  rsound?

  path : path-string?
  start : nonnegative-integer?
  finish : nonnegative-integer?
Reads a portion of a WAV file from a given path, starting at frame start and ending at frame finish.

It currently has lots of restrictions (it insists on 16-bit PCM encoding, for instance), but deals with a number of common bizarre conventions that certain WAV files have (PAD chunks, extra blank bytes at the end of the fmt chunk, etc.), and tries to fail relatively gracefully on files it can’t handle.

procedure

(rs-read-frames path)  nonnegative-integer?

  path : path-string?
Returns the number of frames in the sound indicated by the path. It parses the header only, and is therefore much faster than reading in the whole sound.

The file must be encoded as a WAV file readable with rs-read.

procedure

(rs-read-sample-rate path)  number?

  path : path-string?
Returns the sample-rate of the sound indicated by the path. It parses the header only, and is therefore much faster than reading in the whole sound.

The file must be encoded as a WAV file readable with rs-read.

procedure

(rs-write rsound path)  void?

  rsound : rsound?
  path : path-string?
Writes an rsound to a WAV file, using stereo 16-bit PCM encoding. It overwrites an existing file at the given path, if one exists.

4 Rsound Manipulation

These procedures allow the creation, analysis, and manipulation of rsounds.

struct

(struct rsound (data sample-rate)
  #:extra-constructor-name make-rsound)
  data : s16vector?
  sample-rate : nonnegative-number?
Represents a sound.

procedure

(rs-frames sound)  nonnegative-integer?

  sound : rsound?
Returns the length of a sound, in frames.

procedure

(rs-equal? sound1 sound2)  boolean?

  sound1 : rsound?
  sound2 : rsound?
Returns #true when the two sounds are (extensionally) equal.

This procedure is necessary because s16vectors don’t natively support equal?.

procedure

(silence frames)  rsound?

  frames : nonnegative-integer?
Returns an rsound of length frames containing silence. This procedure is relatively fast.

procedure

(rs-ith/left rsound frame)  real?

  rsound : rsound?
  frame : nonnegative-integer?
Returns the sample at index frame from the left channel of the rsound, represented as a number in the range -1.0 to 1.0.

procedure

(rs-ith/right rsound frame)  real?

  rsound : rsound?
  frame : nonnegative-integer?
Returns the sample at index frame from the right channel of the rsound, represented as a number in the range -1.0 to 1.0.

procedure

(clip rsound start finish)  rsound?

  rsound : rsound?
  start : nonnegative-integer?
  finish : nonnegative-integer?
Returns a new rsound containing the frames in rsound from frame start up to (but not including) frame finish. This procedure copies the required portion of the sound.

procedure

(rs-append* rsounds)  rsound?

  rsounds : (listof rsound?)
Returns a new rsound containing the given rsounds, appended sequentially. This procedure is relatively fast. All of the given rsounds must have the same sample-rate.

procedure

(rs-overlay rsound-1 rsound-2)  rsound?

  rsound-1 : rsound?
  rsound-2 : rsound?
Returns a new rsound containing the two sounds played simultaneously. Note that unless both sounds have amplitudes less than 0.5, clipping or wrapping is likely.
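
To see why 0.5 is the magic number, here's a small sketch of what happens when two 16-bit samples are added. Everything here is an assumption made for illustration: the scaling by 32767, and the helper names amplitude->s16, clip-s16, and wrap-s16, none of which are RSound functions.

```racket
#lang racket
;; The range of a signed 16-bit sample.
(define s16-max 32767)
(define s16-min -32768)

;; Convert an amplitude in -1.0..1.0 to a 16-bit sample (assumed scaling).
(define (amplitude->s16 a) (exact-round (* a s16-max)))

;; Clipping pins an out-of-range sum to the nearest representable value.
(define (clip-s16 n) (max s16-min (min s16-max n)))

;; Wrapping keeps only the low 16 bits, as two's-complement overflow does.
(define (wrap-s16 n) (- (modulo (+ n 32768) 65536) 32768))

(define loud (+ (amplitude->s16 0.6) (amplitude->s16 0.6)))  ; 39320: too big
(define quiet (+ (amplitude->s16 0.4) (amplitude->s16 0.4))) ; 26214: fits

(clip-s16 loud)  ; 32767
(wrap-s16 loud)  ; -26216
(clip-s16 quiet) ; 26214
```

Clipping flattens the peak; wrapping flips it to a large negative value, which sounds considerably worse.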

procedure

(rs-overlay* rsounds)  rsound?

  rsounds : (listof rsound?)
Returns a new rsound containing all of the sounds played simultaneously. Note that unless all of the sounds have low amplitudes, clipping or wrapping is likely.

procedure

(assemble assembly-list)  rsound?

  assembly-list : (listof (list/c rsound? nonnegative-integer?))
Returns a new rsound containing all of the given rsounds. Each sound begins at the frame number indicated by its associated offset. The rsound will be exactly the length required to contain all of the given sounds.

So, suppose we have two rsounds: one called ’a’, of length 10000, and one called ’b’, also of length 10000. Evaluating

(assemble (list (list a 5000)
                (list b 0)
                (list b 11000)))

... would produce a sound of 21000 frames, where each instance of ’b’ overlaps with the central instance of ’a’.
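
The length rule can be sketched without any sounds at all: the result has to reach the end of whichever sound finishes last. In the sketch below, each entry is a hypothetical (list length offset) pair standing in for a sound and its offset, and assembled-length is a made-up name, not an RSound function. The lengths (10000 frames each) are assumptions chosen for the illustration.

```racket
#lang racket
;; Each entry is (list length-in-frames offset-in-frames).
;; The assembled sound must reach the end of the last-finishing entry.
(define (assembled-length entries)
  (for/fold ([longest 0]) ([entry (in-list entries)])
    (max longest (+ (first entry) (second entry)))))

;; Three sounds of 10000 frames each, at offsets 5000, 0, and 11000.
(assembled-length '((10000 5000) (10000 0) (10000 11000))) ; 21000
```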

procedure

(rs-scale scalar rsound)  rsound?

  scalar : nonnegative-number?
  rsound : rsound?
Scales the given sound by multiplying all of its samples by the given scalar.

5 Signals and Networks

For signal processing, RSound adopts a dataflow-like paradigm. Networks represent interconnected signal-processing nodes, and produce streams of values. They can be connected together using a number of primitives, including the network syntactic form. Networks that have no inputs are called signals.

Here’s a trivial signal:

(network ()
         [out (+ 1 2)])

This is the signal that always produces 3.

Here’s another one, that counts upward:

(define counter/sig
  (network ()
           [counter (+ 1 (prev counter))]))

The prev form is special, and is used to refer to the prior value of the signal component.

Here’s another example, that adds together two sine waves, at 34 Hz and 46 Hz, assuming a sample rate of 44.1 kHz:

(define sum-of-sines
  (network ()
           [a (sine-wave 34)]
           [b (sine-wave 46)]
           [out (+ a b)]))

Several things to note:
  • a network can have many clauses; each clause contains a name and a right-hand-side.

  • a right-hand-side must be an application, either of a primitive function or of a network.

  • the last clause is used as the output, regardless of its name.

  • clauses can produce multiple values; in this case, the name is replaced by a parenthesized list.

A clause may also have an optional #:init clause, specifying its initial value. This is important when the clause is referred to using prev.

In order to use a signal with signal-play, it should produce a real number in the range -1.0 to 1.0.

Here’s an example that uses one sine-wave (often called an "LFO") to control the pitch of another one:

(define vibrato-tone
  (network ()
           [lfo (sine-wave 2)]
           [sin (sine-wave (+ 400 (* 50 lfo)))]
           [out (* 0.1 sin)]))
(signal-play vibrato-tone)
(sleep 5)
(stop)

There are many built-in signals. Note that these are documented as though they were procedures, but they’re not; they can be used in a procedure-like way in network clauses. Otherwise, they will behave as opaque values; you can pass them to various signal functions, etc.

Also note that all of these assume a fixed sample rate of 44.1 kHz.

signal

(sine-wave frequency)  real?

  frequency : nonnegative-number?
Produces a signal representing a sine wave of the given frequency, at the default sample rate, of amplitude 1.0.

signal

(sawtooth-wave frequency)  real?

  frequency : nonnegative-number?
Produces a signal representing a naive sawtooth wave of the given frequency, of amplitude 1.0. Note that since this is a simple -1.0 up to 1.0 sawtooth wave, it’s got horrible aliasing all over the spectrum.
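
For the curious, a "naive" sawtooth can be sketched as a plain procedure from frame number to sample. This is an illustration only, not RSound's implementation: naive-sawtooth is a made-up name, and the starting phase and the 44100 sample rate are assumptions.

```racket
#lang racket
(define sample-rate 44100) ; assumed default sample rate

;; Returns a procedure mapping a frame number to a sample that
;; ramps from -1.0 up toward 1.0 once per cycle.
(define (naive-sawtooth frequency)
  (lambda (t)
    (define cycles (* t (/ frequency sample-rate))) ; cycles elapsed at frame t
    (define phase (- cycles (floor cycles)))        ; fractional part, 0 up to 1
    (- (* 2.0 phase) 1.0)))

(define saw (naive-sawtooth 441)) ; one cycle every 100 frames
(saw 0)   ; -1.0
(saw 50)  ; 0.0
(saw 100) ; -1.0 again: the abrupt jump is what causes the aliasing
```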

signal

(square-wave frequency)  real?

  frequency : nonnegative-number?
Produces a signal representing a naive square wave of the given frequency, of amplitude 1.0, at the default sample rate. It alternates between 1.0 and 0.0, which makes it more useful in, e.g., gating applications.

Also note that since this is a naive square wave, it’s got horrible aliasing all over the spectrum.

signal

(dc-signal amplitude)  real?

  amplitude : real?
Produces a constant signal at amplitude. Inaudible unless used to multiply by another signal.

In order to listen to them, you can transform them into rsounds, or play them directly:

procedure

(signal->rsound frames signal)  rsound?

  frames : nonnegative-integer?
  signal : signal?
Builds a sound of length frames at the default sample-rate by calling signal with integers from 0 up to frames-1. The result should be an inexact number in the range -1.0 to 1.0. Values outside this range are clipped. Both channels are identical.

Here’s an example of using it:

(define sig1
  (network ()
           [a (sine-wave 560)]
           [out (* 0.1 a)]))
 
(define r (signal->rsound 44100 sig1))
 
(play r)

procedure

(signals->rsound frames left-sig right-sig)  rsound?

  frames : nonnegative-integer?
  left-sig : signal?
  right-sig : signal?
Builds a stereo sound of length frames by using left-sig and right-sig to generate the samples for the left and right channels, calling each with integers from 0 up to frames-1. Each result should be an inexact number in the range -1.0 to 1.0. Values outside this range are clipped.

procedure

(signal-play signal)  void?

  signal : signal?
Plays a (single-channel) signal. Halt playback using (stop).

procedure

(fader fade-samples)  signal?

  fade-samples : number?
Produces a signal that decays exponentially. After fade-samples, its value is 0.001. Inaudible unless used to multiply by another signal.
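
One way to picture this decay, as a sketch only: assume the value starts at 1.0 and shrinks by the same factor every frame, with the factor chosen so that the value is 0.001 after fade-samples frames. The name fader-value and the starting value of 1.0 are assumptions; RSound's actual implementation may differ.

```racket
#lang racket
;; Value of the sketched fader at frame t: 0.001 raised to the
;; fraction of the fade that has elapsed so far.
(define (fader-value fade-samples t)
  (expt 0.001 (exact->inexact (/ t fade-samples))))

(fader-value 44100 0)     ; 1.0
(fader-value 44100 44100) ; 0.001
(fader-value 44100 22050) ; about 0.0316, the square root of 0.001
```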

There are also a number of functions that combine existing signals, called "signal combinators":

procedure

(signal-+s signals)  signal?

  signals : (listof signal?)
Produces the signal that is the sum of the input signals.

procedure

(signal-*s signals)  signal?

  signals : (listof signal?)
Produces the signal that is the product of the input signals.

We can turn an rsound back into a signal, using rsound->signal/left and rsound->signal/right:

procedure

(rsound->signal/left rsound)  signal?

  rsound : rsound?
Produces the signal that corresponds to the rsound’s left channel, followed by endless silence. Ah, endless silence.

procedure

(rsound->signal/right rsound)  signal?

  rsound : rsound?
Produces the signal that corresponds to the rsound’s right channel, followed by endless silence. (The silence joke wouldn’t be funny if I made it again.)

procedure

(thresh/signal threshold signal)  signal?

  threshold : real-number?
  signal : signal?
Applies a threshold (see thresh, below) to a signal.

procedure

(clip&volume volume signal)  signal?

  volume : real-number?
  signal : signal?
Clips the signal to a threshold of 1, then multiplies by the given volume.

Where should these go?

procedure

(thresh threshold input)  real-number?

  threshold : real-number?
  input : real-number?
Produces the number in the range (- threshold) to threshold that is closest to input. Put differently, it “clips” the input at the threshold.
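
The description above amounts to a two-sided clamp. Here's a minimal sketch under that reading; thresh-sketch is a made-up name, and this is not necessarily how RSound implements thresh.

```racket
#lang racket
;; Clamp input into the range (- threshold) .. threshold.
(define (thresh-sketch threshold input)
  (max (- threshold) (min threshold input)))

(thresh-sketch 0.5 0.7)  ; 0.5
(thresh-sketch 0.5 -2.0) ; -0.5
(thresh-sketch 0.5 0.25) ; 0.25
```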

Finally, here’s a predicate. This could be a full-on contract, but I’m afraid of the overhead.

procedure

(signal? maybe-signal)  boolean?

  maybe-signal : any/c
Is the given value a signal? More precisely, is the given value a procedure whose arity includes 1?

6 Visualizing Rsounds

 (require (planet clements/rsound:4:=2/draw))

procedure

(rs-draw rsound    
  #:title title    
  [#:width width    
  #:height height])  void?
  rsound : rsound?
  title : string?
  width : nonnegative-integer? = 800
  height : nonnegative-integer? = 200
Displays a new window containing a visual representation of the sound as a waveform.

procedure

(rsound-fft-draw rsound    
  #:zoom-freq zoom-freq    
  #:title title    
  [#:width width    
  #:height height])  void?
  rsound : rsound?
  zoom-freq : nonnegative-real?
  title : string?
  width : nonnegative-integer? = 800
  height : nonnegative-integer? = 200
Draws an FFT of the sound by breaking it into windows of 2048 samples and performing an FFT on each. Each FFT is represented as a column of gray rectangles, where darker grays indicate more of the given frequency band.

procedure

(vector-pair-draw/magnitude left    
  right    
  #:title title    
  [#:width width    
  #:height height])  void?
  left : (vectorof complex?)
  right : (vectorof complex?)
  title : string?
  width : nonnegative-integer? = 800
  height : nonnegative-integer? = 200
Displays a new window containing a visual representation of the two vectors’ magnitudes as a waveform. The lines connecting the dots are really somewhat inappropriate in the frequency domain, but they aid visibility....

procedure

(vector-draw/real/imag vec    
  #:title title    
  [#:width width    
  #:height height])  void?
  vec : (vectorof complex?)
  title : string?
  width : nonnegative-integer? = 800
  height : nonnegative-integer? = 200
Displays a new window containing a visual representation of the vector’s real and imaginary parts as a waveform.

7 RSound Utilities

procedure

(make-harm3tone frequency    
  volume?    
  frames    
  sample-rate)  rsound?
  frequency : nonnegative-number?
  volume? : nonnegative-number?
  frames : nonnegative-integer?
  sample-rate : nonnegative-number?
Produces an rsound containing a semi-percussive tone of the given frequency, frames, and volume. The tone contains the first three harmonics of the specified frequency. This function is memoized, so that subsequent calls with the same parameters will return existing values, rather than recomputing them each time.

procedure

(make-tone pitch volume duration)  rsound?

  pitch : nonnegative-number?
  volume : nonnegative-number?
  duration : nonnegative-exact-integer?
Given a pitch in Hz, a volume between 0.0 and 1.0, and a duration in frames, returns the rsound consisting of a pure sine wave tone using the specified parameters.
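
Since durations are given in frames rather than seconds, a small conversion helper can be handy. This is a sketch: seconds->frames is a made-up name (not part of RSound), and the default of 44100 frames per second is an assumption based on the sample rate used elsewhere in this document.

```racket
#lang racket
;; Convert a duration in seconds to a whole number of frames.
(define (seconds->frames seconds [sample-rate 44100])
  (exact-round (* seconds sample-rate)))

(seconds->frames 0.5) ; 22050
(seconds->frames 2)   ; 88200
```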

procedure

(rsound-fft/left rsound)  (vectorof complex?)

  rsound : rsound?
Produces the complex-valued vector that represents the fourier transform of the rsound’s left channel. Since the FFT takes time N*log(N) in the size of the input, running this on rsounds with more than a few thousand frames is probably going to be slow, unless the number of frames is a power of 2.

procedure

(rsound-fft/right rsound)  (vectorof complex?)

  rsound : rsound?
Produces the complex-valued vector that represents the fourier transform of the rsound’s right channel. Since the FFT takes time N*log(N) in the size of the input, running this on rsounds with more than a few thousand frames is probably going to be slow, unless the number of frames is a power of 2.

procedure

(midi-note-num->pitch note-num)  number?

  note-num : nonnegative-integer?
Returns the frequency (in Hz) that corresponds to a given midi note number. Here’s the top-secret formula: 440*2^((n-69)/12).
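
The formula is short enough to try out directly. In this sketch, midi->hz is a made-up stand-in that simply transcribes the formula above; it is not the library's definition.

```racket
#lang racket
;; 440 * 2^((n - 69) / 12): note 69 is the A at 440 Hz,
;; and every 12 notes doubles the frequency.
(define (midi->hz n)
  (* 440.0 (expt 2.0 (/ (- n 69) 12.0))))

(midi->hz 69) ; 440.0
(midi->hz 81) ; 880.0, one octave up
(midi->hz 60) ; about 261.63, middle C
```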

procedure

(fir-filter delay-lines)  procedure?

  delay-lines : (listof (list/c nonnegative-exact-integer? real-number?))
Given a list of delay times (in frames) and amplitudes for each, produces a function that maps signals to new signals where each frame is the sum of the current signal frame and the multiplied versions of the delayed input signals (that’s what makes it FIR).

So, for instance,

(fir-filter (list (list 13 0.4) (list 4 0.1)))

...would produce a filter that added the current frame to 4/10 of the input frame 13 frames ago and 1/10 of the input frame 4 frames ago.
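
Here's a sketch of that behavior in plain Racket. It assumes a signal is modeled as a procedure from frame number to sample, and that frames before 0 count as silence. The name fir-sketch is made up, and this is not RSound's implementation.

```racket
#lang racket
;; delay-lines is a list of (list lag-in-frames amplitude).
;; Returns a function from signals to signals.
(define (fir-sketch delay-lines)
  (lambda (signal)
    (lambda (t)
      (+ (signal t)
         (for/sum ([line (in-list delay-lines)])
           (define lag (first line))
           (define amplitude (second line))
           (if (>= t lag)
               (* amplitude (signal (- t lag))) ; delayed INPUT frame
               0))))))

;; An impulse: 1.0 at frame 0, silence afterwards.
(define (impulse t) (if (= t 0) 1.0 0.0))

(define filtered ((fir-sketch (list (list 13 0.4) (list 4 0.1))) impulse))
(filtered 0)  ; 1.0
(filtered 4)  ; 0.1
(filtered 13) ; 0.4
(filtered 26) ; 0.0: the response is finite
```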

procedure

(iir-filter delay-lines)  procedure?

  delay-lines : (listof (list/c nonnegative-exact-integer? real-number?))
Given a list of delay times (in frames) and amplitudes for each, produces a function that maps signals to new signals where each frame is the sum of the current signal frame and the multiplied versions of the delayed output signals (that’s what makes it IIR).

So, for instance,

(iir-filter (list (list 13 0.4) (list 4 0.1)))

...would produce a filter that added the current frame to 4/10 of the output frame 13 frames ago and 1/10 of the output frame 4 frames ago.
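
The feedback is easier to see with a single delay line. The sketch below computes the first n output frames into a vector, assuming a signal is modeled as a procedure from frame number to sample and that output frames before 0 count as silence. The name iir-run is made up, and this is not RSound's implementation.

```racket
#lang racket
;; delay-lines is a list of (list lag-in-frames amplitude).
;; Returns a vector holding the first n output frames.
(define (iir-run delay-lines input n)
  (define out (make-vector n 0.0))
  (for ([t (in-range n)])
    (vector-set!
     out t
     (+ (input t)
        (for/sum ([line (in-list delay-lines)])
          (define lag (first line))
          (define amplitude (second line))
          (if (>= t lag)
              (* amplitude (vector-ref out (- t lag))) ; delayed OUTPUT frame
              0)))))
  out)

;; An impulse: 1.0 at frame 0, silence afterwards.
(define (impulse t) (if (= t 0) 1.0 0.0))

(define echoes (iir-run (list (list 4 0.5)) impulse 16))
(vector-ref echoes 0)  ; 1.0
(vector-ref echoes 4)  ; 0.5
(vector-ref echoes 8)  ; 0.25
(vector-ref echoes 12) ; 0.125: each echo feeds the next one
```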

8 Frequency Response

 (require (planet clements/rsound:4:=2/frequency-response))
This module provides functions to allow the analysis of frequency response on filters specified either as transfer functions or as lists of poles and zeros. It assumes a sample rate of 44.1 kHz.

procedure

(response-plot poly dbrel min-freq max-freq)  void?

  poly : procedure?
  dbrel : real?
  min-freq : real?
  max-freq : real?
Plot the frequency response of a filter, given its transfer function (a function mapping complex numbers to complex numbers). The dbrel number indicates how many decibels up the "zero" line should be shifted. The graph starts at min-freq Hz and goes up to max-freq Hz. Note that aliasing effects may affect the apparent height or depth of narrow spikes.

Here’s an example of calling this function on a 100-pole comb filter, showing the response from 10 kHz to 11 kHz:
(response-plot (lambda (z)
                 (/ 1 (- 1 (* 0.95 (expt z -100)))))
               30 10000 11000)

procedure

(poles&zeros->fun poles zeros)  procedure?

  poles : (listof complex?)
  zeros : (listof complex?)
Given a list of poles and zeros in the complex plane, generates the corresponding transfer function.

Here’s an example of calling this function as part of a call to response-plot, for a filter with three poles and two zeros, from 0 Hz up to the Nyquist frequency, 22.05 kHz:
(response-plot (poles&zeros->fun '(0.5 0.5+0.5i 0.5-0.5i) '(0+1i 0-1i))
               40
               0
               22050)
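
For reference, the textbook way to build a transfer function from poles and zeros is to divide the product of (z - zero) terms by the product of (z - pole) terms. Here's a sketch under that assumption; poles&zeros->fun-sketch and frequency->z are made-up names, any gain normalization the library applies is not modeled, and the 44100 sample rate is assumed.

```racket
#lang racket
;; H(z) = product of (z - zero) over product of (z - pole).
(define (poles&zeros->fun-sketch poles zeros)
  (lambda (z)
    (/ (for/product ([zero (in-list zeros)]) (- z zero))
       (for/product ([pole (in-list poles)]) (- z pole)))))

;; The point on the unit circle for a frequency in Hz (assumed 44100 sample rate).
(define (frequency->z f)
  (make-polar 1.0 (* 2 pi (/ f 44100.0))))

(define h (poles&zeros->fun-sketch '(1/2) '(0+1i 0-1i)))
(h 1)                                ; 4: the response at 0 Hz
(magnitude (h (frequency->z 11025))) ; very close to 0: the zero at i sits at 11025 Hz
```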

9 Filtering

 (require (planet clements/rsound:4:=2/filter))
This module provides a dynamic low-pass filter, among other things.

procedure

(lpf/dynamic control input)  signal?

  control : signal?
  input : signal?
The control signal must produce real numbers in the range 0.01 to 3.0. A small number produces a low cutoff frequency. The input signal is the audio signal to be processed. For instance, here’s a time-varying low-pass filtered sawtooth:

(define (control f) (+ 0.5 (* 0.2 (sin (* f 7.123792865282977e-05)))))
(define (sawtooth f) (/ (modulo f 220) 220))
 
(play (signal->rsound 88200 (lpf/dynamic control sawtooth)))

10 Single-cycle sounds

 (require (planet clements/rsound:4:=2/single-cycle))
This module provides support for generating tones from single-cycle waveforms.
In particular, it comes with a library of 247 such waveforms, courtesy of Adventure Kid’s website. Used with permission. Thanks!

procedure

(synth-note family    
  spec    
  midi-note-number    
  duration)  rsound?
  family : string?
  spec : number-or-path?
  midi-note-number : natural?
  duration : natural?
Given a family (currently either "main", "vgame", or "path"), a spec (a number in the first two cases), a midi note number and a duration in frames, produces an rsound. There’s a (non-configurable) envelope applied, too.

Example, playing sound #49 from the vgame package for a half-second at middle C:

(synth-note "vgame" 49 60 22050)

procedure

(synth-note/raw family    
  spec    
  midi-note-number    
  duration)  rsound?
  family : string?
  spec : number-or-path?
  midi-note-number : natural?
  duration : natural?
Same as above, but no envelope is applied.

11 Stream-based Playing

 (require (planet clements/rsound:4:=2/stream-play))
RSound now provides functions whereby all played sounds use a single stream. This has the advantage of lower latency and avoids problems on Windows, where opening a new stream for each sound causes errors.

procedure

(play/s sound)  void?

  sound : rsound?
Plays a given sound.

procedure

(play/s/f sound frame)  void?

  sound : rsound?
  frame : natural?
Plays a given sound at a given (stream-relative) frame.

procedure

(current-time/s)  natural?

Returns the current stream-relative frame.

12 Sample Code

An example of a signal that plays two lines, each with randomly changing square-wave tones. This one runs in the Intermediate student language:

(require (planet clements/rsound))
(require (planet clements/rsound/filter))
 
; scrobble: number number number -> signal
; return a signal that generates square-wave tones, changing
; at the given interval into a new randomly-chosen frequency
; between lo-f and hi-f
(define (scrobble change-interval lo-f hi-f)
  (local
    [(define freq-range (floor (- hi-f lo-f)))
     (define (maybe-change f l)
       (cond [(= l 0) (+ lo-f (random freq-range))]
             [else f]))]
    (network ()
             [looper ((loop-ctr change-interval 1))]
             [freq (maybe-change (prev freq 400) looper)]
             [a (square-wave freq)])))
 
(define my-signal
  (network ()
           [a ((scrobble 4000 200 600))]
           [b ((scrobble 40000 100 200))]
           [lpf-wave (sine-wave 0.1)]
           [c (lpf/dynamic (max 0.01 (abs (* 0.5 lpf-wave))) (+ a b))]
           [out (* c 0.1)]))
 
; write 20 seconds to a file, if uncommented:
; (rs-write (signal->rsound (* 20 44100) my-signal) "/tmp/foo.wav")
 
; play the signal
(signal-play my-signal)

13 Reporting Bugs

For Heaven’s sake, report lots of bugs!