RSound: An Adequate Sound Engine for Racket
This collection provides a means to represent, read,
write, play, and manipulate sounds. It uses the ’portaudio’ library, which appears
to run on Linux, Mac, and Windows.
The package contains binary versions of the Mac & Windows portaudio libraries. This is
because Windows and Mac users are less likely to be able to install their own
versions of the library; naturally, this is a less-than-perfect solution. In particular,
it appears that Windows users often get an error message about a missing DLL that can
be solved by installing a separate bundle... from Microsoft?
Sound playing happens on a separate Racket thread and custodian. This means
that re-running the program or interrupting with a "Kill" will not halt the
sound. (Use (stop-playing) for that.)
RSound represents all sounds internally as stereo 16-bit PCM, with all the attendant
advantages (speed, mostly) and disadvantages (clipping).
Does it work on your machine? Try this example (and accept my
apologies if I forget to update the version number):
A note about volume: be careful not to damage your hearing, please. To take a simple example,
the sine-wave function generates a sine wave with amplitude 1.0. That translates into
the loudest possible sine wave that can be represented. So please set your volume low,
and be careful with the headphones. Maybe there should be a parameter that controls the clipping
volume. Hmm.
1 Sound Control
These procedures start and stop playing sounds and loops.
Plays an rsound. Interrupts an already-playing sound, if there is one.
Plays an rsound repeatedly. Continues looping until interrupted by
another sound command.
When the current sound or loop finishes, starts looping this one instead.
Stops the currently playing sound.
2 Sound I/O
These procedures read and write rsounds from/to disk.
The RSound library reads and writes WAV files only; this means fewer FFI dependencies
(the reading & writing is done in racket), and works on all platforms.
Reads a WAV file from the given path, returns it as an rsound.
It currently has lots of restrictions (it insists on 16-bit PCM encoding, for instance), but deals
with a number of common bizarre conventions that certain WAV files have (PAD chunks,
extra blank bytes at the end of the fmt chunk, etc.), and tries to fail
relatively gracefully on files it can’t handle.
Reading in a large sound can result in a very large value (~10 Megabytes per minute);
for larger sounds, consider reading in only a part of the file, using rsound-read/clip.
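The size figure above can be checked with a little arithmetic. The sketch below assumes a 44100 Hz sample rate (the text does not name one) and uses a hypothetical helper, estimated-bytes, that is not part of RSound:

```racket
#lang racket
;; Rough size of a sound held as stereo 16-bit PCM.
;; ASSUMPTION: 44100 Hz sample rate; the documentation does not state one.
(define bytes-per-sample 2) ; 16 bits
(define channels 2)         ; stereo

;; estimated-bytes is a hypothetical helper, not an RSound procedure.
(define (estimated-bytes seconds sample-rate)
  (* seconds sample-rate channels bytes-per-sample))

(estimated-bytes 60 44100) ; one minute: 10584000 bytes, roughly 10 Megabytes
```

Under that assumption, a ten-minute file would need on the order of 100 Megabytes, which is why reading only a clip is worth considering.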
Reads a portion of a WAV file from a given path, starting at frame start and ending at frame finish.
It currently has lots of restrictions (it insists on 16-bit PCM encoding, for instance), but deals
with a number of common bizarre conventions that certain WAV files have (PAD chunks,
extra blank bytes at the end of the fmt chunk, etc.), and tries to fail
relatively gracefully on files it can’t handle.
(read-rsound-frames path) → nonnegative-integer?
path : path-string?
Returns the number of frames in the sound indicated by the path. It parses
the header only, and is therefore much faster than reading in the whole sound.
The file must be encoded as a WAV file readable with rsound-read.
Returns the sample-rate of the sound indicated by the path. It parses
the header only, and is therefore much faster than reading in the whole sound.
The file must be encoded as a WAV file readable with rsound-read.
Writes an rsound to a WAV file, using stereo 16-bit PCM encoding. It
overwrites an existing file at the given path, if one exists.
3 Rsound Manipulation
These procedures allow the creation, analysis, and manipulation of rsounds.
data : s16vector?
frames : nonnegative-integer?
sample-rate : nonnegative-number?
Represents a sound.
(make-silence frames sample-rate) → rsound?
frames : nonnegative-integer?
sample-rate : nonnegative-number?
Returns an rsound of length frames containing silence. This procedure is relatively fast.
Returns the nth sample from the left channel of the rsound, represented as a number in the range -1.0
to 1.0.
Returns the nth sample from the right channel of the rsound, represented as a number in the range -1.0
to 1.0.
Returns a new rsound containing the frames of rsound from frame start up to, but not including, frame finish.
This procedure copies the required portion of the sound.
Returns a new rsound containing the given rsounds, appended sequentially. This procedure is relatively
fast. All of the given rsounds must have the same sample-rate.
Returns a new rsound containing all of the given rsounds. Each sound begins at the frame number
indicated by its associated offset. The rsound will be exactly the length required to contain all of
the given sounds.
So, suppose we have two rsounds: one called ’a’, of length 20000, and one called ’b’, of length 10000.
Evaluating
... would produce a sound of 21000 frames, where each instance of ’b’ overlaps with the central
instance of ’a’.
4 Signals
A signal is a function mapping a frame number to a real number in the range -1.0 to 1.0. There
are several built-in functions that produce signals.
(sine-wave frequency sample-rate) → signal?
frequency : nonnegative-number?
sample-rate : nonnegative-number?
Produces a signal representing a sine wave of the given
frequency, of amplitude 1.0.
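To make the idea of a signal concrete, here is a minimal sketch of a sine-wave signal written in plain Racket. The name my-sine-wave is hypothetical and this is not RSound's own implementation; it only illustrates a procedure that maps a frame number to a value between -1.0 and 1.0:

```racket
#lang racket
;; my-sine-wave is a hypothetical stand-in, not RSound's sine-wave.
;; It returns a signal: a procedure from frame number to a real in [-1.0, 1.0].
(define (my-sine-wave frequency sample-rate)
  (lambda (frame)
    ;; frame / sample-rate is the elapsed time in seconds
    (sin (* 2 pi frequency (/ frame sample-rate)))))

(define a440 (my-sine-wave 440 44100))
(a440 0)   ; zero at the start of the cycle
(a440 25)  ; roughly a quarter of the way through the first cycle, close to the peak
```

Because the amplitude is 1.0, a sound built from such a signal is as loud as a sine wave can be; the volume warning above applies.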
(sawtooth-wave frequency sample-rate) → signal?
frequency : nonnegative-number?
sample-rate : nonnegative-number?
Produces a signal representing a naive sawtooth wave of the given
frequency, of amplitude 1.0. Note that since this is a simple -1.0 up to 1.0 sawtooth wave, it’s got horrible
aliasing all over the spectrum.
(square-wave frequency sample-rate) → signal?
frequency : nonnegative-number?
sample-rate : nonnegative-number?
Produces a signal representing a naive square wave of the given
frequency, of amplitude 1.0. Note that since this is a simple 1/-1 square wave, it’s got horrible
aliasing all over the spectrum.
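The "naive" waveforms described above can be sketched directly from the phase of the wave. The names below (phase, naive-sawtooth, naive-square) are hypothetical, and the starting points (the ramp beginning at -1.0, the square beginning high) are assumptions rather than documented behavior:

```racket
#lang racket
;; phase: fraction of the current cycle completed, in [0, 1).
(define (phase frame frequency sample-rate)
  (let ([cycles (/ (* frame frequency) sample-rate)])
    (- cycles (floor cycles))))

;; ASSUMPTION: the ramp runs from -1.0 up toward 1.0 within each cycle.
(define (naive-sawtooth frequency sample-rate)
  (lambda (frame)
    (- (* 2.0 (phase frame frequency sample-rate)) 1.0)))

;; ASSUMPTION: the first half of each cycle is 1.0, the second half -1.0.
(define (naive-square frequency sample-rate)
  (lambda (frame)
    (if (< (phase frame frequency sample-rate) 1/2) 1.0 -1.0)))

((naive-sawtooth 1 4) 2) ; halfway through a cycle: 0.0
((naive-square 1 4) 2)   ; second half of a cycle: -1.0
```

The instantaneous jumps in both waveforms contain energy at arbitrarily high frequencies, which is where the aliasing comes from once the wave is sampled.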
Produces a constant signal at the given amplitude. Inaudible on its own; useful when
multiplied by another signal.
To listen to signals, you’ll need to transform them into rsounds:
Builds a sound of length frames and sample-rate sample-rate by calling
fun with integers from 0 up to frames-1. The result should be an inexact
number in the range -1.0 to 1.0. Values outside this range are clipped.
Both channels are identical.
Here’s an example of using it:
Alternatively, we could use sine-wave to achieve the same result:
frames : nonnegative-integer?
sample-rate : nonnegative-integer?
left-fun : signal?
right-fun : signal?
Builds a stereo sound of length frames and sample-rate sample-rate by calling
left-fun and right-fun with integers from 0 up to frames-1. The result should be an inexact
number in the range -1.0 to 1.0. Values outside this range are clipped.
Produces a signal that decays exponentially. After fade-samples, its value is 0.001.
Inaudible unless used to multiply by another signal.
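One way to get an exponential decay that reaches 0.001 after fade-samples frames is to raise 0.001 to the fraction of the fade completed. The sketch below uses a hypothetical name, my-fader, and assumes the signal starts at 1.0 at frame 0, which the description does not state:

```racket
#lang racket
;; my-fader is a hypothetical stand-in for the fader described above.
;; ASSUMPTION: the value at frame 0 is 1.0.
(define (my-fader fade-samples)
  (lambda (frame)
    ;; 0.001^(frame / fade-samples): 1 at frame 0, 0.001 at frame fade-samples
    (expt 0.001 (/ frame fade-samples))))

(define fade (my-fader 44100))
(fade 0)     ; full level at the start
(fade 44100) ; 0.001 once fade-samples frames have elapsed
```

Multiplying this by an audible signal gives a tone that dies away; 0.001 corresponds to a drop of 60 dB in amplitude.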
Produces a signal whose values are computed by calling proc with the current frame and the additional
values args.
So, for instance, if we defined the function flatline as
... then (signal flatline 0.4) would produce the same result as (dc-signal 0.4).
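The listing for flatline is not reproduced here, but the description suggests something like the following sketch. Both my-signal and this particular definition of flatline are assumptions made for illustration, written in plain Racket rather than with RSound itself:

```racket
#lang racket
;; my-signal is a hypothetical stand-in for the signal procedure described above:
;; it closes over the extra arguments and passes the frame number first.
(define (my-signal proc . args)
  (lambda (frame) (apply proc frame args)))

;; One plausible definition of flatline (an assumption, not the original listing):
;; it ignores the frame and returns the amplitude it was given.
(define (flatline frame amplitude)
  amplitude)

(define quiet-dc (my-signal flatline 0.4))
(quiet-dc 0)    ; 0.4
(quiet-dc 1000) ; 0.4
```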
There are also a number of functions that combine existing signals, called "signal combinators":
Produces the signal that is the sum of the input signals.
Produces the signal that is the product of the input signals.
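Since signals are just procedures, combinators like these can be sketched as procedures that return procedures. The names sum-signals and product-signals are hypothetical, and taking the inputs as a list is an assumption about the calling convention:

```racket
#lang racket
;; Hypothetical combinators; ASSUMPTION: the input signals arrive as a list.
(define (sum-signals signals)
  (lambda (frame)
    (for/sum ([s (in-list signals)]) (s frame))))

(define (product-signals signals)
  (lambda (frame)
    (for/product ([s (in-list signals)]) (s frame))))

;; Two toy signals for demonstration.
(define (half frame) 0.5)
(define (ramp frame) (/ frame 100.0))

((sum-signals (list half ramp)) 50)     ; 0.5 + 0.5 = 1.0
((product-signals (list half ramp)) 50) ; 0.5 * 0.5 = 0.25
```

Note that a sum of signals can leave the range -1.0 to 1.0 even when each input stays inside it, so the result may be clipped when it is turned into an rsound.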
We can turn an rsound back into a signal, using rsound->signal:
Produces the signal that corresponds to the rsound’s left channel, followed by endless silence. Ah, endless silence.
Produces the signal that corresponds to the rsound’s right channel, followed by endless silence. (The silence joke
wouldn’t be funny if I made it again.)
Applies a threshold (see thresh, below) to a signal.
Clips the signal to a threshold of 1, then multiplies by the given volume.
Where should these go?
(thresh threshold input) → real-number?
threshold : real-number?
input : real-number?
Produces the number in the range (- threshold) to threshold that is closest to input.
Put differently, it “clips” the input at the threshold.
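The clipping rule can be written in one line with min and max. This is a sketch under the assumption that threshold is non-negative; my-thresh is a hypothetical name, not the library's own definition:

```racket
#lang racket
;; my-thresh is a hypothetical stand-in for thresh.
;; ASSUMPTION: threshold is non-negative.
(define (my-thresh threshold input)
  (max (- threshold) (min threshold input)))

(my-thresh 1.0 1.7)  ; too high, clipped to 1.0
(my-thresh 1.0 -3.0) ; too low, clipped to -1.0
(my-thresh 1.0 0.25) ; already in range, unchanged: 0.25
```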
Finally, here’s a predicate. This could be a full-on contract, but I’m afraid of the
overhead.
Is the given value a signal? More precisely, is the given value a procedure whose
arity includes 1?
5 Visualizing Rsounds
(rsound-draw rsound #:title title [#:width width #:height height]) → void?
rsound : rsound?
title : string?
width : nonnegative-integer? = 800
height : nonnegative-integer? = 200
Displays a new window containing a visual representation of the sound as a waveform.
(rsound-fft-draw rsound #:zoom-freq zoom-freq #:title title [#:width width #:height height]) → void?
rsound : rsound?
zoom-freq : nonnegative-real?
title : string?
width : nonnegative-integer? = 800
height : nonnegative-integer? = 200
Draws an FFT of the sound by breaking it into windows of 2048 samples and performing an
FFT on each. Each FFT is represented as a column of gray rectangles, where darker grays
indicate more of the given frequency band.
(vector-pair-draw/magnitude left right #:title title [#:width width #:height height]) → void?
left : (vectorof complex?)
right : (vectorof complex?)
title : string?
width : nonnegative-integer? = 800
height : nonnegative-integer? = 200
Displays a new window containing a visual representation of the two vectors’ magnitudes
as a waveform. The lines connecting the dots are really somewhat inappropriate in the
frequency domain, but they aid visibility....
(vector-draw/real/imag vec #:title title [#:width width #:height height]) → void?
vec : (vectorof complex?)
title : string?
width : nonnegative-integer? = 800
height : nonnegative-integer? = 200
Displays a new window containing a visual representation of the vector’s real and imaginary
parts as a waveform.
6 RSound Utilities
frequency : nonnegative-number?
volume? : nonnegative-number?
frames : nonnegative-integer?
sample-rate : nonnegative-number?
Produces an rsound containing a semi-percussive tone of the given frequency, frames, and volume. The tone contains the first
three harmonics of the specified frequency. This function is memoized, so that subsequent calls with the same parameters
will return existing values, rather than recomputing them each time.
Produces the complex-valued vector that represents the Fourier transform of the rsound’s left channel.
Since the FFT takes time N*log(N) in the size of the input, running this on rsounds with more than a
few thousand frames is probably going to be slow, unless the number of frames is a power of 2.
Produces the complex-valued vector that represents the Fourier transform of the rsound’s right channel.
Since the FFT takes time N*log(N) in the size of the input, running this on rsounds with more than a
few thousand frames is probably going to be slow, unless the number of frames is a power of 2.
Returns the frequency (in Hz) that corresponds to a given MIDI note number. Here’s the top-secret formula:
440*2^((n-69)/12).
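The formula translates directly into Racket. The name my-midi-note->pitch is hypothetical; only the formula itself comes from the text above:

```racket
#lang racket
;; my-midi-note->pitch is a hypothetical name for the formula 440*2^((n-69)/12).
(define (my-midi-note->pitch n)
  (* 440 (expt 2 (/ (- n 69) 12))))

(my-midi-note->pitch 69) ; 440: note 69 is the reference pitch
(my-midi-note->pitch 57) ; 220: twelve notes (one octave) lower halves the frequency
(my-midi-note->pitch 60) ; approximately 261.63
```

Each step of one note number multiplies the frequency by the twelfth root of 2, so twelve steps double it.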
Given a list of delay times (in frames) and amplitudes for each, produces a function that maps signals
to new signals where each frame is the sum of the current signal frame and the multiplied versions of
the delayed input signals (that’s what makes it FIR).
So, for instance,
(fir-filter (list (list 13 0.4) (list 4 0.1)))
...would produce a filter that added the current frame to 4/10 of the input frame 13 frames ago and 1/10 of
the input frame 4 frames ago.
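The behavior described above can be sketched in plain Racket by treating signals as procedures of a frame number. The name my-fir-filter is hypothetical, and treating the input as silent before frame 0 is an assumption; the library may handle the start of the signal differently:

```racket
#lang racket
;; my-fir-filter is a hypothetical stand-in for the fir-filter described above.
;; params: a list of (list delay-in-frames amplitude).
;; ASSUMPTION: the input signal counts as silent before frame 0.
(define (my-fir-filter params)
  (lambda (sig)
    (lambda (frame)
      (+ (sig frame)
         (for/sum ([p (in-list params)])
           (let ([delayed (- frame (first p))])
             (if (< delayed 0)
                 0
                 (* (second p) (sig delayed)))))))))

;; An impulse: 1.0 at frame 0, silence everywhere else.
(define (impulse frame) (if (= frame 0) 1.0 0.0))

(define filtered ((my-fir-filter (list (list 13 0.4) (list 4 0.1))) impulse))
(filtered 0)  ; 1.0, the undelayed impulse
(filtered 4)  ; 0.1, the copy delayed by 4 frames
(filtered 13) ; 0.4, the copy delayed by 13 frames
```

Feeding in an impulse shows why the response is finite: after the longest delay (13 frames here) the output returns to silence.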
Given a list of delay times (in frames) and amplitudes for each, produces a function that maps signals
to new signals where each frame is the sum of the current signal frame and the multiplied versions of
the delayed output signals (that’s what makes it IIR).
So, for instance,
(iir-filter (list (list 13 0.4) (list 4 0.1)))
...would produce a filter that added the current frame to 4/10 of the output frame 13 frames ago and 1/10 of
the output frame 4 frames ago.
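A feedback filter needs its own earlier outputs, so a sketch has to remember them. The version below caches outputs in a hash table. The name my-iir-filter is hypothetical, and it assumes that all delays are positive and that the output counts as silent before frame 0:

```racket
#lang racket
;; my-iir-filter is a hypothetical stand-in for the iir-filter described above.
;; params: a list of (list delay-in-frames amplitude).
;; ASSUMPTIONS: every delay is positive; output is silent before frame 0.
(define (my-iir-filter params)
  (lambda (sig)
    (define cache (make-hash)) ; frame -> output already computed
    (define (out frame)
      (cond
        [(< frame 0) 0]
        [(hash-has-key? cache frame) (hash-ref cache frame)]
        [else
         (let ([value (+ (sig frame)
                         (for/sum ([p (in-list params)])
                           (* (second p) (out (- frame (first p))))))])
           (hash-set! cache frame value)
           value)]))
    out))

;; An impulse: 1.0 at frame 0, silence everywhere else.
(define (impulse frame) (if (= frame 0) 1.0 0.0))

;; A one-tap example (hypothetical parameters): half of the previous output is fed back.
(define ringing ((my-iir-filter (list (list 1 0.5))) impulse))
(ringing 0) ; 1.0
(ringing 1) ; 0.5
(ringing 2) ; 0.25
(ringing 3) ; 0.125
```

Unlike the FIR case, the impulse never fully dies away: each output feeds the next, which is the "infinite" in IIR. With feedback amplitudes that sum to 1 or more, the output can grow without bound.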
not-yet-documented: (provide twopi make-tone make-squaretone ding make-ding split-in-4 times vectors->rsound echo1 fft-complex-forward fft-complex-inverse)
7 Reporting Bugs
For Heaven’s sake, report lots of bugs!