1. A (Mis-) Guided Tour of the “Web Audio API”
Edward B. Rockower, Ph.D.
Presented 10/15/14
Monterey Bay Information Technologists (MBIT) Meetup
ed@rockower.net
2. Abstract
• Audio for websites has a very checkered past.
• The HTML5 <audio> tag is a big step forward.
• The “Web Audio API” is more of a giant leap:
  – modeled on a modular graph of “audio nodes”
  – provides filters, gains, convolvers, spectral analysis, and spatially-located sound sources
  – very important for sound in games, online music synthesis, speech recognition, and analysis
• Javascript typed arrays and XHR2 (AJAX)
• “getUserMedia” to capture real-time camera and microphone input
• arriving “as we speak” (check availability: www.CanIUse.com)
3. Organizing Principles (Evolutionary Revolutionary)
[Concept diagram: two streams converge on the Web Audio API. Enabling technologies: transistor, printed circuit, FFT, A/D, PCM, DSP, Internet, browser wars, events, asynchronous/callbacks, Web Workers, human/computer interaction. Music/audio engineering: natural sound (birds, voices), artificial sound (Theremin, Moog audio synthesizer), computer-generated sound, online & games, demos.]
4. What’s New in AJAX, HTML5, & Javascript
• New XHR2 (arraybuffer, typed arrays, CORS)
• Asynchronous (callbacks, non-blocking, events)
• Audio threads (Audio Worker)
• getUserMedia (HTML5, WebRTC)
• requestAnimationFrame (60 fps, HTML5)
• <audio> (HTML5 mediaElement)
• Web Audio API (optimized ‘native’ C code, modular audio graph, Fast Fourier Transform)
• Vendor-prefixed syntax (webkitAudioContext)
• Firefox v. 32 Dev Tools displays/edits the Web Audio graph
6. PCM Digitization: Analog to Digital (A/D)
• 4 bits → 2^4 = 16 different values
  – quantization of values
  – encode as binary numbers
• Ts = sampling interval
• 1/Ts = sampling frequency
• 44.1 kHz used in compact discs
  – Nyquist freq. = 44.1 kHz / 2 = 22.05 kHz, just above the upper limit of human hearing (~20 kHz)
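The quantization step above can be sketched in a few lines of JavaScript. This is an illustration only: the helper name `quantize` and the mapping of [-1, 1] onto integer codes are my own choices, not part of the Web Audio API.

```javascript
// Map an analog sample in [-1, 1] to an n-bit integer code (0 .. 2^n - 1).
// 'quantize' is an illustrative helper, not a Web Audio API function.
function quantize(sample, bits) {
  var levels = Math.pow(2, bits);                  // e.g. 4 bits -> 16 values
  var clamped = Math.min(1, Math.max(-1, sample)); // clip out-of-range input
  return Math.round((clamped + 1) / 2 * (levels - 1));
}

quantize(-1, 4); // lowest code: 0
quantize(1, 4);  // highest code: 15
```

With 4 bits, every sample collapses to one of only 16 codes; CD audio uses 16 bits (65,536 codes) per sample.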
7. Buffers and Views: “typed array” architecture
JavaScript typed arrays split the implementation into buffers and views.
• A “buffer” (ArrayBuffer object) represents a chunk of raw data:
  – no format to speak of
  – no mechanism for accessing its contents
• You need a “view”, which provides context: data type, starting offset, number of elements.
• Not your standard Arrays, but fast!
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays
16 bytes = 16 × 8 bits = 128 bits
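The buffer/view split can be seen directly in standard JavaScript: two views over the same 16-byte buffer interpret the identical memory differently.

```javascript
// One buffer (raw bytes), two views over the same memory.
var buf = new ArrayBuffer(16);        // 16 bytes = 128 bits of raw storage
var asBytes = new Uint8Array(buf);    // view it as 16 one-byte elements
var asInts = new Uint32Array(buf);    // view it as 4 four-byte elements

asInts[0] = 0x01020304;               // write through the 32-bit view...
// ...and the bytes 1, 2, 3, 4 appear through the 8-bit view
// (in platform byte order, typically little-endian: 4, 3, 2, 1)
```

Because both views alias the same ArrayBuffer, a write through one is immediately visible through the other; no data is copied.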
8. Buffers, Arrays, XHR, …
• XMLHttpRequest (XHR): request.responseType = 'arraybuffer';
• audioContext.decodeAudioData(request.response, function (anotherBuffer) { … });
• // Create the array for the data values
  frequencyArray = new Uint8Array(analyserNode.frequencyBinCount);
• analyserNode.getByteFrequencyData(frequencyArray); // Fast Fourier Transform (FFT), i.e. spectrum; FFT populates frequencyArray
• requestAnimationFrame plots data at each “frame” redraw (60 fps); more efficient than setTimeout() or setInterval()
(Here 8 bits is the quantization in the value of each measurement/sample ‘frame’, whereas the inverse of the sampling rate, e.g. 1/22,050 ≈ 45 µs, is the quantization in time.)
9. Leon Theremin
http://mdn.github.io/violent-theremin/
http://youtu.be/w5qf9O6c20o
12. Bourne Identity: Sound Engineers (explaining how car sounds are modified to be more exciting)
13. Audio Graph Setup: Typical Workflow
1. Create an audio context.
2. Inside the context, create sources, e.g. <audio>, oscillator, stream.
3. Create effects nodes, e.g. reverb, biquad filter, panner, compressor.
4. Choose the final destination of the audio, for example your system speakers.
5. Connect the sources up to the effects, and the effects to the destination.
developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API
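The five steps above can be sketched as one function. To keep the sketch testable outside a browser, the context is passed in as a parameter; the name `buildGraph` and the 440 Hz / half-gain values are illustrative choices, not from the slides.

```javascript
// Steps 2-5 of the workflow: source -> effect -> destination.
// 'buildGraph' is an illustrative name; pass a real AudioContext in a page.
function buildGraph(ctx) {
  var osc = ctx.createOscillator();  // step 2: create a source
  var gain = ctx.createGain();       // step 3: create an effects node
  osc.frequency.value = 440;         // A440 test tone (arbitrary choice)
  gain.gain.value = 0.5;             // half volume (arbitrary choice)
  osc.connect(gain);                 // step 5: source -> effect
  gain.connect(ctx.destination);     // step 5 -> step 4: effect -> speakers
  return osc;
}

// In a browser: buildGraph(new AudioContext()).start(0);
```

Passing the context in (rather than creating it inside) also matches step 1's rule that everything happens inside one AudioContext you create up front.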
14. Polyfills – Vendor-prefixed (webkit, moz, ms) (e.g. using a self-executing function)
(function() {
  // Polyfill for AudioContext
  window.AudioContext = window.AudioContext ||
      window.webkitAudioContext ||
      window.mozAudioContext;
  // Polyfill for requestAnimationFrame (replaces setTimeout)
  var requestAnimationFrame = window.requestAnimationFrame ||
      window.mozRequestAnimationFrame ||
      window.webkitRequestAnimationFrame ||
      window.msRequestAnimationFrame;
  window.requestAnimationFrame = requestAnimationFrame;
})();
16. Draw the AudioBuffer (no audio graph)
var audioContext = new AudioContext();

function initAudio() {
  var audioRequest = new XMLHttpRequest();
  audioRequest.open("GET", "sounds/myAudio.ogg", true);
  audioRequest.responseType = "arraybuffer";
  audioRequest.onload = function() {
    audioContext.decodeAudioData(audioRequest.response, function(buffer) {
      var canvas = document.getElementById("view1");
      drawBuffer(canvas.width, canvas.height, canvas.getContext('2d'), buffer);
    });
  };
  audioRequest.send();
}

// Draws a Web Audio AudioBuffer to a canvas
function drawBuffer(width, height, context, buffer) {
  var data = buffer.getChannelData(0);       // channel 0 samples, in [-1, 1]
  var step = Math.ceil(data.length / width); // samples per pixel column
  var amp = height / 2;
  for (var i = 0; i < width; i++) {
    var min = 1.0;
    var max = -1.0;
    for (var j = 0; j < step; j++) {         // min/max over this column's samples
      var datum = data[(i * step) + j];
      if (datum < min) min = datum;
      if (datum > max) max = datum;
    }
    context.fillRect(i, (1 + min) * amp, 1, Math.max(1, (max - min) * amp));
  }
}
https://github.com/cwilso/Audio-Buffer-Draw/commits/master
17. Plot Audio Spectrum
var audioEl = document.querySelector('audio');   // <audio>
var audioCtx = new AudioContext();
var canvasEl = document.querySelector('canvas'); // <canvas>
var canvasCtx = canvasEl.getContext('2d');
var mySource = audioCtx.createMediaElementSource(audioEl); // create source
var myAnalyser = audioCtx.createAnalyser();                // create analyser
mySource.connect(myAnalyser);                  // connect audio nodes
myAnalyser.connect(audioCtx.destination);      // connect to speakers

function processIt() {
  var freqData = new Uint8Array(myAnalyser.frequencyBinCount);
  myAnalyser.getByteFrequencyData(freqData);   // place spectrum in freqData
  requestAnimationFrame(function() {
    canvasCtx.clearRect(0, 0, canvasEl.width, canvasEl.height);
    canvasCtx.fillStyle = "#ff0000";
    for (var i = 0; i < freqData.length; i++) {
      // plot frequency spectrum: bar of height freqData[i], rising from the bottom
      canvasCtx.fillRect(i, canvasEl.height - freqData[i], 1, freqData[i]);
    }
  });
}
setInterval(processIt, 1000 / 60);
18. Plot Audio Spectrogram*
var audioEl = document.querySelector('audio');   // <audio>
var audioCtx = new AudioContext();
var canvasEl = document.querySelector('canvas'); // <canvas>
var canvasCtx = canvasEl.getContext('2d');
var mySource = audioCtx.createMediaElementSource(audioEl);
var myAnalyser = audioCtx.createAnalyser();
myAnalyser.smoothingTimeConstant = 0;
var myScriptProcessor = audioCtx.createScriptProcessor(myAnalyser.frequencyBinCount, 1, 1);
mySource.connect(myAnalyser);
myAnalyser.connect(audioCtx.destination);        // speakers/headphones
myScriptProcessor.connect(audioCtx.destination);

var x = 0;
myScriptProcessor.onaudioprocess = function() {
  if (!audioEl.paused) {
    x += 1;
    var freqData = new Uint8Array(myAnalyser.frequencyBinCount);
    myAnalyser.getByteFrequencyData(freqData);
    requestAnimationFrame(function() {
      if (x > canvasEl.width) {                  // wrap around at the right edge
        canvasCtx.clearRect(0, 0, canvasEl.width, canvasEl.height);
        x = 0;
      }
      for (var i = 0; i < freqData.length; i++) {
        // color each pixel by magnitude; one column of pixels per audio block
        canvasCtx.fillStyle = "hsl(" + freqData[i] + ", 100%, 50%)";
        canvasCtx.fillRect(x, canvasEl.height - i, 1, 1);
      }
    });
  }
};
*plot of the spectrum as a function of time (time on the horizontal axis, frequency on the vertical)
19. Types of Audio Nodes
• Source
  – <audio> element
  – buffer source (use with XHR)
  – oscillator
• Analyser node
• Panner
  – Doppler shift (cf. voice changer)
  – http://chromium.googlecode.com/svn/trunk/samples/audio/doppler.html
• Script processor / AudioWorker (e.g. add your own higher-resolution FFT)
• Compressor (e.g. avoid ‘clipping’)
• Convolution (e.g. add the impulse response of a large cathedral)
• Delay
• …
21. A Fluid Specification
• http://webaudio.github.io/web-audio-api for the latest
• Updated frequently: W3C Editor’s Draft, 14 October 2014
  – August 29th + …
  – September 29th + …
  – October 5th, 8th, 14th
• Boris Smus web book with syntax changes
  – http://chimera.labs.oreilly.com/books/1234000001552
• ScriptProcessorNode is deprecated; use createAudioWorker
• “AudioProcessingEvent” (deprecated) is dispatched to ScriptProcessorNode. When the ScriptProcessorNode is replaced by AudioWorker, we’ll use AudioProcessEvent.
22. Boris Smus Book, Deprecations (http://chimera.labs.oreilly.com/books/1234000001552/apa.html)
• AudioBufferSourceNode.noteOn() has been changed to start().
• AudioBufferSourceNode.noteGrainOn() has been changed to start().
• AudioBufferSourceNode.noteOff() has been changed to stop().
• AudioContext.createGainNode() has been changed to createGain().
• AudioContext.createDelayNode() has been changed to createDelay().
• AudioContext.createJavaScriptNode() has been changed to createScriptProcessor() (changing to Audio Workers).
• OscillatorNode.noteOn() has been changed to start().
• OscillatorNode.noteOff() has been changed to stop().
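Because older demos still call the deprecated names, a small defensive shim can bridge both generations of the API. The helper name `startSource` is my own; only the method names start() and noteOn() come from the deprecation list above.

```javascript
// Call the modern start() when present, else fall back to legacy noteOn().
// 'startSource' is an illustrative helper, not part of any spec.
function startSource(node, when) {
  if (typeof node.start === 'function') {
    node.start(when || 0);   // current syntax
  } else {
    node.noteOn(when || 0);  // deprecated syntax from older demos
  }
}
```

The same pattern works for stop()/noteOff() and for createGain()/createGainNode() on the context.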
23. Firefox Web Audio Editor
https://developer.mozilla.org/en-US/docs/Tools/Web_Audio_Editor
[Screenshot: activating the “Web Audio Editor” option in the Developer Tools settings]
24. Firefox Web Audio Editor (cont.)
• Press F12 or Ctrl-Shift-K to show the Developer Tools
• Select the “Web Audio” tab to see the Oscillator Node’s AudioParams
• Edit AudioParams
• Update the audio graph (and sound!) in real time
25. Demos
• http://borismus.github.io/spectrogram (realtime, “getUserMedia”)
• http://webaudioapi.com (Boris Smus)
• https://webaudiodemos.appspot.com (Chris Wilson)
• https://webaudiodemos.appspot.com/Vocoder
• https://webaudiodemos.appspot.com/slides/mediademo
• http://chromium.googlecode.com/svn/trunk/samples/audio/doppler.html
• http://chromium.googlecode.com/svn/trunk/samples/audio/ (shows you files, can view sources)
• http://labs.dinahmoe.com/ToneCraft
• Localhost demos: C:\Users\rockower\Dropbox\Audio\MBIT-WebAudioTalk\demos\startPythonServer.bat
@echo off
rem start Python3 web server in the demos folder
call python -m http.server 80
29. Impulse Response, Convolution, Spatialization, …
• http://www.openairlib.net
• http://www.openairlib.net/auralizationdb/content/r1-nuclear-reactor-hall
  – Upload a sound to hear it in that space (.wav, < 5 MB)
  – Or download the “impulse response” to convolve with your sound
• Boris Smus says (in his O’Reilly book):
  – Room effects: ‘The convolver node “smushes” the input sound and its impulse response by computing a convolution, a mathematically intensive function. The result is something that sounds as if it was produced in the room where the impulse response was recorded.’
  – Spatialized sounds: the Web Audio API comes with built-in positional audio features
  – Position and orientation of sources and listeners
  – Parameters associated with the source audio cones
  – Relative velocities of sources and listeners (Doppler shift)
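Wiring a downloaded impulse response into a graph might look like the sketch below. The wrapper name `addReverb` is illustrative, and the impulse buffer is assumed to be already decoded (e.g. via decodeAudioData); only createConvolver, the buffer property, and connect() are real API surface.

```javascript
// Route a source through a ConvolverNode loaded with an impulse response.
// 'addReverb' is an illustrative wrapper, not a Web Audio API function.
function addReverb(ctx, source, impulseBuffer) {
  var convolver = ctx.createConvolver();
  convolver.buffer = impulseBuffer;    // e.g. a decoded openairlib.net IR
  source.connect(convolver);           // source -> convolver
  convolver.connect(ctx.destination);  // convolver -> speakers
  return convolver;
}
```

Swapping in a different impulse response (cathedral, reactor hall, small room) changes only the buffer, not the graph.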
30. References/links
• http://webaudio.github.io/web-audio-api (latest specification)
• http://webaudioapi.com/ (Boris Smus site)
• http://chimera.labs.oreilly.com/books/1234000001552 (“Web Audio API” book online)
• https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays
• https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer
• http://www.html5rocks.com/en/tutorials/webaudio/intro/ (Smus)
• https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/Using_XMLHttpRequest
• http://webaudiodemos.appspot.com/ (Chris Wilson)
• http://webaudioplayground.appspot.com (create an ‘audio graph’; include analyser, gain, filter, delay)
• http://www.html5rocks.com/en/tutorials/file/xhr2/ (Bidelman tutorial)
• Book: “Javascript Creativity”, Shane Hudson, Apress, chapter 3, etc.
• https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API/Using_Web_Audio_API
Caveat: many audio websites have outdated, i.e. non-working, syntax for AudioContext and/or audio nodes; some are “vendor-prefixed”, e.g. webkitAudioContext (as well as for requestAnimationFrame).
32. To make it as an audio engineer, you MUST know:
• Digital audio
• The ins and outs of signal flow and patch bays
• How analog consoles work
• In-depth study of analog consoles
• Audio processing
• Available audio plugins and how they work
• Signal processing and compressors
• How to perform a professional mix-down
• How various studios are designed and how their monitors work
• Electronic music and beat matching
• Sync and automation
• Recording and mixing ins and outs
• Surround mixing
http://www.recordingconnection.com/courses/audio-engineering
33. What is a “biquad” filter?
• A digital biquad filter is a second-order recursive linear filter
• containing two poles and two zeros.
• “Biquad” is an abbreviation of “biquadratic”: in the z-domain, its transfer function is the ratio of two quadratic functions.
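Written out in the standard biquad notation (feedforward coefficients b, feedback coefficients a, normalized so the leading denominator coefficient is 1), the ratio of two quadratics in z⁻¹ is:

```latex
H(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}
```

The two roots of the numerator are the zeros and the two roots of the denominator are the poles mentioned above; the BiquadFilterNode's type (lowpass, highpass, peaking, …) just selects different coefficient formulas.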
34. Uint8Array(k) has k samples, where each ‘sample’ is a quantized measurement or computed value with 8 bits per value
• The analog signal is sampled every Ts secs.
• Ts is referred to as the sampling interval.
• fs = 1/Ts is called the sampling rate or sampling frequency.
35. Abstract of Presentation
Audio for websites has a very checkered past. Finally, however, we can forget about using media tags like “embed” & “object”, browser plugins like Flash, and the annoying “bgsound” of IE. The HTML5 <audio> tag is a big step forward…. But the “Web Audio API”, modeled on a graph of “audio nodes” providing filters, gains, spectral analysis, and spatially-located sound sources, is more of a giant leap forward for sound in games and online music synthesis. That, along with “getUserMedia” to capture real-time camera and microphone input, is arriving “as we speak”. Plan on lots of eye- (and ear-) candy to whet your appetite, with a modest taste of geeky code and advances in Javascript arrays and XHR2.
36. General audio graph definition
• General containers and definitions that shape audio graphs in Web Audio API usage.
• AudioContext: represents an audio-processing graph built from audio modules linked together, each represented by an AudioNode. An audio context controls the creation of the nodes it contains and the execution of the audio processing, or decoding. You need to create an AudioContext before you do anything else, as everything happens inside a context.
• AudioNode: interface represents an audio-processing module like an audio source (e.g. an HTML <audio> or <video> element), an audio destination, or an intermediate processing module (e.g. a filter like BiquadFilterNode, or volume control like GainNode).
• AudioParam: interface represents an audio-related parameter, like one of an AudioNode’s. It can be set to a specific value or a change in value, and can be scheduled to happen at a specific time and following a specific pattern.
• ended (event): fired when playback has stopped because the end of the media was reached.
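The AudioParam scheduling mentioned above is what makes clickless volume changes possible. A sketch, written to take the context and node as parameters so it can be exercised outside a browser; the helper name `fadeOut` and the choice of a linear ramp are mine, while setValueAtTime and linearRampToValueAtTime are real AudioParam methods.

```javascript
// Schedule a linear fade to silence on a GainNode's gain AudioParam.
// 'fadeOut' is an illustrative helper, not a Web Audio API function.
function fadeOut(ctx, gainNode, seconds) {
  var g = gainNode.gain;
  g.setValueAtTime(g.value, ctx.currentTime);              // anchor the ramp
  g.linearRampToValueAtTime(0, ctx.currentTime + seconds); // ramp to zero
}
```

Because the ramp is computed by the audio engine on its own thread, the fade stays smooth even if the page's JavaScript is busy.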
37. Interfaces defining audio sources
• OscillatorNode: represents a sine wave. It is an AudioNode audio-processing module that causes a given frequency of sine wave to be created.
• AudioBuffer: represents a short audio asset residing in memory, created from an audio file using the AudioContext.decodeAudioData() method, or created with raw data using AudioContext.createBuffer(). Once decoded into this form, the audio can then be put into an AudioBufferSourceNode.
• AudioBufferSourceNode: represents an audio source consisting of in-memory audio data, stored in an AudioBuffer. It is an AudioNode that acts as an audio source.
• MediaElementAudioSourceNode: represents an audio source consisting of an HTML5 <audio> or <video> element. It is an AudioNode that acts as an audio source.
• MediaStreamAudioSourceNode: represents an audio source consisting of a WebRTC MediaStream (such as a webcam or microphone). It is an AudioNode that acts as an audio source.
38. Define effects you want to apply to audio sources.
• BiquadFilterNode: represents a simple low-order filter; represents different kinds of filters, tone control devices, or graphic equalizers.
• ConvolverNode: performs a linear convolution on a given AudioBuffer, often used to achieve a reverb effect.
• DelayNode: causes a delay between the arrival of an input data and its propagation to the output.
• DynamicsCompressorNode: a compression effect; lowers the volume of the loudest parts of the signal to help prevent clipping and distortion from multiple sounds played and multiplexed together.
• GainNode: represents a change in volume; causes a given gain to be applied to the input signal.
• WaveShaperNode: represents a non-linear distorter; uses a curve to apply a waveshaping distortion, often used to add a warm feeling.
• PeriodicWave: defines a periodic waveform that can be used to shape the output of an OscillatorNode.
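The WaveShaperNode above needs a curve, which is just a Float32Array you compute yourself. A common soft-clipping shape can be built with plain typed arrays; the tanh shape and the name `makeDistortionCurve` are illustrative choices, not part of the spec.

```javascript
// Build a Float32Array soft-clipping curve for WaveShaperNode.curve.
// The tanh shape is one common choice; the spec only requires an array.
function makeDistortionCurve(amount, samples) {
  var curve = new Float32Array(samples);
  for (var i = 0; i < samples; i++) {
    var x = (i * 2) / (samples - 1) - 1; // x sweeps [-1, 1]
    curve[i] = Math.tanh(amount * x);    // gentle saturation toward +/-1
  }
  return curve;
}

// e.g. in a page: shaper.curve = makeDistortionCurve(5, 1024);
```

Larger `amount` values push more of the input range into saturation, which is where the “warm feeling” mentioned above comes from.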
39. Audio Analysis, Spatialization & Destinations
• AnalyserNode: represents a node able to provide real-time frequency and time-domain analysis, for data analysis and visualization.
• Audio spatialization: apply panning effects to your audio sources.
  – AudioListener: represents the position and orientation of the unique person listening to the audio scene.
  – PannerNode: represents the behavior of a signal in space, describing its position with right-hand Cartesian coordinates, its movement using a velocity vector, and its directionality using a directionality cone.
• AudioDestinationNode: represents the end destination of an audio source in a given context, usually the speakers of your device.
• MediaStreamAudioDestinationNode: represents an audio destination consisting of a WebRTC MediaStream with a single AudioMediaStreamTrack.
  – Can be used in a similar way to a MediaStream obtained from Navigator.getUserMedia; acts as an audio destination.