It is now obvious that VR and AR are driving the development of new visual display devices as well as visually rich 360° content. Most people are not aware that virtual reality has breathed new life into two spatially rich audio formats that previously lacked mainstream application.
What makes VR audio different from stereo or even surround playback is the need for believable positioning of sound in a 360° space. Sound should not only “surround” us in a circle; it should also be identifiable from locations above and below us. Fortunately, audio formats that can meet this requirement have existed for decades, in the form of Ambisonics for room playback through speakers and binaural audio for headphone playback. The story of spatial audio is a remarkable one, and it begins as far back as 1940.
Early Audio Formats: Sounds Fantastic
In the 1950s, consumer audio was still primarily monophonic, and stereo playback did not become standard until the early 1960s. Multichannel playback, however, was available in movie theaters much earlier. In particular, Walt Disney Studios developed what it called the “Fantasound” system for its animated feature film “Fantasia.” Fantasound was a multichannel format that allowed playback from three speakers in the front (left-center-right) and at least two in the rear, resembling a modern 5.1 surround system. Versions of this format (sometimes with fewer speaker channels) traveled to a select number of theaters around the country in 1940-41 before being retired and, in fact, repurposed for wartime use. This is a fascinating vignette in audio history, and more information is available at widescreenmuseum.com.
A Hitchhiker’s Guide to Audio Formats
Monophonic playback means that all audio content comes from a single speaker, so all spatial information is lost. Stereo playback separates audio into two channels, which are routed to speakers on the left and right of the listening space. Feeding a signal to both channels at equal level creates a “phantom” center channel, the sensation of sound coming from midway between the speakers; shifting the relative level between the two channels can place the sound at any other point from left to right. After stereo became the norm for long-playing vinyl records, there was a trend to add two more channels at the left and right rear, yielding what is called “quadraphonic” sound. A limited number of quadraphonic LPs were pressed and distributed in the 1970s.
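To make the phantom-center idea concrete, here is a minimal Python sketch of stereo amplitude panning. It assumes a “constant-power” pan law, one common choice that the article itself does not specify; the function name `constant_power_pan` and the -1 to +1 pan range are illustrative. At the center position both channels get the same gain (about 0.707, roughly -3 dB), which is what produces the phantom center.

```python
import math


def constant_power_pan(sample: float, pan: float) -> tuple[float, float]:
    """Split a mono sample into (left, right) using a constant-power pan law.

    pan = -1.0 is hard left, 0.0 is the phantom center, +1.0 is hard right.
    (Hypothetical helper for illustration, not a specific product's API.)
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be in [-1, 1]")
    # Map the pan position onto an angle from 0 (left) to pi/2 (right).
    theta = (pan + 1.0) * math.pi / 4.0
    # cos^2 + sin^2 = 1, so total acoustic power stays constant across the sweep.
    return sample * math.cos(theta), sample * math.sin(theta)
```

For example, `constant_power_pan(1.0, 0.0)` returns two equal gains of about 0.707. A constant-power law is used here because a simple linear law (gains that sum to 1) tends to sound quieter in the middle of the sweep.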
Ambisonics: Cracking the Audio ‘Enigma’
In response to the acoustic limitations of quadraphonic playback, the British mathematician and acoustic engineer Michael Gerzon began research on what he termed a ‘Harmonic Synthesis’ approach to multichannel audio spatialization. This approach was embodied in an experimental audio format called “Ambisonics.” One of the key features of Ambisonics is that the spatial information in a recording is not tied to the number of speakers used for playback. The same Ambisonic recording will therefore reveal a small amount of spatial information through a few speakers (such as a quadraphonic system) and a great deal of spatial information through a larger number of speakers (for example, an array of 20 geometrically arranged speakers). One of Michael Gerzon’s major contributions to Ambisonics was the invention of the tetrahedral microphone, in essence a 4-in-1 microphone that is uniformly sensitive in all directions.
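The independence from speaker count comes from storing a description of the sound field rather than individual speaker feeds. The Python sketch below encodes a mono sample arriving from a given direction into the four first-order Ambisonic channels (W, X, Y, Z, the “B-format” that a tetrahedral microphone's capsule signals are converted into). It then decodes those same four channels to rings of different sizes. It assumes the traditional FuMa-style weighting of W by 1/√2, azimuth measured counterclockwise from straight ahead, and a basic horizontal-only decoder for a regular ring of at least three speakers. Modern distribution formats such as AmbiX use a different channel order and normalization, and practical decoders are more refined. The function names are hypothetical.

```python
import math


def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode a mono sample into first-order B-format (W, X, Y, Z).

    Assumes traditional FuMa-style weighting; azimuth is counterclockwise
    from straight ahead, elevation is positive upward. Hypothetical helper.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)   # front-back figure-of-eight
    y = sample * math.sin(az) * math.cos(el)   # left-right figure-of-eight
    z = sample * math.sin(el)                  # up-down figure-of-eight
    return w, x, y, z


def decode_horizontal_ring(bformat, num_speakers):
    """Basic first-order decode to a regular horizontal ring of speakers.

    Speaker 0 is straight ahead; the others follow counterclockwise at equal
    spacing. Z is ignored because a flat ring cannot reproduce height.
    """
    if num_speakers < 3:
        raise ValueError("a first-order horizontal decode needs at least 3 speakers")
    w, x, y, _z = bformat
    gains = []
    for i in range(num_speakers):
        phi = 2.0 * math.pi * i / num_speakers  # azimuth of speaker i
        # Undo the W weighting, then project X/Y onto this speaker's direction.
        gain = (math.sqrt(2.0) * w + 2.0 * (x * math.cos(phi) + y * math.sin(phi))) / num_speakers
        gains.append(gain)
    return gains
```

With a source straight ahead, `decode_horizontal_ring(encode_first_order(1.0, 0.0, 0.0), 4)` gives gains of about 0.75, 0.25, -0.25 and 0.25, so the front speaker dominates. Passing the same four channels with `num_speakers=8` simply spreads the field over more speakers; the recording itself does not change.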
Binaural Audio: Two Ears Are Better Than One
Ambisonics is the system of choice for room audio over speakers, but much, if not most, VR audio production is intended for playback through headphones. The headphone technology for precise spatialization of audio is the binaural format, which takes into account the acoustic properties of the human head. Our two ears take in subtly different acoustic information depending on the direction of the sound source. Sound waves are modified as they travel around the head and outer ears, and the brain compares the differences between the signals arriving at the left and right ears (chiefly in arrival time, level and frequency content) to perceive that a sound is coming from one direction rather than another. In scientific terms, this directional filtering by the head is described by the “head-related transfer function,” or HRTF for short.
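One of the differences the brain compares is timing: a sound from the side reaches the nearer ear slightly before the farther one. A classic textbook approximation for this interaural time difference (ITD) is Woodworth's spherical-head formula, ITD ≈ (r/c)(θ + sin θ). The Python sketch below assumes a spherical head of radius 8.75 cm, a speed of sound of 343 m/s and a distant source. It models only the timing cue, not the level and spectral cues that a full HRTF also captures; the function name is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 °C
HEAD_RADIUS = 0.0875    # m, an assumed average head radius


def woodworth_itd(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Approximate interaural time difference (seconds) for a distant source.

    azimuth_deg: 0 = straight ahead, +90 = directly to one side.
    Sources behind the listener are folded onto the front (same lateral
    angle), mirroring the front/back ambiguity that ITD alone cannot resolve.
    """
    az = math.radians(azimuth_deg)
    # Fold the direction onto a lateral angle in [-pi/2, pi/2].
    lateral = math.asin(math.sin(az))
    # Woodworth: extra path length around a sphere, divided by the speed of sound.
    return (head_radius / c) * (lateral + math.sin(lateral))
```

Under these assumptions, `woodworth_itd(90)` comes out at roughly 0.66 ms, the largest delay a head of this size produces. A source at 30° and one at 150° give the same ITD, which is one reason the spectral filtering of the outer ears matters for telling front from back.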
Binaural Audio: Smarter Than It Looks
Binaural audio is normally recorded using a “dummy” head: literally, a mannequin reproduction of a human head with anatomically correct ear canals and microphones positioned where the eardrums would be. Placed, for example, in the audience seating of an auditorium, the dummy head records audio with uncanny spatial realism. HRTF data can also be applied to audio content in the studio, giving it convincing spatial positioning (including above and below the listener).
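Applying HRTF data in the studio typically means convolving a mono source with a pair of head-related impulse responses (HRIRs), the time-domain form of the HRTF, measured for the desired direction. The plain-Python sketch below shows that operation. The single-tap “HRIRs” are hypothetical toy values: a delay of 29 samples (roughly 0.6 ms, assuming a 48 kHz sample rate) and half the level at the far ear, not measured data. Real HRIRs are typically a few hundred samples long and come from measured datasets, and production tools use fast FFT-based convolution rather than this direct loop.

```python
def convolve(signal, impulse_response):
    """Direct (full-length) linear convolution of two sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out


def binauralize(mono, hrir_left, hrir_right):
    """Render a mono signal to (left_ear, right_ear) by convolving it with
    the HRIR pair for one direction. Hypothetical helper for illustration."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Hypothetical toy HRIRs for a source on the listener's right (NOT measured data):
# the right ear hears it immediately at full level, the left ear later and quieter.
TOY_HRIR_RIGHT = [1.0]
TOY_HRIR_LEFT = [0.0] * 29 + [0.5]

if __name__ == "__main__":
    click = [1.0, 0.0, 0.0]
    left_ear, right_ear = binauralize(click, TOY_HRIR_LEFT, TOY_HRIR_RIGHT)
    print("right ear peak at sample", right_ear.index(max(right_ear)))
    print("left ear peak at sample", left_ear.index(max(left_ear)))
```

Swapping in a measured HRIR pair for another direction, including one above or below the listener, is all it takes to move the source; the convolution itself stays the same.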
Two VR Audio Formats: Better Late Than Never
Both Ambisonics and binaural have long been beloved of audio specialists, but before the meteoric arrival of virtual reality, these formats had gained little traction in the mainstream. All of a sudden, both are tremendously relevant to VR audio production and have taken on a new life. Facebook and YouTube both have aggressive initiatives to incorporate VR into their platforms, and both companies are taking the audio component of VR very seriously. Facebook now offers a free set of VR audio production tools called “Spatial Workstation,” and Google provides a guide to VR audio for YouTube.
The More Things Change the More They Stay the Same
These are exciting developments in the world of virtual reality audio, yet they merely represent new opportunities for audio content production. They do not replace other mainstream audio formats or production practices; audio professionals will continue to mix and master stereo music, and 5.1 or higher surround for film and video post-production, for many years to come. Ambisonics and binaural have been around for a long time, and it is time for them to take their place in the mainstream. In the end, though, great audio is not about format or spatial richness but about best production practices, and it is doubtful that that standard will change.