Saturday, March 17, 2012

The syNth Degree: What The Hell Is Vocoding?


Every sound has its own sonic fingerprint. Hear a 'middle C' on a violin, and on a trumpet, and while you can tell it is the same pitch, you can also tell the sounds of the different instruments apart. In the same manner, your friend calls you up on the phone, and as you answer they say, "hey, it's me", and you know exactly who it is because you recognize their voice (and because they don't make phones without caller ID anymore). Vocoding takes the sonic fingerprint of a sound, and filters another sound through it.

Yes, But What the Hell is Vocoding?
The most common application for vocoding is to make a synthesizer sound like a human voice, though there are other applications that we will explore. This is a very important distinction: the synth sounds like the human voice. When you hear a vocoded synth, make no mistake, you are hearing the synth being affected by the human voice, not the other way around. In this application, the synth is the carrier, and the human voice is the modulator.

Carriers and Modulators
In order to have a vocoded signal, there must be two simultaneous signals: a carrier and a modulator. The carrier is the signal you hear - in the vocoding application we are discussing, the synthesizer - and the modulator is the signal that affects, or modulates, the carrier signal, in this case a human voice.

What Does the Modulator Modulate?
More specifically, the way that vocoding works is that the carrier signal is silent, except at the frequency bands being modulated. Vocoders take the frequency spectrum and break it up into narrow bands (like the bands of a rack-mounted 31-band graphic EQ); the narrower the frequency bands, the better the resolution of the modulator signal. As the modulator signal is fed through, the amplitude of the signal at various frequencies opens up, or modulates, the bands of the vocoder, letting the carrier signal through. The best way to think about the vocoder's frequency bands is that they are volume gates. As the carrier signal plays, all frequencies are silent until the modulator signal hits and opens up specific frequency bands (the very frequencies that give the modulator signal its sonic fingerprint), and the carrier signal plays at just those frequencies. Essentially, the modulator signal forces the carrier signal to occupy its timbral confines.
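To make the band-gating idea concrete, here is a minimal channel-vocoder sketch in Python. It assumes numpy and scipy are available and that carrier and modulator are mono float arrays of the same length and sample rate; the band count, band edges and envelope smoothing are arbitrary choices for illustration, not a definitive design.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def vocode(carrier, modulator, sr, n_bands=16, f_lo=80.0, f_hi=8000.0):
    """Filter the carrier through the modulator's 'sonic fingerprint'."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)      # logarithmically spaced band edges
    env_sos = butter(2, 50.0, btype="lowpass", fs=sr, output="sos")
    out = np.zeros_like(carrier)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        car_band = sosfilt(band_sos, carrier)          # carrier energy in this band
        mod_band = sosfilt(band_sos, modulator)        # modulator energy in this band
        # Envelope follower: rectify, then low-pass to track the band's loudness
        envelope = sosfilt(env_sos, np.abs(mod_band))
        # The modulator's envelope acts as a volume gate on the carrier band
        out += car_band * envelope
    return out / (np.max(np.abs(out)) + 1e-12)         # normalize to avoid clipping
```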

Rhythmic Uses for Vocoding
A vocoder can use a drum groove as the modulator signal, instead of a human voice, to achieve interesting rhythmic pads and lead synths. Try experimenting with wider frequency bands: narrower bands give you greater clarity from the modulator signal, but wider bands may sound more interesting in this context.

Vocoding examples
Formant Shift
Most vocoder plug-ins have some sort of "shift" parameter. This allows you to change the timbre of the modulator signal, thus affecting the sound of the carrier signal. If a modulator signal affects the frequencies around 1kHz, and you shift it up an octave, then it will modulate frequency bands at 2kHz instead. Note that the perceived pitch of the carrier signal does not change, just the timbral quality from the modulator signal.
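One way a shift control like this could work (an assumption for illustration, continuing the vocode() sketch above) is simply to re-route the modulator's band envelopes so that band k drives carrier band k + shift:

```python
import numpy as np

def shift_bands(envelopes, shift):
    """envelopes: array of shape (n_bands, n_samples); shift: bands to move up (+) or down (-)."""
    shifted = np.zeros_like(envelopes)
    if shift >= 0:
        shifted[shift:] = envelopes[:len(envelopes) - shift]
    else:
        shifted[:shift] = envelopes[-shift:]
    return shifted      # the modulator's energy now lands on higher (or lower) carrier bands
```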

Shift automation

Wednesday, February 29, 2012

Mixology: The Dynamics of Compression

Compression is a tool to narrow the dynamic range of an audio signal - the dynamic range being the difference between the loudest and quietest parts of the incoming audio signal. In a nutshell, compression can make loud things quieter, and quiet things louder. A track compressed to the maximum would have a scream and a whisper at the same volume. So how does it work?

Note that we are not talking about compression of an audio file into mp3 format, that is a different topic. 

What is Compression?
When a track gets too loud (a subjective judgement, to be sure) compression turns the volume down. Think back to high school (if out of high school) when you would play loud rock music in your room. Now, your parents were right there in the living room, and you knew there was an F-bomb coming up. So what did you do? You sat there with your hand on the volume controls of your boom box… I mean iPod dock… and turned down the volume just before the F-bomb dropped, then turned it back up again when the offensive line passed. You acted as a compressor.

Threshold
So what determines when the track is too loud? Most compression devices or plug-ins have a threshold parameter. The setting on this is in decibels (dB) and corresponds with the level on the track in your mixing window. If you set the threshold to -12dB, anytime the audio rises above -12dB on the track fader, compression kicks in. The lower the threshold, the more often, and longer, compression is engaged.

Ratio
The compression can squash the life out of your audio, or it can be subtle. This is determined by the ratio. Ratio is measured as exactly that, a ratio, or a comparison between two numbers: how far the input rises above the threshold versus how far the output is allowed to. With a ratio of 2 to 1 (or 2:1), a signal that exceeds the threshold by 8 dB is turned down so it only exceeds it by 4 dB. At 10:1, that same 8 dB overshoot is reduced to less than 1 dB. The threshold determines what is compressed, the ratio determines how much it is compressed.
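Here is a minimal sketch of that threshold-and-ratio math in Python. It assumes the signal level is already measured in dB (real compressors also smooth the result with attack and release, covered below), and the parameter values are just examples.

```python
def compressed_level_db(level_db, threshold_db=-12.0, ratio=4.0):
    if level_db <= threshold_db:
        return level_db                         # below the threshold: untouched
    over = level_db - threshold_db              # how far above the threshold we are
    return threshold_db + over / ratio          # the overshoot is divided by the ratio

# -4 dB input with a -12 dB threshold at 4:1 -> 8 dB over becomes 2 dB over -> -10 dB out
print(compressed_level_db(-4.0))
```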

Other Compression Parameters
Other typical parameters you will see on a compressor are attack, release (sometimes labeled decay), make-up gain, and knee. Attack changes how quickly the compressor reacts when the audio signal exceeds the threshold. Imagine compression on a snare drum track. A short or even nonexistent attack (only possible in the digital world with software compressors) will catch the audio right as it exceeds the threshold, while a longer attack will let through the loud transient of the snare drum hit, while kicking in to compress the tail of the drum. There are certainly reasons for both decisions when it comes to mixing.

A long release will cause the compression to stay engaged for longer, only slowly easing up after the volume drops below the threshold. With a short release, the compression will "let go" right after the audio volume drops back below the threshold.
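A common way to model attack and release (a sketch under assumptions, not any particular plug-in's design) is a one-pole smoother that chases the target gain quickly when more reduction is needed and slowly when letting go:

```python
import numpy as np

def smooth_gain(target_gain, sr, attack_ms=5.0, release_ms=200.0):
    """target_gain: per-sample array of desired gains (1.0 = no reduction)."""
    a_coef = np.exp(-1.0 / (sr * attack_ms / 1000.0))    # fast coefficient
    r_coef = np.exp(-1.0 / (sr * release_ms / 1000.0))   # slow coefficient
    out = np.empty_like(target_gain)
    g = 1.0
    for i, t in enumerate(target_gain):
        coef = a_coef if t < g else r_coef               # reducing gain: attack; recovering: release
        g = coef * g + (1.0 - coef) * t
        out[i] = g
    return out
```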

Make-up gain is what allows compressors to boost the volume. The first stage of compression lowers the volume of the louder portions; raising the make-up gain then increases the overall output, so the net effect is not that the loud parts get quieter, but that the quiet parts get louder.

Compressors with a knee parameter allow the ratio to change based on how high the volume is above the threshold. For example, audio exceeding the threshold by 1 dB may be compressed 1.5:1, while 10dB above may be 5:1. This allows for subtle compression near the threshold, and more drastic compression well above it.
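One common way to implement a knee (an assumption for illustration - a quadratic blend across the knee width - rather than the only possible design) looks like this, extending the compressed_level_db() sketch above:

```python
def soft_knee_level_db(level_db, threshold_db=-12.0, ratio=4.0, knee_db=6.0):
    over = level_db - threshold_db
    if over <= -knee_db / 2:
        return level_db                                  # well below the knee: untouched
    if over >= knee_db / 2:
        return threshold_db + over / ratio               # well above the knee: full ratio
    # Inside the knee: gain reduction ramps up gradually from 1:1 toward the full ratio
    x = over + knee_db / 2
    return level_db - (1 - 1 / ratio) * (x * x) / (2 * knee_db)
```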


Limiting: Cousin to Compression
A limiter is compression set to a ratio of infinity:1. What this means is that the volume of the audio signal will never exceed the threshold. This is sometimes referred to as a "brick wall", and is often used as a mastering tool. Limiting is simply an extreme version of compression. Many audio interfaces and portable recorders have limiting functionality built-in, as a defense against loud transients ruining a recording take.


Fixed Threshold Compressors
Many compressors use a fixed threshold. These do not allow you to change your threshold level, but instead usually have an input gain control, and an output gain control. Instead of changing the threshold, you boost the input gain to push more of the audio level up above the threshold.

Side Chain Compression
Side chain compression is a neat trick, and a commonly used effect. With side chain compression, you still have a compressor on a track, and it is still compressing that track, but the audio signal that engages the compression comes from somewhere else, via a "side chain". This is often used to get pumping, tremolo-type rhythmic variations from a track - for example, a kick drum feeding the side chain of a compressor on a bass or pad, ducking it on every beat.
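As a rough sketch (reusing the compressed_level_db() and smooth_gain() sketches above, and assuming both tracks are mono numpy float arrays of the same length), the only difference from ordinary compression is which signal the level detector listens to:

```python
import numpy as np

def sidechain_compress(main, sidechain, sr, threshold_db=-20.0, ratio=6.0):
    eps = 1e-12
    level_db = 20 * np.log10(np.abs(sidechain) + eps)    # detect the level of the SIDE CHAIN signal
    target_db = np.array([compressed_level_db(l, threshold_db, ratio) for l in level_db])
    gain = 10 ** ((target_db - level_db) / 20)           # convert the dB reduction to a linear gain
    return main * smooth_gain(gain, sr)                  # ...and apply it to the MAIN track
```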

Sunday, February 19, 2012

Nerd Alert: Bits and Bytes. And Nibbles?

The smallest unit of information in computing systems is a binary digit, or bit. A bit can convey either a 1 or a 0; think of a bit as a switch that can be on or off. A grouping of 8 bits is called a byte. A grouping of 4 bits? A nibble (it's true), though this term is uncommon.

Binary Numeric Systems
With enough bits, all numbers can be represented. Here are all the numbers that can be represented with a nibble (or 4 bits):
0000 = 0
0001 = 1
0010 = 2
0011 = 3
0100 = 4
0101 = 5
0110 = 6
0111 = 7
1000 = 8
1001 = 9
1010 = 10
1011 = 11
1100 = 12
1101 = 13
1110 = 14
1111 = 15
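You can reproduce this table with a couple of lines of Python (shown here only to illustrate the binary counting above):

```python
for n in range(16):
    print(format(n, "04b"), "=", n)     # e.g. 1011 = 11

print(int("1011", 2))                   # parsing binary back into decimal: 11
```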

ASCII
As computers can only understand binary numbers, each character of the alphabet can be represented with a byte of information, using an assigned code: the American Standard Code for Information Interchange (ASCII). One byte allows 256 possible values (ASCII itself is a 7-bit code defining 128 characters): 26 for lowercase letters, 26 for uppercase letters, with plenty left over for numerals, punctuation and other characters. In binary, the word "binary" would be represented this way:
01100010 01101001 01101110 01100001 01110010 01111001
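A quick Python sketch reproduces that line, treating each character of "binary" as one byte shown as 8 bits:

```python
print(" ".join(format(ord(c), "08b") for c in "binary"))
# 01100010 01101001 01101110 01100001 01110010 01111001
```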

Computers do a lot of translating behind the scenes that often gets taken for granted. And this is just to render characters; the amount of computation that occurs to edit audio is astounding.

Disk Storage Hierarchy
8 Bits = 1 Byte
1024 Bytes = 1 Kilobyte
1024 Kilobytes = 1 Megabyte
1024 Megabytes = 1 Gigabyte
1024 Gigabytes = 1 Terabyte
1024 Terabytes = 1 Petabyte
1024 Petabytes = 1 Exabyte
1024 Exabytes = 1 Zettabyte
1024 Zettabytes = 1 Yottabyte
1024 Yottabytes = 1 Brontobyte (I'm sure that before long this term will be commonplace)
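As a quick illustration of how these units stack up, plus an example calculation (which assumes CD-quality audio: 44,100 samples per second, 16 bits per sample, two channels, as discussed in the sample rate post):

```python
units = ["Byte", "Kilobyte", "Megabyte", "Gigabyte", "Terabyte"]
for i, unit in enumerate(units):
    print(f"1 {unit} = {1024 ** i:,} Bytes")

# One minute of CD-quality stereo audio: 44,100 samples/s x 2 bytes x 2 channels x 60 s
bytes_per_minute = 44_100 * 2 * 2 * 60
print(f"{bytes_per_minute / 1024 / 1024:.1f} MB per minute")   # about 10.1 MB
```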


From web comic XKCD (http://imgs.xkcd.com/comics/1_to_10.png)

Nerd Alert: Sample Rate

Audio can exist in many different states (just like H2O can be vapor, water and ice). The sounds you hear are acoustic energy waves traveling through the atmosphere. When these acoustic waves are captured by a microphone, the audio becomes an analog, or electrical, signal (the microphone is a transducer - a device that converts energy from one form to another, like your toaster converting electricity to heat; a dynamic microphone does this through electromagnetic induction, where a coil moving in a magnetic field generates an electrical current). This analog signal can be captured by an audio interface, a device that gives your computer additional audio inputs and outputs, which converts it into a digital signal through a process called sampling. This brings us to our topic today.

Sampling
An audio sample is a digital representation of one instantaneous moment in an audio recording. Huh? In other words, your audio interface takes a sample of the audio (like you might with the ice cream flavors at Baskin Robbins), which measures the amplitude of the audio signal at that given instant. Still huh? Think of a sample as a look at the incoming audio. Every time you look at the audio you measure the position of the audio waveform. The more looks you take, the better you can determine how the waveform evolves over time, and the more accurately you can capture the audio. The number of "looks" you take per second is called the sample rate.

Sample Rate
Remember those flip books you would draw as a kid on a pad of post-it notes? A stick figure being chased by a dog and jumping over a fence (was that just me?). Each new post-it note, the stick figure would advance one step further, and when you flipped through it, it came to life. This, of course, is how motion pictures work, and each of your drawings would be called a frame. The number of frames per second is a similar concept to the number of samples per second. Frames for our eyes, samples for our ears. Here's an interesting fact though: Your ears need much, much, MUCH more information than your eyes (or perhaps your eyes are more efficient at processing information). When you go to the movies, you are watching film at a rate of 24 frames per second. When you play a CD (remember those?), you are listening at 44,100 samples per second, a rate with 1,837 1/2 times the information. Sample rate is the number of samples per second, measured in hertz (Hz) or kilohertz (kHz). A rate of 44,100 samples per second is more commonly referred to as 44.1 kHz.

Nyquist-Shannon Sampling Theorem
Why 44.1 kHz? Typical human hearing range is from 20 Hz (referring now to frequency or pitch, not sample rate) to 20 kHz. One audio waveform is one push and one pull of energy. An up and a down. Swedish-born engineer Harry Nyquist and American mathematician Claude Shannon showed that the highest frequency that can be captured digitally is 1/2 of the sample rate (having to do with needing to sample both the push and the pull of a waveform - a higher frequency could squeak through two samples undetected). At a sample rate of 44.1 kHz, frequencies of up to 22.05 kHz (just beyond typical human hearing range) can be captured, thus rendering a lush representation of the audio, with the full spectrum of frequency.
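A small numpy sketch makes the "squeaking through the samples" point concrete: a tone above half the sample rate produces exactly the same samples as a lower tone, so the digital recording cannot tell them apart.

```python
import numpy as np

sr = 44_100
t = np.arange(sr) / sr                          # one second of sample times
high = np.sin(2 * np.pi * 30_000 * t)           # 30 kHz: above the 22.05 kHz limit
alias = np.sin(2 * np.pi * 14_100 * t)          # the frequency it folds back to (44,100 - 30,000)
print(np.allclose(high, -alias, atol=1e-6))     # True: the samples are identical (phase-flipped)
```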

Typical Sample Rates, and Why 192 kHz?
CDs are 44.1 kHz. If you do any work with audio for film, your standard is 48 kHz. With computers getting more powerful, and with hard drive space getting cheaper, higher sample rates have become common. You will likely see 88.2 kHz (double 44.1 kHz), and 96 kHz (double 48 kHz) as sample rate options, and even up to 192 kHz (double 96 kHz). Even generous estimates put the upper limit of human hearing around 20 kHz, and a sample rate of 192 kHz would render frequencies up to 96 kHz. So why, then, the need for such a high sample rate? There are arguments for and against, with some saying 192 kHz is just hype, and is useless as a sample rate, while others argue that the higher sample rate provides better fidelity after audio manipulation, such as pitch-shifting an audio recording down several octaves. Still others argue for the higher sample rate as a way to "future proof" their recordings. Not knowing what audio file formats will become standard in the future has driven them to record their audio in the best available quality, so that mixdowns at a higher sample rate remain possible later.

These audio examples are the same piece of music, bounced out at different sample rates.

Thursday, February 9, 2012

The syNth Degree: LFOs

As mentioned in the envelopes post, in synthesizers, samplers and other virtual instruments, there are many different parameters (like pitch, volume, filter frequency, etc.) that can be dynamically controlled. One way to control these is through modulators. A modulator is something that effects a change in something else (think about modulating from one key to another). LFOs are one form of modulation.


LFO
LFO stands for low frequency oscillator. An oscillator generates a waveform. You may be familiar with the concept of oscillation from your oscillating fans at home (unless you have central air of course). These fans swivel to create a "waveform" of air blowing through the room, and are therefore oscillators. LFOs are not audio-generating oscillators; they generate control signals that allow modulation of different parameters. The difference between envelopes and LFOs as modulators is that an envelope modulates a parameter once over the duration of a note or phrase, with one attack, one decay, and one release, while an LFO repeats its cycle over and over.




LFO components
Shape
LFOs create a waveform shape, determining the character of the modulation. Does the volume rise and fall gently, as with a typical tremolo effect? This is caused by a gently sloping sine or triangle wave. Does the pitch bounce back and forth between two distinct notes? This is caused by a pulse wave LFO affecting pitch. Typical LFO shapes are sine, pulse (or square), triangle, sawtooth (allowing ramp ups or ramp downs), and random sloping or stairstep.
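Here is a small numpy sketch of these common shapes as control signals in the range -1 to 1 (the shapes and the tremolo example at the end are just illustrations, with arbitrary control rate and depth):

```python
import numpy as np

def lfo(shape, rate=2.0, sr=1000, seconds=2.0, seed=0):
    t = np.arange(int(sr * seconds)) / sr
    phase = (rate * t) % 1.0                               # 0..1, repeating once per cycle
    if shape == "sine":
        return np.sin(2 * np.pi * phase)
    if shape == "triangle":
        return 4 * np.abs(phase - 0.5) - 1
    if shape == "square":
        return np.where(phase < 0.5, 1.0, -1.0)
    if shape == "saw":
        return 2 * phase - 1                               # ramp up; negate for a ramp down
    if shape == "random":                                  # stairstep sample-and-hold
        steps = np.random.default_rng(seed).uniform(-1, 1, int(rate * seconds) + 1)
        return steps[(rate * t).astype(int)]
    raise ValueError(shape)

# Tremolo: a 2 Hz sine LFO modulating volume with 50% depth
gain = 1.0 - 0.5 * (lfo("sine") * 0.5 + 0.5)
```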






The following are audio clips from a synthesizer with an LFO modulating the volume.
 
sine wave at a rate of 2 Hz (2 cycles per second, or 1/4 note at 120 beats per minute)
 
sawtooth wave at a rate of 2 Hz
 
square or pulse wave at a rate of 2 Hz


random wave at a rate of 8 Hz (8 cycles per second, or 1/16 note at 120 beats per minute)


Rate
The rate determines the frequency of the waveform, or the speed of the modulation. Rate can be measured in hertz (Hz), or cycles per second, or the rate can be synced to tempo and measured by a notation value (1 bar, 1/4 note, 1/8 triplet, etc.). The faster the tempo in your sequencing software, the faster your 1/4 note LFO will modulate. LFOs measured in Hz can typically go from as slow as 0.1 Hz (one LFO cycle every 10 seconds) up to around 100 Hz. Modulation faster than 100 Hz gets into the realm of frequency modulation, or FM synthesis, which we will discuss in another post.
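Converting a tempo-synced note value into Hz is simple arithmetic; here is a quick sketch (assuming 4/4 time, with note values expressed as fractions of a whole note):

```python
def lfo_rate_hz(bpm, note_value=1/4):
    beats_per_second = bpm / 60.0
    beats_per_cycle = note_value * 4        # a whole note is 4 beats in 4/4
    return beats_per_second / beats_per_cycle

print(lfo_rate_hz(120, 1/4))    # 2.0 Hz  (a 1/4-note LFO at 120 BPM)
print(lfo_rate_hz(120, 1))      # 0.5 Hz  (one cycle per bar)
```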



Here is an audio clip from a synthesizer that employs a sine wave LFO modulating the pitch. The LFO's rate is automated from .1 Hz (or 1 cycle per 10 seconds) to 100 Hz, while the LFO's amount is held constant at multiple octaves.






Amount
Amount determines the strength, or depth, of the modulation. In an audio waveform, the frequency determines pitch and the amplitude determines volume; for an LFO, the amount plays the role of amplitude, controlling how far the target parameter swings. Amount is usually expressed as a percentage.

Here is an audio clip from a synthesizer that employs a sine wave LFO modulating the pitch. The LFO's amount is automated from 0 cents to multiple octaves over 4 seconds, while the LFO's rate is at a constant 2Hz (or two cycles per second, or at 1/4 note at 120 beats per minute).

Sunday, January 29, 2012

The syNth Degree: Envelopes

In synthesizers, samplers and other virtual instruments, there are many different parameters (like pitch, volume, filter frequency, etc.) that can be dynamically controlled. One way to control these is through modulators. A modulator is something that effects a change in something else (think about modulating from one key to another). The first type of modulator that we'll look at is the envelope.

Envelopes
An envelope is something that opens and closes (just like a mailing envelope). An envelope can be set to open or close a parameter over time, which allows for subtle (or not so subtle) evolution of a sound. An envelope has anywhere from 2 to 5 or more legs, detailed below.

ADSR
ADSR stands for Attack, Decay, Sustain, Release, which are the four most common legs of an envelope. Some envelopes will feature others such as delay and hold. 

Delay: The amount of time that the envelope delays before kicking in (this is a less common envelope feature)
Attack: The amount of time it takes for an envelope to open
Hold: The amount of time the envelope stays open before beginning to decay (this is a less common envelope feature)
Decay: The amount of time it takes for the envelope to decay from the fully open position to the sustain level
Sustain: The level at which the envelope stays open for the duration of a note or phrase. NOTE: Sustain is the only parameter that does not have a specified length of time. A note can be held for one second or twenty minutes, and sustain determines the envelope level for the duration of a note after delay, attack, hold and decay have run their course.
Release: The amount of time the envelope takes to close again after a note or phrase is released
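Here is a minimal ADSR sketch in Python, assuming simple linear segments (real synth envelopes are often exponential) and omitting the optional delay and hold legs. The note_length argument stands in for however long the note is held; sustain is a level, not a time.

```python
import numpy as np

def adsr(attack, decay, sustain, release, note_length, sr=1000):
    """attack/decay/release and note_length in seconds; sustain is a level from 0 to 1."""
    a = np.linspace(0.0, 1.0, int(attack * sr), endpoint=False)        # open up
    d = np.linspace(1.0, sustain, int(decay * sr), endpoint=False)     # fall to the sustain level
    s = np.full(int(note_length * sr), sustain)                        # held for the note's duration
    r = np.linspace(sustain, 0.0, int(release * sr))                   # close after the note is released
    return np.concatenate([a, d, s, r])

# 2 s attack, 1 s decay, sustain at 0.25 (roughly -12 dB below peak), 2 s release
env = adsr(attack=2.0, decay=1.0, sustain=0.25, release=2.0, note_length=3.0)
```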

Amp Envelopes
Take for example an amp envelope - an envelope that controls the overall volume or amplitude of a device. Here is an audio clip from a synthesizer that employs a volume envelope. This envelope has a 2 second attack, one second decay, a sustain level of -12 dB (meaning that the decay will decrease the volume 12 dB from the peak volume of the attack) and a release of 2 seconds.
Using an amp envelope you can simulate how different acoustic instruments behave. Let's explore the naturally occurring volume envelopes for a piano and a clarinet:

Piano
A piano is a percussive instrument, and like all percussion instruments that are struck, it is at full volume immediately as the note is played, and can only decrease in volume thereafter.

Clarinet

While a clarinet can certainly play a note at full volume, then decrease, this instrument has a rather unusual ability to begin a note from near silence, have a long slow attack, and a gentle decay and release.


Inverted Envelope

Whether using an envelope to affect volume or any other parameter, its function is to modulate a parameter change over time. Keep in mind that the parameter does not have to go from completely closed to completely full. An envelope can change a parameter between any two set points. Perhaps you want a filter frequency to begin mostly open, then close partway, then open back up again. This can be done with an inverted envelope - one that begins open, uses the attack to close, then uses the decay and release to open back up again. Here is an audio clip from a synthesizer that employs an inverted filter envelope. This envelope has a 2 second attack, one second decay, a sustain level of 50% (meaning that the filter frequency will return halfway after the decay) and a release of 1.5 seconds. Note that the amp envelope has a 2 second release. Without this, the filter envelope release would not be apparent.
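As a final sketch (reusing the adsr() example from the envelopes section, with the parameter range and values assumed for illustration), an inverted envelope is just the same shape flipped and mapped onto whatever range you want the parameter to move between:

```python
def inverted_envelope(attack, decay, sustain, release, note_length, lo=0.5, hi=1.0):
    env = adsr(attack, decay, sustain, release, note_length)
    inv = 1.0 - env                      # fully open becomes fully closed, and vice versa
    return lo + (hi - lo) * inv          # e.g. a filter sweeping between 50% and 100% open

# Begins open, the attack closes it, the decay returns it halfway, the release opens it again
cutoff = inverted_envelope(attack=2.0, decay=1.0, sustain=0.5, release=1.5, note_length=3.0)
```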