Wednesday, February 29, 2012

Mixology: The Dynamics of Compression

Compression is a tool to narrow the dynamic range of an audio signal, dynamic range being the difference in volume between the loudest and quietest parts of the signal. In a nutshell, compression can make loud things quieter, and quiet things louder. A track compressed to the maximum would have a scream and a whisper at the same volume. So how does it work?

Note that we are not talking about compression of an audio file into mp3 format, that is a different topic. 

What is Compression?
When a track gets too loud (a subjective judgment, to be sure) compression turns the volume down. Think back to high school (if you're out of high school), when you would play loud rock music in your room. Now, your parents were right there in the living room, and you knew there was an F-bomb coming up. So what did you do? You sat there with your hand on the volume control of your boom box… I mean iPod dock… and turned down the volume just before the F-bomb dropped, then turned it back up again once the offending line passed. You acted as a compressor.

Threshold
So what determines when the track is too loud? Most compression devices or plug-ins have a threshold parameter. The setting on this is in decibels (dB) and corresponds with the level on the track in your mixing window. If you set the threshold to -12 dB, any time the audio rises above -12 dB on the track fader, compression kicks in. The lower the threshold, the more often, and the longer, compression is engaged.

Ratio
The compression can squash the life out of your audio, or it can be subtle. This is determined by the ratio. Ratio is measured as just exactly that, a ratio, or a comparison between two numbers: how many decibels the input must rise above the threshold to produce one decibel of output above it. At a ratio of 2 to 1 (or 2:1), a signal that exceeds the threshold by 2 dB comes out only 1 dB above it. At 10:1, it takes 10 dB of input over the threshold to yield just 1 dB of output over it. The threshold determines when the signal is compressed, the ratio determines how much it is compressed.
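The threshold-and-ratio math can be sketched in a few lines of code. This is a minimal, hypothetical illustration of the dB arithmetic (a hard-knee compressor working on level values, not real audio processing):

```python
def compress_db(level_db, threshold_db=-12.0, ratio=2.0):
    """Hard-knee compression: every `ratio` dB of input above the
    threshold yields just 1 dB of output above it."""
    if level_db <= threshold_db:
        return level_db  # below the threshold: untouched
    excess = level_db - threshold_db
    return threshold_db + excess / ratio

# A peak at -4 dB with a -12 dB threshold and a 2:1 ratio:
# 8 dB over the threshold becomes 4 dB over, so the output is -8 dB.
print(compress_db(-4.0))  # → -8.0
```

Note that a signal sitting below the threshold passes through unchanged; only the excess above the threshold is divided down.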

Other Compression Parameters
Other typical parameters you will see on a compressor are attack, release, make-up gain, and knee. Attack changes how quickly the compressor reacts once the audio signal exceeds the threshold. Imagine compression on a snare drum track. A short or even instantaneous attack (the latter only possible in the digital world with software compressors) will catch the audio right as it exceeds the threshold, while a longer attack will let through the loud transient of the snare drum hit, while kicking in to compress the tailing out of the drum. There are certainly reasons for both decisions when it comes to mixing.

A long release (labeled decay on some units) will cause the compression to engage for longer, only slowly easing up after the volume drops below the threshold. With a short release, the compression will "let go" right after the audio volume drops back below the threshold.

Make-up gain is what allows compressors to boost the volume. The first stage of compression lowers the volume of the louder portions; raising the make-up gain then increases the overall output, with the net effect that the quiet parts end up louder rather than the loud parts quieter.

Compressors with a knee parameter allow the ratio to change based on how far the volume rises above the threshold. For example, audio exceeding the threshold by 1 dB may be compressed at 1.5:1, while audio 10 dB above may be compressed at 5:1. This allows for subtle compression near the threshold, and more drastic compression well above it.


Limiting: Cousin to Compression
A limiter is compression set to a ratio of infinity:1. What this means is that the volume of the audio signal will never exceed the threshold. This is sometimes referred to as a "brick wall", and is often used as a mastering tool. Limiting is simply an extreme version of compression. Many audio interfaces and portable recorders have limiting functionality built-in, as a defense against loud transients ruining a recording take.
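In code terms, taking the ratio to infinity collapses the compressor formula into a simple clamp. A hypothetical one-liner (again operating on dB levels, not audio samples) to show the idea:

```python
def limit_db(level_db, ceiling_db=-0.3):
    # Brick-wall limiting: as ratio -> infinity, excess / ratio -> 0,
    # so the output can never rise above the ceiling.
    return min(level_db, ceiling_db)

print(limit_db(3.0))    # → -0.3  (a hot transient is stopped at the wall)
print(limit_db(-10.0))  # → -10.0 (quieter material passes untouched)
```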


Fixed Threshold Compressors
Many compressors use a fixed threshold. These do not allow you to change your threshold level, but instead usually have an input gain control, and an output gain control. Instead of changing the threshold, you boost the input gain to push more of the audio level up above the threshold.

Side Chain Compression
Side chain compression is a neat trick, and a commonly used effect. With side chain compression, you still have a compressor on a track, and it is still compressing that track, but the audio signal that engages the compression comes from somewhere else, via a "side chain". A classic use is ducking: a voice-over track feeds the side chain so the music dips whenever the narrator speaks. In dance music, a kick drum feeding the side chain produces rhythmic, tremolo-like pumping on a sustained track.

Sunday, February 19, 2012

Nerd Alert: Bits and Bytes. And Nibbles?

The smallest unit of information in computing systems is a binary digit, or bit. A bit can convey either a 1 or a 0; think of a bit as a switch that can be on or off. A grouping of 8 bits is called a byte. A grouping of 4 bits? A nibble (it's true), though this term is uncommon.

Binary Numeric Systems
With enough bits, all numbers can be represented. Here are all the numbers that can be represented with a nibble (or 4 bits):
0000 = 0
0001 = 1
0010 = 2
0011 = 3
0100 = 4
0101 = 5
0110 = 6
0111 = 7
1000 = 8
1001 = 9
1010 = 10
1011 = 11
1100 = 12
1101 = 13
1110 = 14
1111 = 15
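The table above can be generated in a couple of lines; here's a quick sketch using Python's binary formatting:

```python
# Print every value a nibble (4 bits) can hold, alongside its decimal value
for n in range(16):
    print(format(n, "04b"), "=", n)  # "04b" = binary, zero-padded to 4 digits
```

The first lines printed are `0000 = 0`, `0001 = 1`, and so on up to `1111 = 15`, matching the table.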

ASCII
As computers can only understand binary numbers, each character of the alphabet is represented with a byte of information, using an assigned code. One byte allows 256 possible values; ASCII, the American Standard Code for Information Interchange, actually defines 128 of them: 26 for lowercase letters, 26 for uppercase letters, with plenty left over for digits, punctuation, and other characters. In binary, the word "binary" would be represented this way:
01100010 01101001 01101110 01100001 01110010 01111001

Computers do a lot of translating behind the scenes that often gets taken for granted. And this is just to render characters, the amount of computation that occurs to edit audio is astounding.
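You can reproduce that translation yourself. A short sketch that converts each character of "binary" to its 8-bit ASCII code:

```python
word = "binary"
# ord() gives each character's ASCII code; "08b" formats it as 8 binary digits
bits = " ".join(format(ord(ch), "08b") for ch in word)
print(bits)
# → 01100010 01101001 01101110 01100001 01110010 01111001
```

Running the translation in reverse (binary back to characters with `chr()`) recovers the original word.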

Disk Storage Hierarchy
8 Bits = 1 Byte
1024 Bytes = 1 Kilobyte
1024 Kilobytes = 1 Megabyte
1024 Megabytes = 1 Gigabyte
1024 Gigabytes = 1 Terabyte
1024 Terabytes = 1 Petabyte
1024 Petabytes = 1 Exabyte
1024 Exabytes = 1 Zettabyte
1024 Zettabytes = 1 Yottabyte
1024 Yottabytes = 1 Brontobyte (not an officially recognized unit, but I'm sure that before long this term will be commonplace)
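The ladder above is just repeated division by 1024. A hypothetical helper (the function name and output format are my own) that climbs it for any byte count:

```python
UNITS = ["Bytes", "Kilobytes", "Megabytes", "Gigabytes", "Terabytes"]

def humanize(n_bytes):
    """Climb the 1024-based ladder until the value drops below 1024."""
    size, unit = float(n_bytes), 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size /= 1024
        unit += 1
    return f"{size:g} {UNITS[unit]}"

print(humanize(3 * 1024**3))  # → 3 Gigabytes
```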


From web comic XKCD (http://imgs.xkcd.com/comics/1_to_10.png)

Nerd Alert: Sample Rate

Audio can exist in many different states (just like H2O can be vapor, water and ice). The sounds you hear are acoustic energy waves traveling through the atmosphere. When these acoustic waves hit a microphone, its diaphragm moves, and that motion is converted into an electrical signal (in a dynamic microphone, through a process called electromagnetic induction: a conductor moving in a magnetic field generates a voltage). The audio is now an analog signal. This analog signal can be captured by an audio interface, a device that gives your computer additional audio inputs and outputs, which converts it into a digital signal through a process called sampling. This brings us to our topic today.

Sampling
An audio sample is a digital representation of one instantaneous moment in an audio recording. Huh? In other words, your audio interface takes a sample of the audio (like you might with the ice cream flavors at Baskin Robbins), which measures the amplitude of the audio signal at that given instant. Still huh? Think of a sample as a look at the incoming audio. Every time you look at the audio you measure the position of the audio waveform. The more looks you take, the better you can determine how the waveform evolves over time, and the more accurately you can capture the audio. The number of "looks" you take per second is called the sample rate.

Sample Rate
Remember those flip books you would draw as a kid on a pad of post-it notes? A stick figure being chased by a dog and jumping over a fence (was that just me?). Each new post-it note, the stick figure would advance one step further, and when you flipped through it, it came to life. This, of course, is how motion pictures work, and each of your drawings would be called a frame. The number of frames per second is a similar concept to the number of samples per second. Frames for our eyes, samples for our ears. Here's an interesting fact though: Your ears need much, much, MUCH more information than your eyes (or perhaps your eyes are more efficient at processing information). When you go to the movies, you are watching film at a rate of 24 frames per second. When you play a CD (remember those?), you are listening at 44,100 samples per second, a rate with 1,837 1/2 times the information. Sample rate is the number of samples per second, measured in hertz (Hz) or kilohertz (kHz). A rate of 44,100 samples per second is more commonly referred to as 44.1 kHz.

Nyquist-Shannon Sampling Theorem
Why 44.1 kHz? Typical human hearing range is from 20 Hz (referring now to frequency or pitch, not sample rate) to 20 kHz. One audio waveform cycle is one push and one pull of energy. An up and a down. Swedish-born American engineer Harry Nyquist and American mathematician Claude Shannon showed that the highest frequency that can be captured digitally is 1/2 of the sample rate (having to do with needing to sample both a push and a pull of a waveform - a very high frequency waveform could squeak through two samples undetected). At a sample rate of 44.1 kHz, frequencies of up to 22.05 kHz (just beyond typical human hearing range) can be captured, thus rendering a lush representation of the audio, with the full spectrum of frequency.
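You can see the Nyquist limit in action numerically. In this sketch, a 25 kHz tone (above the 22.05 kHz limit for CD audio) is sampled at 44.1 kHz, and the samples turn out identical to those of an inverted 19.1 kHz tone. The high frequency "folds" back down into the audible range, which is why it can't be captured:

```python
import math

FS = 44100               # CD sample rate
f_high = 25000           # above the Nyquist limit of FS / 2 = 22050 Hz
f_alias = FS - f_high    # 19100 Hz: the frequency it masquerades as

high = [math.sin(2 * math.pi * f_high * n / FS) for n in range(100)]
alias = [math.sin(2 * math.pi * f_alias * n / FS) for n in range(100)]

# Sample for sample, the 25 kHz tone is indistinguishable from an
# inverted 19.1 kHz tone once it has been digitized.
assert all(abs(h + a) < 1e-6 for h, a in zip(high, alias))
```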

Typical Sample Rates, and Why 192 kHz?
CDs are 44.1 kHz. If you do any work with audio for film, your standard is 48 kHz. With computers getting more powerful, and with hard drive space getting cheaper, higher sample rates have become common. You will likely see 88.2 kHz (double 44.1 kHz), and 96 kHz (double 48 kHz) as sample rate options, and even up to 192 kHz (double 96 kHz). Even the most sensitive human ears top out at roughly 20 kHz, while a sample rate of 192 kHz can render frequencies up to 96 kHz. So why, then, the need for such a high sample rate? There are arguments for and against, with some saying 192 kHz is just hype, and is useless as a sample rate, while others argue that the higher sample rate provides better fidelity after audio manipulation, such as pitch-shifting an audio recording down several octaves. Still others argue for the higher sample rate as a way to "future proof" their recordings. Not knowing what audio file formats will become standard in the future has driven them to record their audio in the best available quality, so that mix downs at a higher sample rate are possible later.

These audio examples are the same piece of music, bounced out at different sample rates.

Thursday, February 9, 2012

The syNth Degree: LFOs

As mentioned in the envelopes post, synthesizers, samplers and other virtual instruments have many different parameters (like pitch, volume, filter frequency, etc.) that can be dynamically controlled. One way to control these is through modulators. A modulator is something that effects a change in something else (think about modulating from one key to another). LFOs are one form of modulation.


LFO
LFO stands for low frequency oscillator. An oscillator generates a waveform. You may be familiar with the concept of oscillation from the oscillating fan at home (unless you have central air, of course): the fan swivels back and forth in a steady, repeating cycle, which is what makes it an oscillator. LFOs are not audio generating oscillators; they generate control signals that allow modulation of different parameters. The difference between envelopes and LFOs as modulators is that an envelope modulates a parameter once over the duration of a note or phrase, with one attack, one decay, and one release, while an LFO repeats its cycle over and over.




LFO components
Shape
LFOs create a waveform shape, determining the character of the modulation. Does the volume rise and fall gently, as with a typical tremolo effect? This is caused by a gently sloping sine or triangle wave. Does the pitch bounce back and forth between two distinct notes? This is caused by a pulse wave LFO affecting pitch. Typical LFO shapes are sine, pulse (or square), triangle, sawtooth (allowing ramp ups or ramp downs), and random sloping or stairstep.
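The common shapes are easy to express as functions of the LFO's position in its cycle. A hedged sketch (the function name and the bipolar -1..1 range are my own conventions; real synths vary):

```python
import math

def lfo(shape, phase):
    """One sample of a bipolar LFO (-1 to 1); phase is the position
    in the cycle, from 0.0 to 1.0."""
    if shape == "sine":
        return math.sin(2 * math.pi * phase)
    if shape == "square":
        return 1.0 if phase < 0.5 else -1.0   # snaps between two values
    if shape == "saw":
        return 2.0 * phase - 1.0              # ramp up; negate for ramp down
    if shape == "triangle":
        return 1.0 - 4.0 * abs(phase - 0.5)   # linear rise, linear fall
    raise ValueError(shape)

# Sample each shape at quarter-cycle points to see its character
for shape in ("sine", "square", "saw", "triangle"):
    print(shape, [round(lfo(shape, p), 2) for p in (0.0, 0.25, 0.5, 0.75)])
```

Applied to volume, the sine gives the gentle tremolo described above, while the square snaps between two levels.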






The following are audio clips from a synthesizer with an LFO modulating the volume.
 
sine wave at a rate of 2 Hz (2 cycles per second, or 1/4 note at 120 beats per minute)
 
sawtooth wave at a rate of 2 Hz
 
square or pulse wave at a rate of 2 Hz


random wave at a rate of 8 Hz (8 cycles per second, or 1/16 note at 120 beats per minute)


Rate
The rate determines the frequency of the waveform, or the speed of the modulation. Rate can be measured in hertz (Hz), i.e. cycles per second, or the rate can be synced to tempo and measured by a notation value (1 bar, 1/4 note, 1/8 triplet, etc.). The faster the tempo in your sequencing software, the faster your 1/4 note LFO will modulate. LFOs measured in Hz can typically go from as slow as 0.1 Hz (one LFO cycle over 10 seconds) up to around 100 Hz. Modulation faster than 100 Hz gets into the realm of frequency modulation, or FM synthesis, which we will discuss in another post.
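The tempo-sync conversion is simple arithmetic. A hypothetical helper (assuming 4/4 time, where a whole note spans four beats) that turns a note value and tempo into an LFO rate in Hz:

```python
def synced_rate_hz(bpm, note_fraction):
    """Convert a tempo-synced note value to an LFO rate in Hz.
    note_fraction is relative to a whole note (a 1/4 note is 0.25)."""
    seconds_per_beat = 60.0 / bpm
    cycle_seconds = seconds_per_beat * 4 * note_fraction
    return 1.0 / cycle_seconds

print(synced_rate_hz(120, 1 / 4))   # → 2.0  (quarter note at 120 BPM)
print(synced_rate_hz(120, 1 / 16))  # → 8.0  (sixteenth note at 120 BPM)
```

These match the audio clips above: a 1/4 note at 120 BPM cycles at 2 Hz, a 1/16 note at 8 Hz.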



Here is an audio clip from a synthesizer that employs a sine wave LFO modulating the pitch. The LFO's rate is automated from 0.1 Hz (or 1 cycle per 10 seconds) to 100 Hz, while the LFO's amount is set to a constant of multiple octaves.






Amount
Amount determines the strength, or depth, of the modulation waveform. (In audio waveforms, rate determines pitch, and amount determines amplitude, or volume.) Amount is usually expressed as a percentage.

Here is an audio clip from a synthesizer that employs a sine wave LFO modulating the pitch. The LFO's amount is automated from 0 cents to multiple octaves over 4 seconds, while the LFO's rate is held at a constant 2 Hz (two cycles per second, or a 1/4 note at 120 beats per minute).
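When the modulated parameter is pitch, the amount in cents maps to a frequency ratio by a standard formula: 100 cents per semitone, 1200 per octave, with the ratio doubling every octave. A quick sketch:

```python
def cents_to_ratio(cents):
    # 1200 cents per octave; pitch ratio doubles with each octave
    return 2 ** (cents / 1200)

print(cents_to_ratio(0))     # → 1.0  (no pitch change)
print(cents_to_ratio(1200))  # → 2.0  (one octave up)
```

So an amount of "multiple octaves", as in the clip above, swings the pitch across a 4x or greater frequency range.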