What Does The Damn Thing Actually Do And How Does This All Work? – Part 1: General Audio Terms

This is part 1 of our series of blog posts with simple explanations for the most common terms in audio, as well as features and functions found on recording studio equipment. The resource to end the confusion and help you focus on creating amazing music!


Fellow audio nerds, please note:

The explanations are as non-technical and simple as possible. I want this to be practical and useful, and I don’t care if everything is 100% scientifically correct. All that matters to me are the results that the people who read this will hopefully get.

It’s not meant to be an academic piece of work on audio technology, and if you are an experienced engineer who thinks this is stupid, then I’m really sorry, but this is not for you. I can definitely understand your desire for accuracy and your love for nerdy discussions, because I’m a total freak in this regard myself (just ask my wife). I just think inflating our egos by throwing around complicated terms and definitions is not going to help anyone trying to capture a great song.

General Audio Terms:


Mono means just one channel of audio. If a mono signal is played back through two or more speakers, the same signal will come out of every speaker, and if the speakers are at the same volume, the signal will seem to come from “the middle” between the speakers.

Stereo means two channels of audio, played back at the same time, that have different information in them. One channel is played back by one speaker and the other by another speaker. If you sit between the two speakers, you can hear some parts of the signal coming from the left and some parts coming from the right. These days, most music is mixed and played back in stereo.

Surround means more than two channels of audio. In a 5.1 setup, this means left, right, center, rear left and rear right (the “5”), plus a subwoofer (the “.1”).


When you record the same source with two or more microphones and you notice a weird, “thin” sound when you listen to those mics combined, the signals might not be “in phase” with each other. This happens because the sound arrives at the various mics at different times. So when you zoom in in your software and look at the waveforms of two tracks and how they overlap, you might notice that one waveform “goes up” while another one “goes down” at the same time. This leads to some frequencies being cancelled out while others get over-emphasized, or, if most of the signal gets cancelled out, to a very weak and quiet sound.

If you take two identical files, invert the polarity on one of them and play them back together, the result is actually silence. They cancel each other out completely. So, the more similar the files are, the more relevant this is. If the files have no relationship and are completely different things, it doesn’t matter.
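Here is a tiny sketch of that cancellation in Python, just to make it tangible. The sample rate and the 440Hz test tone are my own assumptions for the example, and "mixing" is simplified to adding the two tracks sample by sample.

```python
import math

SAMPLE_RATE = 48000  # samples per second (assumed for this example)

# A short 440Hz test tone standing in for "the file"
original = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(480)]

# Inverting the polarity flips the sign of every single sample
inverted = [-sample for sample in original]

# Playing both together means adding them sample by sample
mixed = [a + b for a, b in zip(original, inverted)]

# The loudest sample in the mix is 0.0, in other words: silence
print(max(abs(sample) for sample in mixed))
```

With two unrelated signals, the same addition would not cancel in any systematic way, which is why this only matters for similar material.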

Phase problems can also occur when a piece of equipment in the chain inverts the signal. If you solder a cable wrong, the polarity can be inverted, and some gear is also soldered “wrong” or inverts the polarity for some other reason. In this case, we are talking about a completely flipped signal (a 180° shift). That’s an easy fix. The polarity switch on your interface, preamp or in your software will help you find the “inverted signal” and get it back in phase with the others. If it’s not a 180° shift, but a subtle shift due to multiple mics in different positions, you can try inverting the polarity, but you can also move the mics or align the tracks manually in your software until they sound right.
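To see what a timing difference between two mics does, here is a rough sketch. Everything in it is an assumption made for illustration: a single 1kHz tone instead of a real instrument, a 48kHz sample rate, and a second mic that picks up the sound exactly half a cycle (24 samples) late. The helper names are made up for this example.

```python
import math

SAMPLE_RATE = 48000  # samples per second (assumed)
FREQ = 1000          # Hz, so one full cycle lasts 48 samples

def mic_signal(num_samples, delay_samples=0):
    """The test tone as picked up by a mic that hears it `delay_samples` late."""
    return [math.sin(2 * math.pi * FREQ * (n - delay_samples) / SAMPLE_RATE)
            for n in range(num_samples)]

def peak_of_mix(track_a, track_b):
    """Loudest sample you get when both tracks are played together."""
    return max(abs(a + b) for a, b in zip(track_a, track_b))

close_mic = mic_signal(480)
far_mic = mic_signal(480, delay_samples=24)  # half a cycle late

# Out of phase: the two mics cancel each other almost completely
cancelled = peak_of_mix(close_mic, far_mic)

# "Aligning the tracks": slide the late track 24 samples earlier
restored = peak_of_mix(close_mic, far_mic[24:])

print(cancelled)  # very close to 0
print(restored)   # very close to 2 (both mics add up)
```

Real recordings contain many frequencies at once, so the same delay cancels some of them and boosts others. That is where the “thin” sound comes from.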

A very common example where you almost always have to invert one of the mics is miking a snare drum with both a top and a bottom mic. The mics are aiming at the same source, but they are facing each other. So when you hit the snare, the membranes of the mics move in opposite directions. When the snare head gets pushed down, the upper mic’s membrane gets “sucked out” and the bottom mic’s membrane gets “pushed in”. When the snare head swings back up, the opposite happens. This leaves you with two signals that largely cancel each other out, so the combined sound is lacking “body” and volume. If you invert one of them, it usually sounds fuller and just right.

When combining multiple microphones on one source, always try the polarity or “phase” switch, look at the waveforms, try moving each signal back and forth in time, and go with what sounds best.


Loudness is the perceived volume. So, not the actual sound pressure that is measured, but the subjective perception of volume. Two signals that result in the same sound pressure numbers, or have the same peak dB values on a meter, can have completely different loudness. One can sound louder than the other. This is mainly because our hearing is more sensitive to certain frequencies/sounds than others and because some music is more “dense” and some music is more dynamic, with very loud but also very quiet parts. The dense stuff will sound louder, even if the peaks are the same.


(Hard-)Clipping means going above the maximum level that a device or plugin can handle. Imagine the peaks of the signal getting “chopped off” and distortion being introduced. The more you clip a signal, the more audible the distortion becomes. There is “hard clipping”, which means going above “0” in a digital system (like your software), and “soft clipping”, which means driving analog circuits (or emulations of those) slowly into saturation and distortion. When hard clipping, nothing happens until you hit “0”, but as soon as you go over it, it immediately distorts. When soft clipping, the overdrive starts gently and increases as you get closer to the maximum the circuit can handle, just the way you would drive a guitar amp, for example. Most interfaces or preamps have peak LED meters that warn you when you’re about to clip your input signal, because if you record a clipped signal, the distortion you add cannot be removed later in the process. So in most cases it’s best to avoid overloading your preamps/interface/converters while recording.
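If you like seeing things in numbers, here is a small sketch of the two behaviours. It assumes samples where 1.0 is the maximum (“0” on the meter). The `tanh` curve is just one commonly used soft-clipping shape that I picked for illustration; it is not how any particular piece of gear behaves.

```python
import math

def hard_clip(sample, ceiling=1.0):
    """Leave the signal untouched below the ceiling, chop off everything above."""
    return max(-ceiling, min(ceiling, sample))

def soft_clip(sample):
    """Squash the signal gradually; tanh is an assumed, illustrative curve."""
    return math.tanh(sample)

# Compare what happens as the input level goes up
for level in (0.25, 0.5, 1.0, 2.0):
    print(level, hard_clip(level), round(soft_clip(level), 3))
```

The hard clipper returns quiet samples unchanged and flattens anything above 1.0. The soft clipper already bends the signal a little at low levels and bends it more the hotter the input gets, without ever reaching 1.0.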

dBFS / RMS / Peak / LUFS / SPL

Those are terms/units that describe different ways of measuring level, loudness or volume.


dBFS stands for “Decibels relative to Full Scale”. This scale is used for measuring levels in digital systems (e.g. your software). 0 dBFS is the maximum level. Everything above is “in the red” and therefore clipping (overloading, distorting). The scale refers to the amplitude of a signal compared with the maximum which a device can handle before clipping occurs (“0”). So, for example, -7 dBFS means “7 dB away from the maximum of 0”.
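For the curious, this is roughly how a sample value translates to dBFS and back. The sketch assumes floating-point audio where 1.0 is full scale; the function names are made up for this example.

```python
import math

def to_dbfs(amplitude):
    """Level of a sample relative to full scale (1.0 is assumed to be 0 dBFS)."""
    if amplitude == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 20 * math.log10(abs(amplitude))

def from_dbfs(dbfs):
    """Amplitude (0.0 to 1.0) that corresponds to a dBFS value."""
    return 10 ** (dbfs / 20)

print(to_dbfs(1.0))             # 0.0   (full scale)
print(round(to_dbfs(0.5), 1))   # -6.0  (half the amplitude)
print(round(from_dbfs(-7), 3))  # 0.447 (7 dB below the maximum)
```

So halving the amplitude costs you about 6 dB, and -7 dBFS is a bit less than half of full scale.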


Peak is the maximum level of your audio signal at a specific moment in time.


RMS stands for “Root mean square”. It is the average level of your audio signal, measured over a longer period of time, and it is closer to what your ears perceive as loudness than the peak level is.
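A quick sketch of the difference between the two measurements. The two “signals” are made-up lists of samples (1.0 being full scale), chosen only so that both have exactly the same peak: one is loud all the time, the other is mostly quiet with one loud hit at the end.

```python
import math

def peak(samples):
    """The single loudest sample, ignoring the sign."""
    return max(abs(s) for s in samples)

def rms(samples):
    """Root mean square: square everything, average it, take the root."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

dense = [1.0, -1.0] * 500                   # loud all the time
dynamic = [0.1, -0.1] * 499 + [1.0, -1.0]   # quiet, with one loud hit

print(peak(dense), peak(dynamic))  # identical peaks
print(rms(dense), rms(dynamic))    # very different averages
```

A peak meter would show the same value for both, while the RMS value of the dense signal comes out roughly nine times higher.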


LUFS stands for “Loudness Units relative to Full Scale”. This is a loudness standard designed to enable the matching of perceived audio levels, so that different signals (or songs) will sound equally loud, no matter what the dB meter says. A Loudness Unit (or LU) describes loudness by taking into account how our hearing perceives volume, not just pure sound pressure or amplitude, like “dB” does. And again, “FS” means “relative to full scale”. So, for example, -18 LUFS means “18 LU away from the maximum of 0”. The difference between -23 LUFS and -18 LUFS, for example, is 5 LU.
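In practice you mostly use these numbers to figure out how far to turn something up or down. The sketch below assumes you already have LUFS readings from a loudness meter (measuring them properly is a job for a meter plugin, not for ten lines of code) and relies on the fact that a level change of 1 dB moves the reading by 1 LU. The function name is made up for this example.

```python
def gain_to_match(measured_lufs, target_lufs):
    """How much to turn a signal up or down to hit a target loudness.

    Returns the change in dB and the matching multiplier for the samples.
    """
    gain_db = target_lufs - measured_lufs
    gain_factor = 10 ** (gain_db / 20)
    return gain_db, gain_factor

# A song measured at -23 LUFS that should end up at -18 LUFS
gain_db, gain_factor = gain_to_match(-23.0, -18.0)
print(gain_db)                # 5.0 (turn it up by 5 dB)
print(round(gain_factor, 2))  # 1.78 (multiply every sample by this)
```

Keep in mind that turning a signal up also raises its peaks, so you still have to watch out for clipping.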

Dynamic Range

Dynamic range describes the difference between the loudest and quietest parts of a signal or song. The greater this difference is, the more “dynamic” the material. 
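As a rough sketch, you could put a number on that difference by comparing the average (RMS) level of a quiet part and a loud part in dB. The two sample lists are made up for the example, and this is just one simple way to look at it, not an official dynamic range measurement.

```python
import math

def level_db(samples):
    """Average (RMS) level of a section in dB, with 1.0 as full scale."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)

quiet_verse = [0.05, -0.05] * 100  # made-up quiet section
loud_chorus = [0.5, -0.5] * 100    # made-up loud section

difference = level_db(loud_chorus) - level_db(quiet_verse)
print(round(difference, 1))  # 20.0 (dB between the quiet and the loud part)
```

If you compressed this material so that the verse came up to 0.25, the difference would shrink to about 6 dB and the song would be less “dynamic”.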

Frequency Spectrum

Every sound you hear consists of one or more sine waves (single frequencies, or “notes”). The unit “Hz” (Hertz) describes the frequency of the wave, meaning how many times per second it swings back and forth. The lower the note or tone, the lower the frequency. So, for example, 50Hz is a lower note or tone than 500Hz.
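If you want to see “Hz” as “cycles per second” in action, here is a small sketch. It builds one second of a sine wave and counts how often the wave starts a new upward swing. The 48kHz sample rate and the function name are assumptions made for this example.

```python
import math

SAMPLE_RATE = 48000  # samples per second (assumed)

def count_cycles(freq_hz, seconds=1):
    """Count how many cycles of a sine wave start within the given time."""
    samples = [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
               for n in range(SAMPLE_RATE * seconds)]
    # A new cycle starts wherever the wave crosses zero on its way up
    return sum(1 for a, b in zip(samples, samples[1:]) if a <= 0 < b)

print(count_cycles(50))   # 50 cycles in one second
print(count_cycles(500))  # 500 cycles in one second
```

The 500Hz wave swings ten times as often in the same second, and that is what you hear as the higher note.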

Many frequencies together build the frequency spectrum of a signal, or the spectrum a certain piece of gear is able to process or reproduce. Or the frequency spectrum of human hearing, for example. We can properly hear frequencies from about 20Hz to 20kHz when we are young, and the older we get, the lower the upper limit gets. So depending on your age, you might only be able to hear up to 16kHz or even less. “1kHz” (Kilohertz) means “1000Hz”. So, when you look at a spectrum analyzer in your recording software, it will most likely show you a spectrum from 20Hz (or 10Hz) to 20kHz for that exact reason. This spectrum is relevant to music production, because that’s what we can actually hear. Everything above that is for dogs and bats (ultrasonics), and everything below that (infrasonics) can make you feel sick if it’s loud enough, or be an indicator for an earthquake, but it won’t help your music much and most speakers can’t reproduce it correctly anyway.


A transient is the very beginning, the “attack” of a sound. This could be the initial stick attack of a snare drum hit, or the pick attack of a guitar note, or the hard, plosive sounds of a vocal, like “t”, “p”, “k”, etc. If you tame or cut off a transient, the sound becomes “round” and soft. If you increase the volume of the transient, the sound becomes “hard” and “direct”. 


Attack is used to describe the transient of a sound, the “hardness” of a sound.

It can also mean the time a device takes to reach a certain level of processing. Fast Attack: Device reacts quickly. Slow attack: Device takes some time to fully process an incoming signal. 


Sustain describes the “end” or decay of a signal or sound. The ringing of a drum, or a guitar note that is being held and slowly fades away. 

These are the other posts in this series:

  • The Production Process

  • Routing And Processing

  • Microphones And Mic Accessories

  • Preamps, Converters, Interfaces

  • Cables and Connectors

  • Computer, Software And Audio Files

Got self-recording friends? Share and help them up their game!