You may have viewed an episode of CSI or a similar crime drama program recently. In this episode, a technician sits at a high-tech computer and manipulates what appears to be seriously complicated software. In a few moments, following a series of mouse clicks, the technician generates crystal clear conversation out of what was originally a lot of noise and muffled sounds.
Although this may look convincing, it is far removed from what audio forensics actually involves and from what is possible.
What can audio forensics do?
Well, it typically boils down to one of the following:
Audio Enhancement
This is where a client requests that a complete recording, or parts of one, be enhanced, or in other words, made more audible. Sometimes it is the dialogue and other times it is the background ambiance that matters to a client seeking audio enhancement.
Audio Authentication
This is where a recording is analyzed to determine its integrity and authenticity. The recording is checked to see whether it was tampered with in any way. This process requires an audio engineer to analyze the waveform, the spectral properties of the audio, and the metadata of the actual file. The more information that is available, the better, as this increases the likelihood of the audio being usable as evidence.
Voice Comparison
This is easily the most complex component of audio forensics. It involves comparing the speaking style, voice, and speech characteristics of a speaker across two or more recordings.
Although these are the three main categories audio forensics falls under, there are times when a client needs work done on a recording that calls for some enhancement, some authentication, and some voice comparison.
Now that audio forensics has been explained, let’s dive into what can realistically be expected from each component.
No Guarantees
Before we go any further, there is a disclaimer. Working with audio does not offer a 100 percent guarantee that the desired result will be reached. While audio forensics can play a significant role in providing key evidence in a criminal case, on its own it rarely carries any weight in court.
Take audio enhancement, for example. When dialogue in a recording is masked or partially hidden by noise or a distraction, such as music or the sound of a fan, it can often be isolated. That is common; however, there are also times when nothing can be done to enhance the dialogue.
For example, consider a recording in which the dialogue was captured near the noise floor at -90 dBFS (about as faint as a recorded signal gets), music is playing at -2 dBFS (as loud as a standard radio transmission), and the speaker is located far from the recording device, causing the loss of most of the high-frequency content. In that case, nothing can be done to bring the dialogue up to an audible level.
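To put the gap in that example into numbers, here is a minimal Python sketch. The function name db_to_amplitude_ratio is just an illustrative helper, and the only inputs are the two levels quoted above. It uses the standard relationship that every 20 dB corresponds to a tenfold change in amplitude.

```python
def db_to_amplitude_ratio(db_difference: float) -> float:
    """Convert a level difference in decibels to a linear amplitude ratio."""
    return 10 ** (db_difference / 20)


# Levels taken from the example in the text.
dialogue_dbfs = -90.0
music_dbfs = -2.0

gap_db = music_dbfs - dialogue_dbfs      # how far the music sits above the dialogue
ratio = db_to_amplitude_ratio(gap_db)    # the same gap as a linear amplitude ratio

print(f"Music is {gap_db:.0f} dB above the dialogue, "
      f"roughly {ratio:,.0f} times the amplitude.")
```

An 88 dB gap means the music is on the order of 25,000 times larger in amplitude than the dialogue, which gives a sense of why no amount of processing is likely to recover the speech.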
What is Possible?
It may appear to be a simple job because someone mumbling in the background can be heard without effort. Sure, sometimes the stars align and the fancy software pays for itself by seemingly pulling something out of nothing, but the reality is that it takes a huge amount of work to make this happen, star alignment or not. Audio forensic engineers know the limits, and when something falls outside of what is possible, it is best to leave it at that.
Sometimes an audio file is delivered for enhancement and all it contains is white noise, or there is nothing clearly audible although the client insists they can hear a conversation on the recording in question. The thing is, if nothing was recorded, there is nothing to enhance.
It is vitally important that the limitations of audio enhancement are understood not only by the forensic audio analyst, but also by judges, juries, clients, and everyone else, so that the following message is reinforced: much of what you see on TV or in movies about audio enhancement is not possible in the real world.
Keeping Things on Record
Now, your next question may be, isn’t enhancement a form of tampering with the original sound file? Well, yes and no. It is editing audio, but this is why, in every single case, whether the work is enhancing, authenticating, or comparing, every step of the process is documented with screenshots and a detailed explanation of what was done and when.
There are a few more considerations with audio authentication, although it is essentially analysis. There are several aspects to consider, such as the waveform, spectral data, and metadata, and the more of these that can be analyzed, the more precise the report will be. A more robust report carries more weight in court. However, if all of the metadata is gone and the recording is nothing more than noise, it will be difficult to say anything about the recording other than how loud it is.
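As a rough illustration of the most basic facts an analyst can state about a file, the Python sketch below reads the header properties of a WAV file and measures how loud it is. It is only a sketch under stated assumptions: the function names make_test_tone and describe_wav are hypothetical, the input is assumed to be 16-bit PCM, and the test tone is generated in memory purely as a stand-in for a real recording. Header properties are also only a small part of the metadata a real authentication would examine.

```python
import io
import math
import struct
import wave


def make_test_tone(freq_hz=440.0, amplitude=0.5, seconds=0.1, rate=8000):
    """Build a small 16-bit mono WAV in memory (a stand-in for a real file)."""
    n = int(seconds * rate)
    samples = [int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / rate))
               for i in range(n)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 2 bytes per sample = 16-bit
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{n}h", *samples))
    buf.seek(0)
    return buf


def to_dbfs(value, full_scale=32768):
    """Express a 16-bit sample magnitude in dB relative to full scale."""
    return 20 * math.log10(value / full_scale) if value > 0 else float("-inf")


def describe_wav(fileobj):
    """Return header properties plus peak and RMS level for 16-bit PCM audio."""
    with wave.open(fileobj, "rb") as w:
        channels = w.getnchannels()
        width = w.getsampwidth()
        rate = w.getframerate()
        frames = w.getnframes()
        raw = w.readframes(frames)
    if width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")
    samples = struct.unpack(f"<{frames * channels}h", raw)
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return {
        "channels": channels,
        "sample_rate_hz": rate,
        "duration_s": frames / rate,
        "peak_dbfs": to_dbfs(peak),
        "rms_dbfs": to_dbfs(rms),
    }


report = describe_wav(make_test_tone())
for key, value in report.items():
    print(f"{key}: {value}")
```

For a real case, the in-memory tone would be replaced by the file under examination opened in binary mode. If the header has been stripped or rewritten and the content is only noise, the two level figures are about all a report like this can offer.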
Voice comparison is the most contested area in the audio industry. For example, there was a scandal involving an alleged recording of former City of Toronto Mayor Robert Ford in which several audio experts provided different opinions and analytical results after evaluating the recording. One claimed 85 percent certainty that the recording was of the former Mayor, while another audio expert said it was not.
As I stated before, there are many considerations to factor into the comparison of speech: voice characteristics, tone and pitch, intonation of the spoken words, pauses between the words and sentences, and so much more. To have a relatively clean comparison of all of these aspects, you would require two recordings made on the same kind of device, with the person speaking at the same distance and perspective from the recording device.
For example, if you are comparing a studio recording of someone speaking to a recording of someone talking in a bar filled with other people and background sounds, you are now comparing apples to oranges.
The poor science showcased in crime dramas on TV and the big screen creates unrealistic expectations of what a forensic report can deliver. At the push of a button, two audio recordings are compared and a TV computer announces that there is a “100 percent confirmed match” or something like that. In real life, a 100 percent match is not scientifically possible.
Getting the audio right at the source is key. Having realistic expectations of what can be done is another.