Page 47 - FCW, September/October 2019

and his personal habits,” Gizmodo reported.
And soon, eavesdroppers will be able to pick up information just from a person’s vocal characteristics. A team at Carnegie Mellon University, for example, is developing machine learning-fueled voice forensics technology that can potentially yield a person’s age, height, health status and much more from a voice sample. In fact, we’re already seeing artificial intelligence that reconstructs a person’s face based on a single sound bite. These inferences will prove to be invaluable to spies the world over.
Smartphone users might think, “I’ll just tape over the microphones on my smartphone and laptop and that should keep the bad guys from listening.” Facebook CEO Mark Zuckerberg famously had a similar notion. Unfortunately, although a physical barrier over a microphone can muffle sound, it won’t stop the sound completely. Plus, if hackers have the tools to remotely install spyware on a smartphone or tap into a smart speaker, they likely have access to audio forensics technology for deciphering conversations distorted by a piece of tape.
Exacerbating the problem is the fact that turning off the microphones in devices is impossible or impractical. With today’s popular smartphone models, users can disallow microphone access to any given app, but there’s no way to completely disable the microphones from use, short of removing them from the device. And while some smart speakers, such as the Amazon Echo and Google Home, have buttons or switches for shutting off their microphones, manually switching them back on requires touching the device, defeating the purpose of having voice-activated technology in the first place.
Bring on the noise
Enter audio masking. Similar to how white-noise machines drown out the audio in a room, on-device audio masking, whether delivered through an anti-surveillance smartphone case or smart speaker gatekeeper, adds noise at the locations of the microphones. At the proper volume levels, the end result is that the device is essentially deafened, burying any valuable audio content beneath the noise floor of the masking signal and rendering recordings useless to eavesdroppers.
But not just any noise will do. To prevent a conversation or audio snippet from being successfully deciphered using advanced audio forensics, the noise added to the audio mix must be:
1. Random. The noise should ideally be created using a true random number generator based on a source with high entropy. If the noise is repetitive or even pseudo-random, a recorded audio signal can be processed, using two-channel adaptive filtering, to essentially “subtract” the noise profile found in a reference file from the audio in a target file, delivering a version of the target file with significantly improved speech intelligibility. To help synchronize the timing of the two files, additional signal processing, using fast Fourier transform analysis, can help determine the repetition rate of the noise profile.
2. Microphone-specific. The audio masking must occur independently for each microphone, whether for the four microphones found in the iPhone XR or the seven found in the second-generation Amazon Echo. Doing so prevents an eavesdropper from using cross-correlation or known pseudo-random patterns to extract the audio from the masked output.
3. Adaptive. The level of audio masking must adapt to the volume of the audio being masked throughout the range of human speech, from a whisper to a shout. After all, if the level of audio masking were at maximum volume the entire time, the noise would be unbearable for the user, and if the level of audio masking stayed within a moderate range, it wouldn’t be able to adequately mask louder speech and sounds.
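To make the first requirement concrete, here is a minimal Python sketch of why repeating noise is weak. It is an illustration under stated assumptions, not the forensic method described above: it swaps in brute-force autocorrelation for the FFT analysis and simple period averaging for two-channel adaptive filtering, and the test signal, the function names and every constant are hypothetical.

```python
import math
import random

def fake_speech(n, rng):
    """Smoothed random samples standing in for speech (hypothetical test signal)."""
    raw = [rng.uniform(-1, 1) for _ in range(n + 7)]
    return [sum(raw[i:i + 8]) / 8 for i in range(n)]

def snr_db(clean, observed):
    """Signal-to-noise ratio of `observed` measured against the known clean signal."""
    signal = sum(c * c for c in clean)
    error = sum((o - c) ** 2 for c, o in zip(clean, observed))
    return 10 * math.log10(signal / error)

def estimate_period(x, min_lag, max_lag):
    """Return the lag with the largest autocorrelation (brute force, not FFT)."""
    def autocorr(lag):
        return sum(x[i] * x[i + lag] for i in range(len(x) - lag))
    return max(range(min_lag, max_lag + 1), key=autocorr)

def subtract_repeating_profile(x, period):
    """Average all whole periods into a noise template, then subtract it."""
    cycles = len(x) // period
    template = [sum(x[c * period + i] for c in range(cycles)) / cycles
                for i in range(period)]
    return [x[i] - template[i % period] for i in range(len(x))]

# Seeded only so the sketch is reproducible; a real masker would not be seeded.
rng = random.Random(2019)
N, PERIOD = 4000, 100  # assumed sample count and noise repetition length

speech = fake_speech(N, rng)
profile = [rng.uniform(-1.5, 1.5) for _ in range(PERIOD)]
repeating_noise = [profile[i % PERIOD] for i in range(N)]   # same profile looped
fresh_noise = [rng.uniform(-1.5, 1.5) for _ in range(N)]    # never repeats

def attack_gain_db(noise):
    """Run the attack on speech + noise; return (period guess, SNR gain in dB)."""
    recording = [s + m for s, m in zip(speech, noise)]
    period = estimate_period(recording, 20, 150)
    cleaned = subtract_repeating_profile(recording, period)
    return period, snr_db(speech, cleaned) - snr_db(speech, recording)

period_found, gain_repeating = attack_gain_db(repeating_noise)
_, gain_fresh = attack_gain_db(fresh_noise)
print(period_found, round(gain_repeating, 1), round(gain_fresh, 1))
```

In this toy setup the attack should recover the repetition length and lift the speech well clear of the looped noise, while the same steps gain almost nothing against noise that never repeats.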
With these three characteristics, a conversation’s content (the words spoken) and context (accents, tones, number of participants, etc.) will be unidentifiable to an eavesdropper because the final audio output will be indistinguishable from a recording of noise alone.
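The second and third characteristics can be sketched in a few lines of Python. This is a simplified illustration, not a description of any shipping product: the 10 dB margin, the floor and ceiling levels, the frame size and all function names are assumptions, and `random.SystemRandom` (which draws from the operating system's entropy pool) merely stands in for a true hardware random number generator.

```python
import math
import random

def rms(frame):
    """Root-mean-square level of one short frame of audio samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def masking_level(speech_rms, margin_db=10.0, floor=0.01, ceiling=1.0):
    """Target noise RMS: the speech level plus a margin, clamped to assumed limits."""
    target = speech_rms * 10 ** (margin_db / 20)
    return min(ceiling, max(floor, target))

def mask_frame(frame, rng):
    """Add Gaussian noise whose level tracks the loudness of this frame."""
    level = masking_level(rms(frame))
    return [x + rng.gauss(0.0, level) for x in frame]

def mask_microphones(frames, rng):
    """Mask each microphone's frame with its own independently drawn noise."""
    return [mask_frame(frame, rng) for frame in frames]

# Hypothetical frames: the same tone at whisper and shout amplitudes.
whisper = [0.005 * math.sin(0.3 * t) for t in range(160)]
shout = [0.2 * math.sin(0.3 * t) for t in range(160)]

quiet_level = masking_level(rms(whisper))
loud_level = masking_level(rms(shout))

# OS entropy source; every sample drawn is independent, so no two channels share noise.
rng = random.SystemRandom()
masked = mask_microphones([whisper, shout], rng)
print(round(quiet_level, 4), round(loud_level, 4), len(masked))
```

Because the noise level follows the speech level within the clamp, a whisper gets only a little noise and a shout gets much more. A real device would also have to estimate loudness with some delay, which this sketch ignores.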
Of course, having the best audio masking in the world is meaningless if it gets in the user’s way. Even though we may want our microphones to be actively listening to us only a fraction of the time, we don’t want any hiccups when it comes to making phone calls, recording audio messages or using a device’s virtual assistant.
For an audio-masking add-on for smartphones or other handheld devices, the solution is rather simple because a physical mechanism can give the user control over when masking occurs. But a similar add-on for smart speakers requires some creativity. One novel solution allows the user to temporarily stop audio masking by speaking a custom wake word to the add-on, which then triggers the speaker by whispering the standard wake word.
Because our smartphones, smart speakers and other smart devices act as witnesses to our most private conversations and personal behaviors, limiting the exposure of those details while leveraging the powerful technology at our disposal is a challenging balancing act. On-device audio masking is an exciting development that promises to help us achieve that balance.
Mike Fong is founder and CEO of Privoro.





































































