As far as I know, the first attempts to capture images using substances that are visibly altered by exposure to light took place sometime around 1725 (give or take a decade). After much experimentation into different approaches, complete instructions for a technique known as the daguerreotype process were made public in 1839, and this remained the most common commercial method until it was superseded by the collodion process in the late 1850s.
Shortly after photographs became available to the general public, people started to utter the phrase “The camera never lies.” The reason they said this was based on the belief that — unlike an artist who adds interpretation to a painting or sketch — a photograph directly captures a scene. Of course, it wasn’t long before people started to realize the ghastly truth, which may be summarized as, “The camera never lies, but photographers do!”
In 1987, the Knoll brothers, Thomas and John, developed Photoshop, the distribution license to which they sold to Adobe Systems in 1988. By the 1990s, the term “photoshopped” had entered the vernacular to refer to an image that had been edited, manipulated, and altered by digital means. There’s a classic picture roaming around the internet showing Abraham Lincoln taking a selfie with his iPhone. This image is typically accompanied by a caption saying something like, “This has to be genuine because Photoshop wasn’t available back then” (it’s hard to argue with logic like that).
A couple of years ago, I ran across an application called Lyrebird AI (Lyrebird is now an AI division within Descript). As I wrote in my column — Thinking of Using Voice Authentication? Think Again! — Lyrebird can listen to someone talking (or a recording of someone talking) and extract a “digital vocal signature.” Later, using this digital vocal signature as part of a text-to-speech app, Lyrebird can generate a speech (or conversation) using the desired voice (or voices).
More recently, we’ve started to see the development of deepfake videos, where the term “deepfake” is a portmanteau of “deep learning” and “fake.” Deepfakes leverage powerful techniques from machine learning (ML) and artificial intelligence (AI) to manipulate or generate visual and audio content with a high potential to deceive (see also What the FAQ are AI, ANNs, ML, DL, and DNNs?).
One way in which this can work is for an AI to analyze a real video of someone making a speech — say a politician — while listening to the associated audio. The AI can observe and register all of the micro muscle movements associated with each phoneme. Subsequently, when presented with a new audio stream — say one generated using something like Lyrebird — the AI can create a new deepfake video to accompany the audio, including eyeblinks, mannerisms, nervous twitches, etc.
When Neil Armstrong, Buzz Aldrin, and Michael Collins set out for the Moon in Apollo 11 on July 16, 1969, they knew that they were on an incredibly risky mission. So did the folks at NASA, which is why they suggested to the White House that President Nixon have an In Event of Moon Disaster speech ready, just in case.
Thankfully, the mission was an outstanding success. To the best of anyone’s knowledge, Nixon never even read the speech aloud … or did he? If you visit the MoonDisaster.org website, in addition to a lot of other material, you can see this video, which includes a deepfake portion showing Nixon giving the dreaded oration.
On the one hand, I applaud the ingenuity of the AI scientists and practitioners who can come up with ways to do this sort of thing. And everything would be hunky-dory if people employed this technology only for fun and entertainment. Sadly, that’s not who we are and what we do.
There’s an old joke that goes, “How can you tell if a politician is lying to you?” The answer is “His lips are moving.” Of course, it was always thus, but things have grown more extreme as the years have gone by. We have now arrived at a stage where politicians have eroded our trust in standard news channels to the extent that no one knows who they can believe anymore.
We have already reached the point where a politician can say something that is demonstrably false and then, when later called out for saying it, simply claim “fake news.” We currently live in a reality in which a politician can give a speech that essentially consists of nothing but a stream of bald-faced lies, which are accepted by devotees and derided by the opposition. As bad as this may seem, it won’t be long before we have to contend with the fact that such a speech might be a deepfake.
As reported by the Washington Post in 2019, someone distorted a video of House Speaker Nancy Pelosi, selectively slowing it down to make it sound as if she were drunkenly slurring her words. In this case, the fake was easy to refute (all that was required was to compare it to the original, unaltered video), but what if this had been a deepfake?
Of course, as opposed to being used to tarnish someone’s reputation, this technology could also work the other way around. By means of deepfake videos, a politician with only a 5th-grade education, a limited vocabulary, a mercurial temper, and early onset dementia could be presented to the public as someone who is very highly educated, who knows the best words, and who is a very stable genius, for example.
Recently, I read the cyberpunk transhumanist comic book series Transmetropolitan by Warren Ellis and Darick Robertson. Gathered into 11 volumes, these tomes tell the tale of Spider Jerusalem, who is a smart-mouthed, heavily-armed, cigar-smoking gonzo reporter of the future. The overall arc of the story involves Spider bringing down a corrupt government led by a psychopathic president. Suffice it to say that deepfake videos are the least of Spider’s problems.
Thank goodness Transmetropolitan is only fiction and we don’t have to contend with anything like this in the real world (said Max, sadly, trying to present a brave front).