As far as I know, the first attempts to capture images using substances that are visibly altered by exposure to light took place sometime around 1725 (give or take a decade). After much experimentation with different approaches, complete instructions for a technique known as the daguerreotype process were made public in 1839, and this remained the most common commercial method until it was superseded by the collodion process in the late 1850s.
Shortly after photographs became available to the general public, people started to utter the phrase “The camera never lies.” They said this based on the belief that — unlike an artist who adds interpretation to a painting or sketch — a photograph directly captures a scene. Of course, it wasn’t long before people started to realize the ghastly truth, which may be summarized as, “The camera never lies, but photographers do!”
In 1987, the Knoll brothers, Thomas and John, developed Photoshop, the distribution license for which they sold to Adobe Systems in 1988. By the 1990s, the term “photoshopped” had entered the vernacular to refer to an image that had been edited, manipulated, and altered by digital means. There’s a classic picture roaming around the internet showing Abraham Lincoln taking a selfie with his iPhone. This image is typically accompanied by a caption saying something like, “This has to be genuine because Photoshop wasn’t available back then” (it’s hard to argue with logic like that).
A couple of years ago, I ran across an application called Lyrebird AI (Lyrebird is now an AI division within Descript). As I wrote in my column — Thinking of Using Voice Authentication? Think Again! — Lyrebird can listen to someone talking (or a recording of someone talking) and extract a “digital vocal signature.” Later, using this digital vocal signature as part of a text-to-speech app, Lyrebird can generate a speech (or conversation) using the desired voice (or voices).
More recently, we’ve started to see the development of deepfake videos, where the term “deepfake” is a portmanteau of “deep learning” and “fake.” Deepfakes leverage powerful techniques from machine learning (ML) and artificial intelligence (AI) to manipulate or generate visual and audio content with a high potential to deceive (see also What the FAQ are AI, ANNs, ML, DL, and DNNs?).
One way in which this can work is for an AI to analyze a real video of someone making a speech — say a politician — while listening to the associated audio. The AI can observe and register all of the micro muscle movements associated with each phoneme. Subsequently, when presented with a new audio stream — say one generated using something like Lyrebird — the AI can create a new deepfake video to accompany the audio, including eyeblinks, mannerisms, nervous twitches, etc.
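To make the idea concrete, here is a purely conceptual sketch — not any real deepfake pipeline — of the alignment step described above: each phoneme in the audio is mapped to a mouth shape (a “viseme”) for every video frame. The phoneme-to-viseme table, timings, and function names here are all invented for illustration; a real system learns these mappings from footage rather than using a hand-written lookup.

```python
# Conceptual sketch only: turn a timed phoneme sequence into one
# mouth-shape ("viseme") label per video frame. A real deepfake model
# learns this audio-to-mouth mapping; this table is invented.

PHONEME_TO_VISEME = {  # hypothetical, highly simplified mapping
    "AA": "open",
    "M": "closed",
    "F": "teeth-on-lip",
    "UW": "rounded",
}

def visemes_per_frame(phonemes, fps=25):
    """phonemes: list of (phoneme, duration_in_seconds) pairs.
    Returns a list with one viseme label per video frame."""
    frames = []
    for phoneme, duration in phonemes:
        shape = PHONEME_TO_VISEME.get(phoneme, "neutral")
        frames.extend([shape] * round(duration * fps))
    return frames

# The word "ma": lips closed for "M" (0.2 s), then open for "AA" (0.2 s).
frames = visemes_per_frame([("M", 0.2), ("AA", 0.2)], fps=25)
```

At 25 frames per second, each 0.2-second phoneme spans five frames, so the synthesized mouth would stay closed for five frames and then open for five. Everything else the article mentions — eyeblinks, mannerisms, nervous twitches — would be additional learned channels layered on top of this same per-frame alignment.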
When Neil Armstrong, Buzz Aldrin, and Michael Collins set out for the Moon in Apollo 11 on July 16, 1969, they knew that they were on an incredibly risky mission. So did the folks at NASA, which is why they suggested to the White House that President Nixon have an In Event of Moon Disaster speech ready, just in case.
Thankfully, the mission was an outstanding success. To the best of anyone’s knowledge, Nixon never even read the speech aloud … or did he? If you visit the MoonDisaster.org website, in addition to a lot of other material, you can see this video, which includes a deepfake portion showing Nixon giving the dreaded oration.
On the one hand, I applaud the ingenuity of the AI scientists and practitioners who can come up with ways to do this sort of thing. And everything would be hunky-dory if people employed this technology only for fun and entertainment. Sadly, that’s not who we are and what we do.
There’s an old joke that goes, “How can you tell if a politician is lying to you?” The answer is “His lips are moving.” Of course, it was always thus, but things have grown more extreme as the years have gone by. We have now arrived at a stage where politicians have eroded our trust in standard news channels to the extent that no one knows who they can believe anymore.
We have already reached the point when a politician can say something that is demonstrably false, and when he or she is later called out for saying it, they claim “fake news.” We currently live in a reality in which a politician can give a speech that essentially consists of nothing but a stream of bald-faced lies, which are accepted by devotees and derided by the opposition. As bad as this may seem, it won’t be long before we have to contend with the fact that such a speech might be a deepfake.
As reported by the Washington Post in 2019, someone distorted a video of House Speaker Nancy Pelosi, selectively slowing it down to make it sound as if she were drunkenly slurring her words. In this case, the fake was easy to refute — all that was required was to compare it to the original, unaltered video — but what if this had been a deepfake?
Of course, as opposed to being used to tarnish someone’s reputation, this technology could also work the other way around. By means of deepfake videos, a politician with only a 5th-grade education, a limited vocabulary, a mercurial temper, and early onset dementia could be presented to the public as someone who is very highly educated, who knows the best words, and who is a very stable genius, for example.
Recently, I read the cyberpunk transhumanist comic book series Transmetropolitan by Warren Ellis and Darick Robertson. Gathered into 11 volumes, these tomes tell the tale of Spider Jerusalem, who is a smart-mouthed, heavily-armed, cigar-smoking gonzo reporter of the future. The overall arc of the story involves Spider bringing down a corrupt government led by a psychopathic president. Suffice it to say that deepfake videos are the least of Spider’s problems.
Thank goodness Transmetropolitan is only fiction and we don’t have to contend with anything like this in the real world (said Max, sadly, trying to present a brave front).
A very interesting piece. It is worrying how far video faking technology has come and how hard it is to prove something is fake.
(But maybe my comment has been deep faked!)
Have you heard about Anti AI AI (http://antiaiai.info/)? This is something that detects when an AI voice is talking to you — presumably the next step will be Anti Deepfake AI.
Based on reading quite a few science fiction stories, I can actually see a use for deepfake technology. In conjunction with an AI, it could act as your personal assistant — holding a conversation for you, say to make an appointment at the dentist.
That is pure genius! When will you have it ready?
My wife is a teacher. When she needs to see the doctor, she phones, gets an answering machine, and leaves a message. They phone back; she is in the classroom. They give her a time. She phones back and leaves a message that the time is unsuitable. And so the telephone tag goes. And all through this, somehow it is my fault.
Please have it ready soon…
“And all through this, somehow it is my fault.” It’s funny you should say this — it’s the same way in my house LOL
Since we are all just bits of code in a Matrix-like simulation, our opinions are largely moot. In the spirit of playing along, I imagine it will all boil down to math. Specifically, probabilities and confidence levels. What is the confidence level of that audio or video being real vs. fake? Since most audio/video these days is digital, it won’t be long before the samples/pixels are modified between the sensor and the storage medium. So, what is originally stored won’t even be able to be trusted. I guess that would be a RealFake? I have high confidence that my simulated brain now hurts. Maybe.
Everything would be great if people weren’t such plonkers — if this technology was used only for the good — like entertainment — life would be sweet — but then you look at the way the world is today and you realize that there are slimeballs out there who will corrupt everything they touch (sad face).
I hear you Max. There are times I feel truly sorry for the person who invented the #2 pencil. Innumerable great and magnificent things came from it. But then some cretin uses one to write <fill in your choice of evil authorship>. “Plonkers” is too nice by far.
You never hear about the chap who invented the #1 pencil 🙂