Deepfake Detection Game
Author: Sean McGregor (@seanmcgregor)
Video has long been a trusted source of ground truth for world events. People know images can be manipulated, but video manipulation has been rare. That has changed with the advent of neural-network video manipulation. Automatic video manipulation tools, like those that swapped Nicolas Cage's face onto Amy Adams, are now fast, realistic, and cheap. Defending the public record requires methods for detecting these fakes. But is the deepfake detection game winnable?
Note: While I serve on the steering committee for the Deepfake Detection Challenge, this blog post represents my own views and not that of Facebook, the Partnership on AI steering committee, the XPRIZE Foundation, or Syntiant. I do not provide details on the Deepfake Detection Challenge that are not already publicly available.
What is a Deepfake?
"Deepfake" is a portmanteau of "deep learning" (i.e., the computing method creating the fake) and "fake". Deepfakes have been used successfully in manipulating faces, modifying body movements, and more. Most troubling is when they are used to defraud and victimize. At present, the purveyors of deepfakes are primarily researchers solving interesting technical problems, but the techniques are quickly seeing adoption among propagandists and trolls. Below are three deepfake demonstrations that serve to illustrate the importance and impact of this problem area.
Overview of the Problem
Deepfake Detection Challenge steering committee member Claire Wardle gives an excellent introduction to the problem in general.
President Obama the puppet
In this video President Obama is made to discuss deepfakes in a very uncharacteristic manner. It is a very high-quality fake of considerable length, voice-acted by a noted Obama impersonator.
Dance puppets
In this research demonstration video, you can see the dance moves of a source getting mapped to a subject. The subject is tasked with dancing as closely as possible to the source, but their movements are far from perfect. You can see a pained look on their faces when they are made to dance like a professional ballerina.
It is now possible for individuals to programmatically create these videos in the hundreds of thousands, and web platforms like Facebook, YouTube, and Twitter are concerned they will be used to harm individuals by placing them into compromising situations, or as a vector for state-sponsored propaganda. Since deepfakes are increasingly appearing in the wild with real-world politicians as their subjects, it is not difficult to imagine a nightmare scenario where a deepfake goes viral immediately before polling. Consequently, Facebook is putting immense resources behind the Deepfake Detection Challenge to subsidize deepfake detection research and tool development. We need these tools now, but the value of these tools in the long run is doubtful. To explain why, I am going to lay out a formal way of thinking about deepfake security.
The Deepfake Security Game
People working in cryptography and computer security think in terms of formal games in which an "adversary" actively attempts to bypass a "defender." In the case of deepfakes, there are a variety of individuals and organizations that can play the deepfake game.
Possible Adversaries:
These could include anyone with an interest in generating a deepfake. More specifically, they include the following private-model adversaries:
- state propagandists
- stock short sellers
In computer security, state actors are considered the most challenging adversary because they have an extensive catalog of system exploits making it possible to break into almost every connected system in the world. Where state-sponsored hacking has a collection of secret vulnerabilities, well-resourced deepfake adversaries have a set of deepfake generating models not known by the defender.
Additionally, there are public-model adversaries that would make use of publicly available models:
- individual propagandists
- political operatives
- trolls
- extortionists
- pornographers
The distinction between the two adversary types is profound in its implications for platforms, service providers, and news organizations. I step through these defender types below.
Defenders (platforms):
Platforms are purveyors of user submitted content. Some include,
- Facebook
- YouTube
- Twitter
These platforms receive a very large number of videos from users against which deepfakes will always be rare. Any significant misclassification rate for non-deepfake content would produce millions of videos for moderation. They want detector models that have the lowest false positive rate possible (i.e., high precision), subject to detecting some proportion of the deepfakes in the population (i.e., moderate recall).
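To make that operating point concrete, here is a minimal sketch of how a platform might pick a detection threshold that holds precision at a target level while accepting whatever recall remains. The labels and scores below are synthetic stand-ins, not outputs of any real detector.

```python
# Sketch: choosing a high-precision operating threshold for a platform-scale
# deepfake detector. The labels and scores are synthetic; in practice they
# would come from a held-out validation set scored by the real detector.
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
n = 100_000
y_true = (rng.random(n) < 0.01).astype(int)       # deepfakes are rare
scores = y_true * 0.4 + rng.random(n) * 0.6       # stand-in detector scores

precision, recall, thresholds = precision_recall_curve(y_true, scores)

target_precision = 0.99
meets_floor = precision[:-1] >= target_precision  # align with thresholds
if meets_floor.any():
    # Among thresholds that satisfy the precision floor, take the best recall.
    best = np.argmax(np.where(meets_floor, recall[:-1], 0.0))
    print(f"threshold={thresholds[best]:.3f}  "
          f"precision={precision[best]:.3f}  recall={recall[best]:.3f}")
else:
    print("No threshold reaches the precision target; the detector is too weak.")
```

The point of the exercise is that the platform fixes the false-positive budget first and lets recall fall where it may, which is the opposite of what a news organization would want.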
Defenders (detectors as a service):
These organizations may provide deepfake detection services to third parties by exposing an API that checks submitted content. Some potential deepfake detection providers include,
- Startups
- Google (labeling search results)
- Amazon (detectors as a service)
- Microsoft (enterprise detectors as a service)
The commercial viability of this sort of business model likely hinges on properly scoping the types of adversaries addressed. As I'll explain below, these services cannot adequately address private-model adversaries.
Defenders (news organizations):
These organizations are concerned with establishing the public record via verified unmodified media. They are likely the customers of the "detectors as a service." Examples include,
- Associated Press, NY Times, Washington Post, etc.
- BBC, CBC, PBS, CNN, Fox News, etc.
I am skeptical these organizations should trust the output of any deepfake detector. I will go into why in the following sections.
Defenders (individuals):
This includes everyone who may be exposed to deepfakes in the wild.
- All individual people
The adversaries are responsible for selecting the subject and method of attack, and the defender is responsible for selecting the detector models that will be monitoring content for deepfakes. When fully defining the game, we need to make assumptions about how the game is carried out,
Adversaries choose:
- Public/private deepfake generation model: "models" produce the deepfakes in coordination with a person who selects the modifications the model makes. If the defender has access to the model, then it is possible to use the model in detection. Since training a model is (currently) prohibitively expensive, in practice deepfake models are likely to be shared by multiple adversaries. Therefore, state actors and others with strong financial incentives are the only ones likely to have private deepfake models, at least initially. Most adversaries will likely use previously trained models that are functionally public to the defender.
- public: public-model adversaries (individual propagandists, political operatives, trolls, extortionists, pornographers)
- private: private-model adversaries (state propagandists, stock short sellers)
- Public/private source material: Deepfakes typically modify an existing video to have different contents. The materials used in the construction of the deepfake will often be publicly available and could be used by detector models. The completely open space of source videos from which adversaries operate likely means that assuming all source materials are private is a reasonable starting point.
- public: None
- private: all adversaries (assumed as a starting point)
Defenders choose:
- Detection model exposure: the defender produces a model to detect the deepfake, which can then be exposed to the adversary to varying degrees. Exposure is measured in "queries" to the model. Generally, each query reduces the strength of the model, since the adversary can repeatedly modify a video until it passes through the detector (a toy illustration of this probing loop follows this list).
- no queries by adversary: the platforms (Facebook, YouTube, Twitter)
While the platform companies are still deciding how to roll out deepfake detection methods, it is in their interest to make the deepfake detection models completely unavailable to both their users and the adversaries hidden among them. They can highlight content flagged as a potential deepfake and use internal review processes to determine whether the content can spread virally on the platform. They cannot give the user instant feedback on whether the content has been flagged as a potential deepfake.
- few queries: detectors as a service
Companies offering API access to a deepfake detection model lose the ability to guarantee the secrecy of detector results. As a result, they cannot be as robust to probing by the adversary.
- unlimited queries:
If the model can be directly queried by users, then adversaries can query it an unlimited number of times. Here the public interest is best served by not giving users access to the detector model, but instead giving them access to explanations of why a particular piece of content has been identified as a deepfake. These "explainers" are premised on content being positively identified as a deepfake and can be constructed either by the moderators flagging the content or by a model tasked with explaining the output of the detector model.
- public model:
There is no advantage to the defenders in making the models themselves public, except to facilitate research into improving detection methods.
- Misclassification costs: Detection models can typically be tuned according to a confidence parameter. In settings where the cost of false detection is high (e.g., when multiplied across the millions of videos being uploaded to platforms), the required confidence must be set high. In cases where the cost of wrongly asserting the legitimacy of a video is high (e.g., for a news organization), the defender can demand near certainty before treating content as authentic.
- High cost for false negative: news organizations, individuals
- Low cost for misclassification: platforms, detectors as a service
- Training Set Exposure: The defenders can support deepfake detector efforts by providing deepfake training and evaluation data to the research community. Google has already done so by publishing a dataset of face swaps performed on videos recorded by the company. In the Deepfake Detection Challenge, training data was generated under similar circumstances and is provided to competitors. While Facebook's private user videos cannot be provided to researchers, the Deepfake Detection Challenge videos are available to the public. Private-model adversaries have the resources to acquire a dataset once it is shared with multiple research teams, so the data cannot be controlled through legal-system measures.
- Public Training Data: Google's face-swap dataset, the Deepfake Detection Challenge data
- Restricted Training Data: None
- Federated Training: organizations holding private user videos
Federated training allows multiple organizations with private data to support training a model on that data without providing the data externally. These organizations should develop collaborations leveraging federated training for their private data, and otherwise provide their data publicly.
- Private Training Data:
While it is in the collective interest of the defenders to cooperate with each other in sharing training data, this is an instance of a collective action problem. Some organizations will not share data so they can gain a competitive advantage in deepfake detection. It is of critical importance to both the research community and society more broadly that we avoid a competitive mentality around deepfake detection resources. The only potential persistent advantage held by the defenders is the capacity to coordinate.
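As a toy illustration of the query-exposure concern in the "Detection model exposure" item above, the sketch below shows how an adversary with query access can keep perturbing a fake until it slips past a detector. The detector_score and perturb functions are hypothetical stand-ins, not any real system or API.

```python
# Toy illustration of why query access to a detector helps the adversary:
# the attacker keeps perturbing a fake until the detector's score drops below
# the decision threshold. `detector_score` and `perturb` are hypothetical
# stand-ins for a real detector and a real video-editing operation.
import random

def detector_score(video):
    """Stand-in detector: returns a fake-ness score in [0, 1]."""
    return max(0.0, 1.0 - 0.01 * video["edits"] + random.uniform(-0.05, 0.05))

def perturb(video):
    """Stand-in for a small obfuscation step (re-encode, blur, crop, ...)."""
    return {**video, "edits": video["edits"] + 1}

def evade(video, threshold=0.5, max_queries=1000):
    """Query the detector repeatedly, perturbing until the fake passes."""
    for queries in range(1, max_queries + 1):
        if detector_score(video) < threshold:
            return video, queries          # the fake now passes the detector
        video = perturb(video)
    return video, max_queries              # the detector held out (for now)

fake = {"edits": 0}
passed, n_queries = evade(fake)
print(f"Fake passed after {n_queries} queries and {passed['edits']} edits.")
```

Every query the defender answers is, in effect, free training signal for the adversary, which is why the platforms' "no queries" posture is the strongest one available.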
While different defenders will make different choices, the research community should unify around the following assumptions as a starting point,
- Private deepfake generator model: This solves a more general problem.
- Private source videos: Fingerprinting source material is a distinct approach.
- No query detector model: Otherwise the defender will never win.
- Low cost false detection: The economic costs of manual moderation are socially less important than false negatives.
- Mixed public data and federated training: We can develop methods on the public data, then improve the models through federation.
My reasons for these selections are entirely practical and centered on steering the research community towards detection models that will yield results.
Round 1: The Deepfake Detection Challenge
The current moment is the only one in which the defender has an advantage in producing useful detector models. Very few deepfake generators are available to attackers and the generators are not yet actively working to circumvent detectors.
The most common form of deepfake is the face swap, which is also the focus of the core dataset for the Deepfake Detection Challenge. Many face swaps have human-recognizable artifacts of modification. It is likely that these artifacts are trivially detectable through a concerted development effort using currently available techniques. The challenging part will be ensuring the model is robust to even trivial obfuscation (e.g., reducing the resolution of the video) not found in the training set. Deep models are notoriously brittle, so we will quickly move from the defender's advantage to the adversary's advantage in the second round.
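One way to probe that brittleness is to stress-test a detector against trivial obfuscations such as downscaling and heavy re-compression. The sketch below does this for a single frame; the detect function is a placeholder for a real model, and a robust detector's scores should not collapse under these transforms.

```python
# Sketch: stress-testing a frame-level detector against trivial obfuscations
# (downscaling, heavy JPEG re-compression) that may not appear in training data.
# `detect` is a hypothetical stub standing in for a real model's score.
import io
import numpy as np
from PIL import Image

def downscale(frame: np.ndarray, factor: int = 4) -> np.ndarray:
    """Shrink the frame and blow it back up, destroying fine detail."""
    img = Image.fromarray(frame)
    small = img.resize((img.width // factor, img.height // factor))
    return np.asarray(small.resize((img.width, img.height)))

def recompress(frame: np.ndarray, quality: int = 10) -> np.ndarray:
    """Round-trip the frame through low-quality JPEG compression."""
    buf = io.BytesIO()
    Image.fromarray(frame).save(buf, format="JPEG", quality=quality)
    return np.asarray(Image.open(io.BytesIO(buf.getvalue())))

def detect(frame: np.ndarray) -> float:
    """Hypothetical detector stub; replace with a real model's fake-ness score."""
    return float(frame.std() / 255.0)

frame = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)  # stand-in frame
for name, transform in [("original", lambda f: f),
                        ("downscaled", downscale),
                        ("recompressed", recompress)]:
    print(name, round(detect(transform(frame)), 3))
```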
Round 2: Advantage to the Adversary
One of the challenges in detecting deepfakes is that deepfake generating models learn how to make deepfakes by attempting to fool detectors. When the detector identifies a fake, the generator subtly changes the way it produces fakes to make them more difficult to detect. In the terminology of generative adversarial networks, the deepfake generator must fool a "discriminator model" that is trained iteratively as the generator improves its outputs. This means the generator is not done learning until the discriminator can no longer identify the fakes reliably: the generator (adversary) is ready when it defeats all known discriminators (defenders). The only hope of keeping detector models robust is to prevent the generator from having access to both the model architecture and the trained model produced from the data. With access to the model architecture, it is possible for the generator to place the untrained detector into its training loop and defeat it.
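To make the generator/discriminator relationship concrete, here is a toy GAN-style training loop in PyTorch on synthetic 2-D data. It is purely illustrative and bears no resemblance to a production deepfake model, but it shows why a detector built like the training-time discriminator starts at a disadvantage: the generator's only objective is to fool it.

```python
# Minimal GAN-style loop on toy 2-D data: the "forger" trains specifically to
# fool the current "detector". Purely illustrative, not a deepfake model.
import torch
import torch.nn as nn

torch.manual_seed(0)
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))   # forger
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # detector
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def real_batch(n=64):
    """Stand-in for 'real video features': samples near the point (2, 2)."""
    return torch.randn(n, 2) * 0.5 + torch.tensor([2.0, 2.0])

for step in range(2000):
    # 1) Train the detector to separate real samples from generated ones.
    fake = G(torch.randn(64, 8)).detach()
    real = real_batch()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Train the forger so the detector labels its output as "real".
    fake = G(torch.randn(64, 8))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# After training, the detector's scores on fakes drift toward its scores on reals.
print("D(real) mean:", torch.sigmoid(D(real_batch())).mean().item())
print("D(fake) mean:", torch.sigmoid(D(G(torch.randn(64, 8)))).mean().item())
```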
While this is already bad, the reality is that machine learning methods for handling adversarial examples are in their infancy. Despite billions in ongoing research and development effort, no one has adequately solved the fundamental problems of adversarial machine learning, in theory or in practice. Increasingly we see that attacking machine learning is easier than defending it, and even slight modifications to physical objects in the real world can completely fool machine learning algorithms that make safety-critical decisions in self-driving cars.
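The sketch below illustrates the adversarial-example phenomenon on a toy classifier: a fast-gradient-sign-style perturbation, bounded per feature, is often enough to flip the model's decision. The model here is a random, untrained network used only for illustration, not a deepfake detector.

```python
# Sketch of the adversarial-example problem: a small, gradient-guided
# perturbation of the input can flip a classifier's decision even though the
# input barely changes. Toy untrained model, for illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
x = torch.randn(1, 16)
label = model(x).argmax(dim=1)              # whatever the model currently predicts

# Fast gradient sign method: nudge every feature in the direction that
# increases the loss for the current prediction.
x_req = x.clone().requires_grad_(True)
loss = nn.CrossEntropyLoss()(model(x_req), label)
loss.backward()
step_direction = x_req.grad.sign()

for epsilon in (0.01, 0.05, 0.1, 0.5, 1.0):
    x_adv = x + epsilon * step_direction
    if model(x_adv).argmax(dim=1).item() != label.item():
        print(f"Prediction flipped with a per-feature change of only {epsilon}.")
        break
else:
    print("Prediction held at the tested perturbation sizes.")
```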
This has profound consequences for the choices we make as defenders,
- The model architectures cannot be public if we want to stop private-model adversaries. The detectors will be too brittle.
- We cannot trust the detectors as certifications of content either being true or faked. The data can be subtly manipulated to give either answer.
- The most desirable property for any model is robustness to the adversary. Methods of evaluating detectors must partition generating methods between the training and test set before the method can be assumed capable of solving a private-model deepfake.
- It may be impossible to adequately address private-model deepfake detection (time will tell), but there is still value in solving the public-model case. This turns the game into one where the adversary repeatedly publishes a model and the research community must find a way to characterize its outputs. This is the way the anti-virus community currently works for characterizing viruses in virus scans.
How do we defeat adversaries after they begin actively circumventing the detector? There are a few potential solutions for making the detector models more robust.
- Ensemble models: Typically in high-stakes machine learning competitions you will see competitors join forces after falling behind. The result can be that the 4th and 5th place teams become the 1st place team because they stitch their models together without making any changes to the models themselves. This is like running a relay race rather than running faster; neither competitor is making a better model. However, in the adversarial setting, running multiple detector models derived from different modeling techniques has the advantage of requiring the adversary to fool multiple uncorrelated detectors. This can substantially increase the adversary's iteration cost. Diversity of modeling approaches is highly valuable (see the ensemble sketch after this list).
- Defensive distillation: One of the ways detectors can be fooled is if their model is too complex. Increasing the "capacity" of the model (i.e., adding more parameters) seems to increase the attack surface. One approach to mitigating this problem is to distill more complex models into simpler ones. If this approach were combined with ensemble modeling, it might substantially increase model robustness. Perhaps a valuable standard practice is to distill all strong learners into many weak learners.
- Modeling people rather than artifacts: Most people are unlikely to be targeted by malicious deepfakes, but the people most likely to be deepfaked are also those whose deepfakes can damage society. It may be possible to model deepfake detection for a fixed set of people (e.g., politicians) more effectively than it is to model generalized deepfake detection. This cedes ground in the game, but forces the adversary to also generate subject-specific models.
- Detecting Obfuscation: Platforms could potentially block all content from viral re-sharing if it has characteristics consistent with deepfake obfuscation. This mitigates impact even when a final determination of the content's provenance cannot be made.
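As a sketch of the ensemble idea referenced in the first item above, the snippet below flags content when any member of a diverse detector pool fires, so an adversary must evade several uncorrelated models at once. The individual scoring functions are placeholders for models built with different techniques.

```python
# Sketch of ensembling diverse detectors: flag content if ANY detector in the
# pool scores it above its threshold, so the adversary must simultaneously
# evade several uncorrelated models. The scoring functions are placeholders.
from typing import Callable, List, Tuple

Detector = Tuple[str, Callable[[bytes], float], float]  # (name, score_fn, threshold)

def score_artifacts(video: bytes) -> float:      # e.g., a CNN over face crops
    return 0.2

def score_temporal(video: bytes) -> float:       # e.g., a model over frame sequences
    return 0.7

def score_physiological(video: bytes) -> float:  # e.g., blink/pulse consistency
    return 0.4

ENSEMBLE: List[Detector] = [
    ("artifact", score_artifacts, 0.6),
    ("temporal", score_temporal, 0.6),
    ("physiological", score_physiological, 0.6),
]

def flag_as_deepfake(video: bytes) -> bool:
    """Flag if any member of the ensemble fires; evasion requires fooling all."""
    return any(score(video) >= threshold for _, score, threshold in ENSEMBLE)

print(flag_as_deepfake(b"fake-video-bytes"))  # True: the temporal detector fires
```

The design choice matters: an "any detector fires" rule trades a higher false-positive load (acceptable for platforms with moderation pipelines) for a much harder evasion problem for the adversary.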
Round 3: Goodbye Modeling Solutions
In the fullness of time, deepfake generators will produce fictions that are bit-exact with potential captures of events in the real world. In short, the adversary will win and we are only buying time. At the completion of the Deepfake Detection Game, the only solution is to change the goal. Solutions to the social ills produced by deepfakes are still possible, including
- Content signing: Major publishers are already working to support provenance proofs for content re-shared online by providing for reverse lookup of their content. The practice of content registration should be extended to the individual level with modern cryptographic methods (a toy signing sketch follows this list).
- Content generation date registration: Often if we know when content was created, then we know quite a lot about its provenance. One promising solution is to simply tag all videos with cryptographically-robust dates (ones that cannot be faked backwards). Reliable dating would allow people, and hopefully also algorithms, to contextualize the content and make content-specific determinations of veracity.
- Reverse content lookup: Several services now exist for finding the origins of content. If we have an easy means of seeing how content spreads over the internet, we can begin to assess the age and provenance of the video.
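As a toy sketch of the content-signing idea in the list above, the snippet below signs a SHA-256 digest of a video's bytes with an Ed25519 key from the cryptography package; verification fails if even one byte changes. A real deployment would also need key distribution, trusted timestamps, and revocation, none of which is shown here.

```python
# Toy sketch of content signing: a publisher signs the hash of a video file at
# publication time, and anyone holding the public key can later verify that
# the bytes are unchanged. Key management and timestamping are omitted.
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

publisher_key = Ed25519PrivateKey.generate()
public_key = publisher_key.public_key()

def sign_content(video_bytes: bytes) -> bytes:
    """Sign the SHA-256 digest of the content with the publisher's key."""
    digest = hashlib.sha256(video_bytes).digest()
    return publisher_key.sign(digest)

def verify_content(video_bytes: bytes, signature: bytes) -> bool:
    """Return True only if the content matches what the publisher signed."""
    digest = hashlib.sha256(video_bytes).digest()
    try:
        public_key.verify(signature, digest)
        return True
    except InvalidSignature:
        return False

original = b"...original video bytes..."
sig = sign_content(original)
print(verify_content(original, sig))              # True: untouched
print(verify_content(original + b"tamper", sig))  # False: modified after signing
```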
Closing Thoughts
Deepfake detection is the bridge to a future where we have additional technological and societal means of separating truth and fiction. We need to work collaboratively across organizational, societal, and disciplinary lines to chart our collective course through this change in the way we arrive at truth. I am excited at the prospect of working with Steering Committee colleagues at news organizations, civil society organizations, and big tech companies on this problem area. Let's defend the public record together!
More Reading
I am putting together an accessible explanation of the dismal state of adversarial machine learning. As a starting point, I recommend watching the following video, in which a vision classifier is tricked into recognizing a 3D-printed turtle as a rifle because a specially designed texture was applied to its shell.
More readings and background:
- Dataset Poisoning: The adversary can potentially extend the life of their models by seeding the world with deepfake artifacts on content they don't substantially alter. This takes away the modeler's ability to work with the signatures of manipulation, and thus makes the task impossible. The only solution is to ensure that these false-deepfakes are never treated as negative training data for the model.
- "Will my Machine Learning be Attacked?" A barebones intro to thinking about threat modeling in ML.