Dr. Jekyll and Mr. Paperclip

In the "Strange Case of Dr. Jekyll and Mr. Hyde," a London-based doctor in 1886 indulges in his worst instincts while transformed into the murderous Mr. Hyde. The classic work of gothic horror has much to teach us when deciding whether advances in artificial intelligence (AI) are likely to be good (Dr. Jekyll) or bad (Mr. Hyde). Both the literary and technological references present the same answer: Jekyll and Hyde are one and the same.

All intelligent systems are inherently Jekyll and Hyde. Why? There is no separating "good" power from "bad" power. There is only "power." For computers, redirecting that power requires far less than the serum Dr. Jekyll concocted to transform his moral character. Let's look at one of many examples of how easily systems of great power tilt between good and evil.

A recently transformed Mr. Hyde with his hand placed upon a book -- ready to pursue his worst instincts. These images are all from the 1920 silent film adaptation of the book.

The Transformative Power of Cosmic Rays

Intelligent systems are produced by maximizing or minimizing a numeric objective. Most people would agree that maximizing "human wellbeing" is a better objective than minimizing it, but inside a computer these two objectives are close neighbors. Let's count in the language of computers:

000: 0
001: 1
010: 2
011: 3
100: -4
101: -3
110: -2
111: -1
000: 0

Every number with a 1 in the leftmost bit will push the objective toward minimization, 000 will never influence the objective, and all other numbers will push toward maximization. In this three-bit two's complement encoding, every negative number is a single bit flip away from a non-negative one. For a computer system to be "safe," it must never accidentally change that leftmost bit, because a single flip immediately produces the "evil form." Guaranteeing a bit won't change in electronics is surprisingly difficult. The universe itself constantly hits our electronics with bit-flipping cosmic rays. Particles born of hydrogen fusing in the sun could transform our computational Jekyll into a murderous Hyde. This has already been observed in the most heavily engineered systems of today:
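
To make this fragility concrete, here is a minimal sketch (my own illustration, not code from any production system) of how a single flipped bit in a two's complement value turns a small positive reward into a large negative one:

```python
def flip_bit(value: int, bit: int, width: int = 8) -> int:
    """Flip one bit of a width-bit two's complement integer and reinterpret it."""
    raw = value & ((1 << width) - 1)   # encode the value as raw unsigned bits
    raw ^= (1 << bit)                  # the "cosmic ray" flips a single bit
    if raw >= (1 << (width - 1)):      # reinterpret the bits as a signed number
        raw -= (1 << width)
    return raw

reward = 3                              # an objective the system is maximizing
corrupted = flip_bit(reward, bit=7)     # a strike on the leftmost (sign) bit
print(reward, "->", corrupted)          # 3 -> -125: Jekyll becomes Hyde
```

The same fragility applies at any width; wider integers simply have a different leftmost bit to protect.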

On 7 October 2008 a Qantas flight en route from Singapore to Australia, travelling at 11,300 m, suddenly pitched down, with 12 passengers seriously injured as a result. Investigators determined that the problem was due to a “single-event upset” (SEU) causing incorrect data to reach the electronic flight instrument system. The culprit, again, was most likely cosmic radiation. An SEU bit flip was also held responsible for errors in an electronic voting machine in Belgium in 2003 that added 4096 extra votes to one candidate. (source)

Engineering a "conscience," an "ethical subroutine," or some other deus ex machina is a pipe dream. These sorts of weird edge cases are everywhere in AI, so much so that imagining we can produce a god-machine that perpetually serves human interests is the pinnacle of hubris.

Any system with immense power will eventually produce great harms through misspecification, the evils of humans, or even cosmic rays flipping bits. Immense power is the absence of safety.

A title card from the silent film remarking on the tenuousness of good versus evil, a message that applies equally to intelligent systems.

What Would James Madison Do?

Can we make artificial intelligence systems without a capacity to harm? Can we make {airplanes, medications, nuclear power} without {crashes, side effects, radiation}? No. All of these technologies make the world better, but each presents a distinctive capacity to harm. Just as Marie Curie famously died from the radiation she handled while pioneering radiology, society is now stumbling through how to make intelligent systems safe. For a technology rooted in making decisions, safety involves limiting power.

The common thought experiment for explaining AI existential risk is the paperclip maximizer. In the hypothetical, a computer tasked with producing as many paperclips as possible eventually converts the entire universe into paperclips. Having spent years optimizing reinforcement learning agents, I can say this ridiculous thought experiment captures a real concern: we are fantastically incapable of engineering systems that behave according to their designers' expectations. Far simpler systems than the god-machine already produce a wide array of AI incidents in the world.
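
A toy sketch of that misspecification (the action names and numbers here are entirely hypothetical) shows why the concern is structural: an optimizer sees only the quantity it was told to maximize, so any harm left out of the objective never enters the decision.

```python
# Hypothetical actions and outcomes; only "paperclips" appears in the objective.
actions = {
    "run_factory_normally":    {"paperclips": 10,        "collateral_damage": 0},
    "strip_mine_neighborhood": {"paperclips": 1_000_000, "collateral_damage": 1_000_000},
}

# The optimizer ranks actions purely by the number it was told to maximize;
# the collateral damage column is invisible to it.
best_action = max(actions, key=lambda a: actions[a]["paperclips"])
print(best_action)  # strip_mine_neighborhood
```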

Knowing the potential benefits we can realize from intelligent systems, I am a proponent of continuing the development of artificial intelligence despite increasing risks. However, advancing the technology requires building a world where no human, country, or computer system has the capacity to pursue destructive objectives with computers. We can build a more robust society where no computers or individual humans have phenomenal cosmic power to do evil. As James Madison wrote when describing the rationale behind separating the powers of government between multiple entities:

We see it particularly displayed in all the subordinate distributions of power, where the constant aim is to divide and arrange the several offices in such a manner as that each may be a check on the other that the private interest of every individual may be a sentinel over the public rights. These inventions of prudence cannot be less requisite in the distribution of the supreme powers of the State. (source)

This division of power must be sought in the production of increasingly capable intelligent systems. Where humans divide power socially, the power of computer systems is partitioned by ensuring they are not networked together and controllable by one another. The trend toward constant connections and system updates is not consistent with reducing capacities for power. As state actors look to increasingly control their populations through technology (e.g., by surveilling the entire internet), we are building a far more frightening world of concentrated power.

A division of power is not what my colleagues in artificial intelligence are calling for when they advocate for controlling AI. A highly publicized joint statement reads, "Powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable" (from "Pause Giant AI Experiments: An Open Letter"). The implication is that we just need time to figure out how to safely give machines power. This is a mistake. Time will not cure the "power problem." Immense power will never be managed. That is not how the universe works.

Provisioning power to machine systems is a marker of bad safety engineering. Machines do not need power. Machines should empower people to live and act in the world where no individual machine -- or human -- has the power to bring about unbounded harms.

We cannot engineer away the possibility of an artificial Hyde, but we can make a society safe for its duality.

Dr. Jekyll's failed attempt to produce a serum that might control his transformations. Ultimately, he failed to control his darker urges.

A warning from the silent film.

Addenda

  • Feasibility argument against the "pause." The de rigueur response to rapid advances in AI is to pause AI development by limiting the size of AI models. Limiting model size will do little to slow the advance of AI capacities, and it will likely leave us in a worse position to build a safer society. I do not endorse policymakers pausing AI development because (1) there are enough ways to advance AI that scale will not be a limiting factor over the next 3 years -- we can route around such hastily constructed limits, (2) larger-scale models become more accessible to more people with every month that passes -- neural network compression techniques will continue to advance and support greater capacities at lesser scales, and finally (3) pausing advances in scale is an all-too-convenient way to avoid making the difficult choices that are necessary regardless of any potential pause. We more desperately need a raft of technical and policy advancements.
  • ECC memory. Yes, there is error-correcting (ECC) memory that is more robust to cosmic rays. However, I could come up with hundreds more examples like this one, and the existence of error-correcting memory does not mean it is widely deployed.