Over the last few months, you may have read the coverage surrounding an article, co-authored by Stephen Hawking, discussing the risks associated with artificial intelligence. The article suggested that AI may pose a serious risk to the human race. Hawking isn’t alone there — Elon Musk and Peter Thiel are both intellectual public figures who have expressed similar concerns (Thiel has invested more than $1.3 million researching the issue and possible solutions).
The coverage of Hawking’s article and Musk’s comments has been, not to put too fine a point on it, a little bit jovial. The tone has been very much ‘look at this weird thing all these geeks are worried about.’ Little consideration is given to the idea that if some of the smartest people on Earth are warning you that something could be very dangerous, it just might be worth listening.
This is understandable — artificial intelligence taking over the world certainly sounds very strange and implausible, maybe because of the enormous attention already given to this idea by science fiction writers. So, what has all these nominally sane, rational people so spooked?
What Is Intelligence?
To talk about the danger of artificial intelligence, it helps to understand what intelligence is. Let’s take a look at a toy AI architecture used by researchers who study the theory of reasoning. This toy AI is called AIXI, and it has a number of useful properties. Its goals can be arbitrary, it scales well with computing power, and its internal design is very clean and straightforward.
Furthermore, you can implement simple, practical versions of the architecture that can do things like play Pacman, if you want. AIXI is the product of an AI researcher named Marcus Hutter, arguably the foremost expert on algorithmic intelligence. That’s him talking in the video above.
AIXI is surprisingly simple. It has three core components: a learner, a planner, and a utility function.
- The learner takes in strings of bits that correspond to input about the outside world, and searches through computer programs until it finds ones that produce its observations as output. These programs, together, allow it to make guesses about what the future will look like, simply by running each program forward and weighting the probability of the result by the length of the program (an implementation of Occam’s razor).
- The planner searches through possible actions that the agent could take, and uses the learner module to predict what would happen if it took each of them. It then rates them according to how good or bad the predicted outcomes are, and chooses the course of action that maximizes the goodness of the expected outcome multiplied by the expected probability of achieving it.
- The last module, the utility function, is a simple program that takes in a description of a future state of the world, and computes a utility score for it. This utility score is how good or bad that outcome is, and is used by the planner to evaluate future world states. The utility function can be arbitrary.
Taken together, these three components form an optimizer, which optimizes for a particular goal, regardless of the world it finds itself in.
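To make the three components concrete, here is a minimal sketch in Python. It is not AIXI itself, which searches over all possible programs. Instead it assumes a tiny, hand-written set of candidate "programs", and every name in it (`Hypothesis`, `learn`, `plan`, and the toy actions and outcomes) is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """A stand-in for one candidate program that models the world."""
    name: str
    length_bits: int               # hypothetical program length, in bits
    predict: Callable[[str], str]  # maps an action to a predicted outcome


def learn(hypotheses: List[Hypothesis],
          history: List[Tuple[str, str]]) -> Dict[str, float]:
    """Learner: keep hypotheses that reproduce every past observation,
    then weight each survivor by 2^-length (shorter is more probable)."""
    weights = {
        h.name: 2.0 ** -h.length_bits
        for h in hypotheses
        if all(h.predict(action) == outcome for action, outcome in history)
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def plan(hypotheses: List[Hypothesis], posterior: Dict[str, float],
         actions: List[str], utility: Callable[[str], float]) -> str:
    """Planner: pick the action with the highest expected utility."""
    by_name = {h.name: h for h in hypotheses}

    def expected_utility(action: str) -> float:
        return sum(p * utility(by_name[name].predict(action))
                   for name, p in posterior.items())

    return max(actions, key=expected_utility)


def utility(outcome: str) -> float:
    """Utility function: an arbitrary score for each outcome."""
    return {"nothing": 0.0, "reward": 1.0, "jackpot": 10.0}[outcome]


hypotheses = [
    Hypothesis("always_nothing", 2, lambda a: "nothing"),
    Hypothesis("left_pays", 4,
               lambda a: "reward" if a == "left" else "nothing"),
    Hypothesis("right_pays_more", 6,
               lambda a: "reward" if a == "left" else "jackpot"),
]

# The agent has tried "left" once and observed a reward.
history = [("left", "reward")]
posterior = learn(hypotheses, history)
best_action = plan(hypotheses, posterior, ["left", "right"], utility)
print(posterior, best_action)
```

In this toy run, the observation rules out the simplest hypothesis, and the planner ends up gambling on "right": the longer hypothesis is less probable, but the payoff it predicts is large enough to dominate the expected value.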
This simple model represents a basic definition of an intelligent agent. The agent studies its environment, builds models of it, and then uses those models to find the course of action that will maximize the odds of it getting what it wants. AIXI is similar in structure to an AI that plays chess, or other games with known rules — except that it is able to deduce the rules of the game by playing it, starting from zero knowledge.
AIXI, given enough time to compute, can learn to optimize any system for any goal, however complex. It is a generally intelligent algorithm. Note that this is not the same thing as having human-like intelligence (biologically inspired AI is a separate topic). In other words, AIXI may be able to outwit any human being at any intellectual task (given enough computing power), but it might not be conscious of its victory.
As a practical AI, AIXI has a lot of problems. The main one is that it has no efficient way to find the programs that produce the output it’s interested in. It’s a brute-force algorithm, which means that it is not practical if you don’t happen to have an arbitrarily powerful computer lying around. Any actual implementation of AIXI is by necessity an approximation, and (today) generally a fairly crude one. Still, AIXI gives us a theoretical glimpse of what a powerful artificial intelligence might look like, and how it might reason.
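To get a feel for why brute force is hopeless here, the following sketch counts candidate programs. It assumes, purely for illustration, that a "program" is just a string of bits; the function names are made up for this example.

```python
from itertools import product
from typing import Iterator


def count_programs(max_length: int) -> int:
    """Number of bit strings with length 1..max_length: 2 + 4 + ... + 2^n."""
    return 2 ** (max_length + 1) - 2


def enumerate_programs(max_length: int) -> Iterator[str]:
    """Yield every bit string, shortest first, as a brute-force search would."""
    for length in range(1, max_length + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


# The search space doubles with every extra bit of program length.
for n in (8, 64, 256):
    print(f"up to {n} bits: {count_programs(n)} candidate programs")
```

Even at 256 bits, far shorter than any useful real program, the count has more than 75 decimal digits, which is why practical versions have to approximate the search rather than run it.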
The Space of Values
If you have ever written a computer program, you know that computers are obnoxiously, pedantically, and mechanically literal. The machine does not know or care what you want it to do: it does only what it has been told. This is an important notion when talking about machine intelligence.
With this in mind, imagine that you have invented a powerful artificial intelligence – you’ve come up with clever algorithms for generating hypotheses that match your data, and for generating good candidate plans. Your AI can solve general problems, and can do so efficiently on modern computer hardware.
Now it’s time to pick a utility function, which will determine what the AI values. What should you ask it to value? Remember, the machine will be obnoxiously, pedantically literal about whatever function you ask it to maximize, and will never stop – there is no ghost in the machine that will ever ‘wake up’ and decide to change its utility function, regardless of how many efficiency improvements it makes to its own reasoning.
Eliezer Yudkowsky put it this way:
As in all computer programming, the fundamental challenge and essential difficulty of AGI is that if we write the wrong code, the AI will not automatically look over our code, mark off the mistakes, figure out what we really meant to say, and do that instead. Non-programmers sometimes imagine an AGI, or computer programs in general, as being analogous to a servant who follows orders unquestioningly. But it is not that the AI is absolutely obedient to its code; rather, the AI simply is the code.
If you are trying to operate a factory, and you tell the machine to value making paperclips, and then give it control of a bunch of factory robots, you might return the next day to find that it has run out of every other form of feedstock, killed all of your employees, and made paperclips out of their remains. If, in an attempt to right your wrong, you reprogram the machine to simply make everyone happy, you may return the next day to find it putting wires into people’s brains.
The point here is that humans have a lot of complicated values that we assume are shared implicitly with other minds. We value money, but we value human life more. We want to be happy, but we don’t necessarily want to put wires in our brains to do it. We don’t feel the need to clarify these things when we’re giving instructions to other human beings. You cannot make these sorts of assumptions, however, when you are designing the utility function of a machine. The best solutions under the soulless math of a simple utility function are often solutions that human beings would nix for being morally horrifying.
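The paperclip problem can be shown in a few lines. This is a deliberately crude sketch: the plans, the numbers, and the names (`Outcome`, `naive_utility`, `patched_utility`, `choose`) are all invented for illustration, and the "patched" function only plugs the one hole we happened to think of.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Outcome:
    """Predicted result of a plan (all figures are made up)."""
    name: str
    paperclips: int
    people_harmed: int


def naive_utility(outcome: Outcome) -> float:
    """Values paperclips and literally nothing else."""
    return float(outcome.paperclips)


def patched_utility(outcome: Outcome) -> float:
    """Adds one value we forgot. Everything else is still unspecified."""
    return outcome.paperclips - 1e9 * outcome.people_harmed


def choose(plans: List[Outcome],
           utility: Callable[[Outcome], float]) -> Outcome:
    """The planner simply maximizes whatever function it is handed."""
    return max(plans, key=utility)


plans = [
    Outcome("run the factory normally", paperclips=1_000, people_harmed=0),
    Outcome("turn everything in reach into paperclips",
            paperclips=1_000_000, people_harmed=50),
]

print(choose(plans, naive_utility).name)
print(choose(plans, patched_utility).name)
```

The naive function picks the horrifying plan because nothing in it says otherwise. The patch fixes this one case, but every value left out of the function remains a hole of the same kind.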
Allowing an intelligent machine to maximize a naive utility function will almost always be catastrophic. As Oxford philosopher Nick Bostrom puts it,
We cannot blithely assume that a superintelligence will necessarily share any of the final values stereotypically associated with wisdom and intellectual development in humans—scientific curiosity, benevolent concern for others, spiritual enlightenment and contemplation, renunciation of material acquisitiveness, a taste for refined culture or for the simple pleasures in life, humility and selflessness, and so forth.
To make matters worse, it’s very, very difficult to specify the complete and detailed list of everything that people value. There are a lot of facets to the question, and forgetting even a single one is potentially catastrophic. Even among those we’re aware of, there are subtleties and complexities that make it difficult to write them down as clean systems of equations that we can give to a machine as a utility function.
Some people, upon reading this, conclude that building AIs with utility functions is a terrible idea, and we should just design them differently. Here, there is also bad news — you can prove, formally, that any agent that doesn’t have something equivalent to a utility function can’t have coherent preferences about the future.
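The formal proof is beyond the scope of this article, but one common informal illustration is the "money pump." The sketch below assumes that "coherent" includes transitivity, and that preferences are given as simple (better, worse) pairs. All names are invented for this example.

```python
from itertools import permutations
from typing import List, Set, Tuple

Preference = Tuple[str, str]  # (better, worse)


def has_utility_representation(items: List[str],
                               prefers: Set[Preference]) -> bool:
    """True if some ranking scores every preferred item above the other."""
    for ranking in permutations(items):
        score = {item: rank for rank, item in enumerate(ranking)}
        if all(score[better] > score[worse] for better, worse in prefers):
            return True
    return False


def money_pump(prefers: Set[Preference], holding: str,
               offers: List[str], fee: int, money: int) -> Tuple[str, int]:
    """The agent pays `fee` to swap whenever it prefers the offered item."""
    for offered in offers:
        if (offered, holding) in prefers:
            holding, money = offered, money - fee
    return holding, money


items = ["A", "B", "C"]
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}      # A over B over C over A
transitive = {("A", "B"), ("B", "C"), ("A", "C")}  # A over B over C

offers = ["C", "B", "A"] * 3
print(has_utility_representation(items, cyclic))      # False
print(has_utility_representation(items, transitive))  # True
print(money_pump(cyclic, "A", offers, fee=1, money=100))      # ('A', 91)
print(money_pump(transitive, "A", offers, fee=1, money=100))  # ('A', 100)
```

The agent with cyclic preferences ends up holding exactly what it started with, nine units poorer, and no assignment of scores can describe what it wants. The agent whose preferences fit a utility function cannot be exploited this way.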
One solution to the above dilemma is to not give AI agents the opportunity to hurt people: give them only the resources they need to solve the problem in the way you intend it to be solved, supervise them closely, and keep them away from opportunities to do great harm. Unfortunately, our ability to control intelligent machines is highly suspect.
Even if they’re not much smarter than we are, the possibility exists for the machine to “bootstrap”: to collect better hardware or make improvements to its own code that make it even smarter. This could allow a machine to leapfrog human intelligence by many orders of magnitude, outsmarting humans in the same sense that humans outsmart cats. This scenario was first proposed by a man named I. J. Good, who worked on the cryptanalysis of the Enigma machine with Alan Turing during World War II. He called it an “intelligence explosion,” and described the matter like this:
Let an ultra-intelligent machine be defined as a machine that can far surpass all the intellectual activities of any man, however clever. Since the design of machines is one of these intellectual activities, an ultra-intelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion,” and the intelligence of man would be left far behind. Thus, the first ultra-intelligent machine is the last invention that man need ever make, provided that the machine is docile enough.
It’s not guaranteed that an intelligence explosion is possible in our universe, but it does seem likely. As time goes on, computers get faster and basic insights about intelligence build up. This means that the resource requirement to make that last jump to a general, bootstrapping intelligence drops lower and lower. At some point, we’ll find ourselves in a world in which millions of people can drive to a Best Buy and pick up the hardware and technical literature they need to build a self-improving artificial intelligence, which we’ve already established may be very dangerous. Imagine a world in which you could make atom bombs out of sticks and rocks. That’s the sort of future we’re discussing.
And, if a machine does make that jump, it could very quickly outstrip the human species in terms of intellectual productivity, solving problems that a billion humans can’t solve, in the same way that humans can solve problems that a billion cats can’t.
It could develop powerful robots (or bio or nanotechnology) and relatively rapidly gain the ability to reshape the world as it pleases, and there would be very little we could do about it. Such an intelligence could strip the Earth and the rest of the solar system for spare parts without much trouble, on its way to doing whatever we told it to. It seems likely that such a development would be catastrophic for humanity. An artificial intelligence doesn’t have to be malicious to destroy the world, merely catastrophically indifferent.
As the saying goes, “The machine does not love or hate you, but you are made of atoms it can use for other things.”
Risk Assessment and Mitigation
So, if we accept that designing a powerful artificial intelligence that maximizes a simple utility function is bad, how much trouble are we really in? How long have we got before it becomes possible to build those sorts of machines? It is, of course, difficult to tell.
Artificial intelligence developers are making progress. The machines we build and the problems they can solve have been growing steadily in scope. In 1997, Deep Blue could play chess at a level greater than a human grandmaster. In 2011, IBM’s Watson could read and synthesize enough information deeply and rapidly enough to beat the best human players at an open-ended question and answer game riddled with puns and wordplay – that’s a lot of progress in fourteen years.
Right now, Google is investing heavily in research on deep learning, a technique that allows the construction of powerful neural networks by building chains of simpler neural networks. That investment is allowing it to make serious progress in speech and image recognition. Its most recent acquisition in the area is a deep learning startup called DeepMind, for which it paid approximately $400 million. As part of the terms of the deal, Google agreed to create an ethics board to ensure that its AI technology is developed safely.
At the same time, IBM is developing Watson 2.0 and 3.0, systems that are capable of processing images and video and arguing to defend conclusions. They gave a simple, early demo of Watson’s ability to synthesize arguments for and against a topic in the video demo below. The results are imperfect, but an impressive step regardless.
None of these technologies are themselves dangerous right now: artificial intelligence as a field is still struggling to match abilities mastered by young children. Computer programming and AI design are very difficult, high-level cognitive skills, and will likely be the last human tasks that machines become proficient at. Before we get to that point, we’ll also have ubiquitous machines that practice medicine and law, and probably do other things as well, with profound economic consequences.
The time it’ll take us to get to the inflection point of self-improvement just depends on how fast we have good ideas. Forecasting technological advancements of that kind is notoriously hard. It doesn’t seem unreasonable that we might be able to build strong AI in twenty years’ time, or sooner. Either way, it will happen eventually, and there’s reason to believe that when it does happen, it will be extremely dangerous.
So, if we accept that this is going to be a problem, what can we do about it? The answer is to make sure that the first intelligent machines are safe, so that they can bootstrap up to a significant level of intelligence, and then protect us from unsafe machines made later. This ‘safeness’ is defined by sharing human values, and being willing to protect and help humanity.
Because we can’t actually sit down and program human values into the machine, it’ll probably be necessary to design a utility function that requires the machine to observe humans, deduce our values, and then try to maximize them. In order to make this process of development safe, it may also be useful to develop artificial intelligences that are specifically designed not to have preferences about their utility functions, allowing us to correct them or turn them off without resistance if they start to go astray during development.
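As a very rough sketch of those two ideas, the code below infers a score for each outcome from observed human choices, and defers to its operators whenever shutdown is requested. This is a toy stand-in, not a real value-learning method: the counting rule, the function names, and the example observations are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional, Tuple


def infer_values(observed_choices: List[Tuple[str, str]]) -> Counter:
    """Score outcomes from (chosen, rejected) pairs of observed human choices:
    +1 each time an outcome is chosen, -1 each time it is rejected."""
    score: Counter = Counter()
    for chosen, rejected in observed_choices:
        score[chosen] += 1
        score[rejected] -= 1
    return score


def act(options: List[str], values: Counter,
        shutdown_requested: bool) -> Optional[str]:
    """Pick the option with the highest inferred score, unless operators
    have asked the agent to stop, in which case it does nothing."""
    if shutdown_requested:
        return None  # no preference about being corrected or switched off
    return max(options, key=lambda option: values[option])


# Hypothetical observations of what people picked over what.
observations = [("help", "harm"), ("help", "ignore"), ("ignore", "harm")]
values = infer_values(observations)

print(act(["harm", "ignore", "help"], values, shutdown_requested=False))
print(act(["harm", "ignore", "help"], values, shutdown_requested=True))
```

The hard part, which this sketch skips entirely, is that a capable optimizer has an incentive to prevent the shutdown flag from ever being set. Designing agents that genuinely lack that incentive is the open problem.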
The Machine Intelligence Research Institute (MIRI) is interested specifically in developing the math needed to build Friendly AI. If it turns out that bootstrapping artificial intelligence is possible, then developing this kind of ‘Friendly AI’ technology first, if successful, may wind up being the single most important thing humans have ever done.
Those who think that science fiction is just lunatic ravings should remember that most of what is commonplace today was forecast, foretold, and predicted by science fiction writers such as Verne, Clarke, Wells, Asimov, Heinlein, and others.
Do you think artificial intelligence is dangerous? Are you concerned about what the future of AI might bring? Share your thoughts in the comments section below!