How should we respond to claims that forthcoming new versions of AI pose unacceptable risks of human catastrophe?
These slides were presented at a London Futurists webinar on 16th March 2024, by David Wood, chair of the organisation. They are an updated version of a presentation he originally shared at the recent BGI unconference in Panama City, https://bgi24.ai/. That talk was described by a number of participants as "the best of the entire event", but others said that it was "a mistake that so much time was given to this subject".
Ahead of this webinar, the talk given in Panama was significantly revised in light of feedback received at the BGI unconference.
The talk aims to improve understanding of which risks are the most credible and serious (as opposed to fanciful or unfounded). It also reviews a variety of options for how to respond to these risks. This includes varieties of so-called "accelerationism" and "singularity activism".
For more about the event, see https://www.meetup.com/london-futurists/events/299727151/.
For a recording of the event, see https://www.youtube.com/watch?v=sYK4eYTZmXU
6. @dw2 Page 6
AI image recognition: hits and misses
https://gradientscience.org/intro_adversarial/
[Image: a photo correctly classified as "pig", plus 0.005 × a carefully chosen perturbation, is classified as "airliner"]
AI sometimes makes mistakes that are “alien” – very different to human mistakes
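The "+0.005 ×" perturbation above is the hallmark of an adversarial example: a change far too small for a human to notice, yet chosen to push the classifier across a decision boundary. As a purely illustrative sketch (not the exact construction behind the slide's image), the fast gradient sign method shows how such a perturbation can be computed; the pretrained model, the input tensors, and epsilon=0.005 below are assumptions for the example, not details from the talk.

```python
# Minimal sketch of an adversarial perturbation (fast gradient sign method).
# Illustrative only: the model, the input image, and epsilon are stand-ins,
# not the slide's exact example.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

def adversarial_example(image, true_label, epsilon=0.005):
    """Return image + epsilon * sign(gradient of the loss w.r.t. the image)."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # The perturbation is imperceptible to humans, but aligned with the loss gradient
    return (image + epsilon * image.grad.sign()).detach()

# Hypothetical usage: `x` is a preprocessed 1x3x224x224 image tensor of a pig,
# `y` a length-1 tensor with its correct class index. The perturbed image often
# receives a different label, such as "airliner", despite looking identical to a person.
# x_adv = adversarial_example(x, y)
# print(model(x_adv).argmax(dim=1))
```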
8. @dw2 Page 8
“Google accused of directing motorist to drive off collapsed bridge”
https://www.bbc.co.uk/news/world-us-canada-66873982, 22nd Sept 2023, Philip Paxson, Hickory, North Carolina
Human vandals had recently damaged some warning signs
Bad human behaviour + Bad AI implementation -> Catastrophe
Misguided humans + Misguided AI -> Catastrophe
9. @dw2 Page 9
AI technology that deeply exploits human psychology?
AI technology designed to make money for social media platforms by keeping users engaged
Molly Russell
10. @dw2 Page 10
AI technology that deeply exploits human psychology?
Rohingya refugees in a refugee camp in Bangladesh, 2017
"The military and the local Rakhine population killed at least 25,000 Rohingya people and perpetrated gang rapes and other forms of sexual violence against 18,000 Rohingya women and girls. They estimated that 116,000 Rohingya were beaten, and 36,000 were thrown into fires"
en.wikipedia.org/wiki/Rohingya_genocide
11. @dw2 Page 11
Lieutenant-Colonel Stanislav Petrov
https://en.wikipedia.org/wiki/Stanislav_Petrov
Yuri Andropov, Soviet leader, Nov 1982 to Feb 1984
KAL 007, 1 Sept 1983: shot down by a Soviet missile; all 269 on board were killed, including a member of the US House of Representatives. Ronald Reagan called it "The Korean Air Massacre"
26 Sept 1983: an alarm system indicated incoming US missile(s). Protocol dictated that Petrov urgently inform his superiors. Petrov declined to follow orders
Petrov was later called "The Man Who Saved The World", and received the World Citizen Award and the Future of Life Award
13. @dw2 Page 13
Lion Air Flight 610
Domestic flight inside Indonesia
29 October 2018
189 people on board
Ethiopian Airlines Flight 302
Addis Ababa, Ethiopia to Nairobi, Kenya
10 March 2019
157 people on board
Both flights used Boeing 737 Max aircraft
A (very safe) Boeing 737 design, pushed to the “max”
Airplane could become unstable in some circumstances
Hence introduced MCAS: Maneuvering Characteristics Augmentation System (AI)
Automatically push down the airplane nose in some emergency(?) situations
Pilots could in theory override this, but needed specialist training (skipped)
Jan 2021: Boeing paid fines of over $2.5 billion after being charged with fraud
Responding to competitive pressure from Airbus
Victims of deteriorating corporate culture, and of deteriorating societal culture
14. @dw2 Page 14
Bhopal, India, 2 December 1984
“Accidental” release of 30 tons of a highly toxic chemical gas (methyl isocyanate)
2,259 deaths in short term, up to 14,000 more later, numerous birth defects
Safety systems in disrepair; inadequate training of staff in safety processes
Previous leaks not fully investigated; internal audit warning report not followed up
Company management had little long-term interest in the plant
“The World’s Worst Industrial Disaster”
www.theatlantic.com/photo/2014/12/bhopal-the-worlds-worst-industrial-disaster-30-years-later/100864/
Management blamed sabotage from disgruntled employees
Canary signal
Victims of deteriorating corporate culture
15. @dw2 Page 15
“The World’s Worst Ransomware Disaster?”
WannaCry – May 2017, devastated NHS hospitals throughout the UK
Seemingly earned the North Koreans very little actual money
Ransomware incompletely understood, out of control…
“Bad guys” re-used hacking tools developed by some “good guys”
Disgruntled nation state
Incompetent managers
16. @dw2 Page 16
“The religion for the elite” Disgruntled cult
Shoko Asahara (1955-2018), founder and leader of Aum Shinrikyo
20 March 1995: sarin gas was released on five different Tokyo subway trains
13 people killed, 50 others severely injured (some of whom later died)
https://en.wikipedia.org/wiki/Tokyo_subway_sarin_attack
The group also assembled:
• Traditional explosives
• Chemical weapons
• A Russian military helicopter
• Hydrogen cyanide poison
• Samples of Ebola
• Samples of anthrax
Motivation + Technology + Knowledge + Vulnerability = Catastrophe
(AI++? Each of these factors could be amplified by advanced AI)
17. @dw2 Page 17
Three objections to this narrative
1. The problem is with humans, not AI (?)
• Real-world disasters typically have multiple overlapping causes
• It's the combination of human failures and tech failures that could cause the biggest catastrophes
2. AI is the solution, not the problem (?)
• Yes, but: proceed with care!
• Sometimes "solutions" make things worse
• Novel tech failures (e.g. AI) can trigger unexpected complications
• AI-in-a-rush is not the solution
3. These all just show minor catastrophes (?)
• Don't just consider "Normal distributions"; consider situations involving fat tails (see the sketch after this list)
• Sudden mass extinctions / tipping points
• Particularly deadly pandemics
• The first nuclear explosions…
• Consider potential exponential escalation
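To make the fat-tails point concrete: under a normal distribution, a six-sigma event is so unlikely it is effectively ruled out, while under a fat-tailed distribution the same deviation remains entirely plausible, so intuitions trained on "normal" experience can badly underestimate catastrophe. A minimal sketch, assuming SciPy is available; the Student-t distribution with 3 degrees of freedom and the six-sigma threshold are illustrative choices, not taken from the talk.

```python
# Compare tail probabilities: a "six-sigma" event under a normal distribution
# versus under a fat-tailed Student-t distribution. Illustrative numbers only.
from scipy import stats

threshold = 6.0  # a six-standard-deviation event

p_normal = stats.norm.sf(threshold)   # survival function: P(X > 6)
p_fat = stats.t.sf(threshold, df=3)   # Student-t with 3 degrees of freedom

print(f"P(X > 6), thin-tailed Normal:    {p_normal:.1e}")  # ~1e-9: "never happens"
print(f"P(X > 6), fat-tailed Student-t3: {p_fat:.1e}")     # ~5e-3: happens routinely
```

The absolute numbers matter less than the ratio: the fat-tailed model assigns the extreme outcome millions of times more probability, which is exactly the gap that makes planning only for "normal" failures dangerous.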
18. @dw2 Page 18
Leslie Groves: Are you saying that there’s a chance that when we push that button... we destroy the world?
J. Robert Oppenheimer: The chances are near zero...
Groves: Near zero?
Oppenheimer: What do you want from theory alone?
Groves: Zero would be nice!
“Oppenheimer’s ‘NEAR ZERO’ probability, Explained!” - https://www.youtube.com/watch?v=wx1DkmIdKLI
19. @dw2 Page 19
Calculating consequences can be hard
• The Castle Bravo test (the first US test of a dry-fuel hydrogen bomb), 1st March 1954, Bikini Atoll
‒ Explosive yield was expected to be from 4 to 6 Megatons
‒ It was 15 Megatons, two and a half times the expected maximum
‒ A "physics error" by the designers at Los Alamos National Lab
‒ They wrongly considered the lithium-7 isotope to be inert in the bomb
‒ The crew of a nearby Japanese fishing boat became ill after direct contact with the fallout; one of the crew died
http://en.wikipedia.org/wiki/Castle_Bravo
20. AI catastrophic risk
Cascading catastrophe:
Flawed human reasoning + Flawed human emotions + Flawed social systems + Fragile infrastructure + Existentially powerful AI
• Nukes, biofailure, geoengineering failure, hatred…
• The AI factor is changing much more quickly than the others
• The other factors: Improve!?
• We can slow down the most dangerous aspects
• We can learn to harness the best outcomes
• Singularity Activism
24. @dw2 Page 24
Upton Sinclair, 1935:
"It is difficult to get a man to understand something, when his salary depends on his not understanding it"
https://libraries.indiana.edu/lilly-library/upton-sinclair
Salary? Ideology? Worldview? Identity? Tribal status?
We are a rationalizing species at least as much as a rational one
25. @dw2 Page 25
Primary beliefs:
• World government would be awful
• Open source must be preserved
• We’ll all die of aging soon, without AGI
Assumptions:
• AI regulations would imply world government
• AI regulations would kill open source
• AGI is the best solution to various x-risks
Conclusion(?!):
• There must be no real risk of AI catastrophe
26. @dw2 Page 26
Consensual safe AI is possible
Consensual safe AI is vital
27. @dw2 Page 27
Consensual safe AI is possible
Consensual safe AI is vital
Catastrophic AI risks are by no means science fiction
Catastrophic AI risks arise straightforwardly from
assumptions that everyone shares
There are solutions that are technically possible
There are solutions that are possible politically
and geo-politically, and that respect human values
28. @dw2 Page 28
The narrow corridor
Social wellbeing faces threats from powerful groups:
• Big Armaments
• Big Tobacco
• Big Oil
• Big Finance
• Big Crime
• Big Theology
• Big Media
• Big Money
The state needs power to control these potentially cancerous powers, which creates another one: Big State
Society needs power to control the state!
• Independent media
• Independent judiciary
• Independent academia
• Independent opposition parties
The separation of powers!
• Checks and balances
30. @dw2 Page 30
Verifiably safe AI?!
AI is an inscrutable black box!?
• Distinguish the inner workings (inscrutable) from the individual recommendations (testable)
• Compare finding a solution to a puzzle with verifying that solution (see the sketch below)
• Test critical recommendations in a safe environment (emulation) first
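The find-versus-verify asymmetry in the bullets above can be made concrete: checking a completed puzzle takes a few transparent lines of code and runs in milliseconds, even if the solution was produced by an opaque search or an inscrutable model. A minimal sketch in Python, using Sudoku purely as a stand-in for "a recommendation we can test without understanding how it was generated".

```python
# Verifying a proposed solution is far easier than finding one.
# Here: check a completed 9x9 Sudoku grid without knowing how it was produced.
def is_valid_sudoku(grid):
    """grid: 9x9 list of lists of digits 1-9. True if every row, column,
    and 3x3 box contains each digit exactly once."""
    def ok(group):
        return sorted(group) == list(range(1, 10))

    rows = grid
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [
        [grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
        for br in range(0, 9, 3) for bc in range(0, 9, 3)
    ]
    return all(ok(group) for group in rows + cols + boxes)

# A "black box" may have produced `candidate` by inscrutable means; we can
# still accept or reject it purely by inspecting the output:
# accept = is_valid_sudoku(candidate)
```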
But AI will never consent to being confined!?
• That depends on whether the AI (tool) has volition / agency / sentience
• Hence the importance of studying the possible emergence of volition etc
But risky AI might be developed below the radar!?
• So build in tamper-proof remote shutdown capabilities (see the watchdog sketch after this list)
• (We don’t yet know how/whether this will be possible…)
• But the scale of these risks makes these investigations vital
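The slide itself stresses that nobody yet knows whether genuinely tamper-proof shutdown is achievable. At most, familiar engineering patterns hint at what a partial mechanism might look like; one such pattern is a watchdog that only keeps a long-running job alive while fresh, cryptographically signed "continue" tokens keep arriving from an external authority. A minimal sketch under that assumption: the key handling, the token source, and the `training_step` callable are hypothetical placeholders, and nothing here would resist a determined adversary who controls the code.

```python
# Illustrative watchdog / remote-shutdown pattern: run only while fresh,
# signed "continue" tokens arrive. Hypothetical sketch, not a tamper-proof design.
import hmac, hashlib, time

SHARED_KEY = b"placeholder-key"   # hypothetical; real systems need proper key management
HEARTBEAT_TIMEOUT = 60.0          # seconds without a valid token before halting

def token_is_valid(message: bytes, signature: bytes) -> bool:
    expected = hmac.new(SHARED_KEY, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)

def run_with_watchdog(training_step, receive_token):
    """training_step: one unit of work. receive_token: returns (message, signature)
    or None. Both are hypothetical callables supplied by the surrounding system."""
    last_ok = time.monotonic()
    while True:
        token = receive_token()
        if token is not None and token_is_valid(*token):
            last_ok = time.monotonic()
        if time.monotonic() - last_ok > HEARTBEAT_TIMEOUT:
            print("No valid heartbeat received: halting.")
            break
        training_step()
```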
Take back control of technology!
31. Reducing risks of catastrophic harm from Adv AI
Default path: Adv AI potentially uninterested in, or hostile to, human wellbeing
Two possible design choices to depart from the default dangerous path for the creation of Advanced AI:
• Adv AI –: avoid the inclusion, or subsequent acquisition, of features that would make the Adv AI truly dangerous, e.g. autonomous will, fully general reasoning (?)
• Adv AI +: design in extra features that will be preserved through all subsequent evolution, e.g. benevolence, compassion, superwisdom alongside superintelligence
• In practice a combination of Adv AI+ and Adv AI– may prove best
• Both design choices are likely very difficult and will require considerable analysis. To avoid other (dangerous) designs getting to Adv AI first, the redesigned context is an essential part of the overall recommendation
Redesigning the context for the development of Adv AI:
• Engaging education for all: spreading acute awareness of Adv AI risks and possibilities
• Transforming mental dispositions worldwide: compassion, openness, humble ambition
• Re-engineering incentives: bonuses for public goods, stronger infrastructure, monitoring, sanctions
• Help from trusted narrow AIs: review & suggest improvements in designs & implementations
33. @dw2 Page 33
Failure modes, with corresponding success modes:
• Failure: desire to build superintelligence as fast as possible ("accelerate regardless") → Success: spread deep understanding of the risks associated with superintelligence
• Failure: being pulled into a "Moloch" race ("accelerate despite our better intentions") → Success: establish higher-level incentives that reward and penalize appropriately
• Failure: doom-mongering, adverse psychology → Success: strengthen the vision of good outcomes too
• Failure: distraction by concerns of simpler, more immediate failures of AI systems → Success: assign sufficient time (and respect) to consider all varieties of AI failures
• Failure: virtue signaling (talk without action) → Success: virtue action (step-by-step)
• Failure: inflexible, unhelpful, heavy legislation → Success: adaptive, agile, lean legislation
• Failure: not listening to key insights (closed minds) → Success: embrace diverse creativity and criticism
• Failure: discussions without data (ideology first) → Success: gather data and analyze it (with AI help!)
https://magazine.mindplex.ai/cautionary-tales-and-a-ray-of-hope/
“Cautionary Tales And A Ray Of Hope: 4 scenarios for the transition to AGI”
34. @dw2 Page 34
Agreeing a path forward – transcendent goals
Define and pay attention to canary signals (wake-up calls)
• Beware the distractions of (e.g.) partisan tribalism, political correctness
• Deepen our understanding of which landmines need most care
Humanity should protect our most vulnerable infrastructure
(against action from dangerous humans and/or dangerous AI)
• Access to nuclear weapons
• Access to creation/jailbreaking of especially dangerous biopathogens
• The IT infrastructure on which we all depend
• The health of the environment
Protect human lives (the most basic of human rights)
• Enable all-round health and flourishing (not just GDP)
Prioritize differential development (steering plus acceleration)
• Mechanisms for safety, auditing, disabling, and building trust
35. @dw2 Page 35
Think harder about the consequences in advance
Monitor closely once deployed, ready to intervene
Question desirability
Clarify externalities
Require peer reviews
Involve multiple perspectives
Analyse the whole system
Anticipate fat tails
Reject opacity
Promote resilience
Promote verifiability
Promote auditability
Clarify risks to users
Clarify trade-offs
Insist on accountability
Penalise disinformation
Design for cooperation
Analyse via simulations
Maintain human oversight
Build consensus regarding principles
Provide incentives to address omissions
Halt development if principles not upheld
Consolidate progress via legal frameworks