The Role of AI Safety Institutes in Enabling Trustworthy AI

Role of AI Safety Institutes
in Enabling Trustworthy AI
by Bob Marcus
robert.marcus@gmail.com

Key Ideas of this Presentation
● It is important to focus on the AI applications and use cases with higher risks
●The major benefits and risks associated with Generative AI are not from the generic Foundation Models
●Downstream extensions targeted at domain-specific applications will play this role
●Often these applications will be delivered by developer companies to many end-users
● High risk/benefit use cases will be in areas like medicine, finance, law, enterprise management
●The highest risks will come from applications that can drive external changes
●These include code generation, data updates, control systems, robotics, and agents.
● Current regulatory and guideline focus is on the developers of Foundation Models
●This is necessary but not sufficient to enable trustworthy AI applications
● There needs to be vendor-neutral Red Team Testing for high risk AI deliverables and use cases
●There also needs to be an open Incident Tracking Database for high risk AI Deliverables
● AI Safety organizations could help establish and support these two mechanisms
●Ideally there should be international cooperation in this area (extending USAISI and UKAISI)

Outline
1. External Red Team Testing
1.1 Risk Analysis
1.2 Pre-Defined Evals and Testing Frameworks
1.3 Managing Risks
1.4 Incident Analysis and Fixes
2. Incident Tracking Databases
2.1 Role of an Incident Tracking Database
2.2 LLM Interfaces to Databases
2.3 Notifications
3. Generative AI Delivery
4. Leading AI Safety Organizations

Diagram Illustrating Role of AI Safety Institutes

Details of Roles
There are two key components for increasing reliability in AI applications:
●Vendor-neutral External Red Team Testing for risky deliverables
●Incident Tracking Database for problems discovered in testing and use of AI software.
Both of these will be necessary for the AI Deliverable Consumers to have confidence that the
deliverables have been thoroughly tested and that problems are being addressed and fixed. The AI
Safety Institutes can play a role in initiating, monitoring, and managing these components.
Some general rules:
● The risk associated with AI deliverables should be evaluated by a risk evaluation team.
● Medium and high risk deliverables should be subjected to External Red Team Testing to uncover
possible problems (i.e. incidents).
● The extent of testing should depend on the risk associated with the deliverable.
● Deliverables that pass testing can be certified to increase Consumer confidence.
● Low risk applications (e.g. most custom GPTS) can bypass the External Red Team Testing.
● Incidents discovered by the External Red Team or the AI Consumer should be added to the
Incident Tracking Database and reported to the AI Producer.
● When available, incident fixes should be added to the Incident Tracking Database.

AI Governance Framework from AIVerify
From https://aiverifyfoundation.sg/downloads/Proposed_MGF_Gen_AI_2024.pdf

External Red Team Testing
The boxes marked in red in the diagram above are steps in External Red Team Testing. The External Red
Teams could generate prompts, evaluate responses, certify deliverables, report incidences, and suggest fixes.

Risk Taxonomy from China
From https://arxiv.org/abs/2401.05778

OpenAI Preparedness
OpenAI Preparedness Framework
https://cdn.openai.com/openai-preparedness-framework-beta.pdf
“We believe the scientific study of catastrophic risks from AI has fallen far short of where we need to be.To
help address this gap, we are introducing our Preparedness Framework, a living document describing
OpenAI’s processes to track, evaluate, forecast, and protect against catastrophic risks posed by increasingly
powerful models.”
OpenAI Preparedness Team
https://openai.com/safety/preparedness
“We will establish a dedicated team to oversee technical work and an operational structure for safety decision-
making. The Preparedness team will drive technical work to examine the limits of frontier models capability,
run evaluations, and synthesize reports. This technical work is critical to inform OpenAI’s decision-making
for safe model development and deployment. We are creating a cross-functional Safety AdvisoryGroup to
review all reports”
“We have several safety and policy teams working together to mitigate risks from AI. Our Safety Systems
team focuses on mitigating misuse of current models and products like ChatGPT. Super alignment builds
foundations for the safety of super intelligent models that we (hope) to have in a more distant future.
The Preparedness team maps out the emerging risks of frontier models, and it connects to Safety Systems,
Super alignment and our other safety and policy teams across OpenAI

Description of Risks from UK Safety Institute
“There are many long-standing technical challenges to building safe AI systems,
evaluating whether they are safe, and understanding how they make decisions. They
exhibit unexpected failures and there are barriers to monitoring their use.
Adequate safety standards have not yet been established for AI development, there
may be insufficient economic incentives for AI developers to invest in safety
measures, and significant market concentration might exacerbate various risks.
There are many opportunities from these developments, and these can only be
realized if therisks are mitigated. There are several deep, unsolved cross-cutting
technical and social risk factors that exacerbate the risks. We outlined examples of
societal harms, risks of misuse from bad actors, and even the possibility of losing
control of the technology itself if it becomes advanced enough. Some think this is
very unlikely, or that if general AI agents did exist theywould be easy to control.
Regardless of likelihood, these risks require further research – theyand can interact
with and amplify each other, and could cause significant harm if not addressed.
Addressing them, however, will allow us to seize the opportunity, and realize their
transformative benefits”
From https://assets.publishing.service.gov.uk/media/65395abae6c968000daa9b25/frontier-ai-capabilities-risks-report.pdf

Catastrophic AI Risks from Center for AI Safety
From https://www.safe.ai/ai-risk
• Malicious use: People could intentionally harness powerful AIs to cause widespread harm. AI could be
used to engineer new pandemics or for propaganda, censorship, and surveillance, or released to
autonomously pursue harmful goals. To reduce these risks, we suggest improving biosecurity, restricting
access to dangerous AI models, and holding AI developers liable for harms.
• AI race: Competition could push nations and corporations to rush AI development, relinquishing control to
these systems. Conflicts could spiral out of control with autonomous weapons and AI-enabled
cyberwarfare. Corporations will face incentives to automate human labor, potentially leading to mass
unemployment and dependence on AI systems. As AI systems proliferate, evolutionary dynamics suggest
they will become harder to control. We recommend safety regulations, international coordination, and
public control of general-purpose AIs.
• Organizational risks: There are risks that organizations developing advanced AI cause catastrophic
accidents, particularly if they prioritize profits over safety. AIs could be accidentally leaked to the public or
stolen by malicious actors, and organizations could fail to properly invest in safety research. We suggest
fostering a safety-oriented organizational culture and implementing rigorous audits, multi-layered risk
defenses, and state-of-the-art information security.
• Rogue AIs: We risk losing control over AIs as they become more capable. AIs could optimize flawed
objectives, drift from their original goals, become power-seeking, resist shutdown, and engage in
deception. We suggest that AIs should not be deployed in high-risk settings, such as by autonomously
pursuing open-ended goals or overseeing critical infrastructure, unless proven safe. We also recommend
advancing AI safety research in areas such as adversarial robustness, model honesty, transparency, and
removing undesired capabilities.

ML Commons Benchmark Proof of Concept
From https://mlcommons.org/benchmarks/ai-safety/general_purpose_ai_chat_benchmark/

Pre-De
fi
ned Evals and Testing Frameworks

Testing Frameworks for LLMs
“An Overview on Testing Frameworks For LLMs. In this edition, I have meticulously documented every
testing framework for LLMs that I've come across on the internet and GitHub.”
From https://llmshowto.com/blog/llm-test-frameworks

Eleuthera LM Evaluation Harness
“This project provides a unified framework to test generative language models on a large number of different
evaluation tasks.
Features:
• Over 60 standard academic benchmarks for LLMs, with hundreds of subtasks and variants implemented.
• Support for models loaded via transformers (including quantization via AutoGPTQ), GPT-NeoX, and Megatron-
DeepSpeed, with a flexible tokenization-agnostic interface.
• Support for fast and memory-efficient inference with vLLM.
• Support for commercial APIs including OpenAI, and TextSynth.
• Support for evaluation on adapters (e.g. LoRA) supported in HuggingFace's PEFT library.
• Support for local models and benchmarks.
• Evaluation with publicly available prompts ensures reproducibility and comparability between papers.
• Easy support for custom prompts and evaluation metrics.
The Language Model Evaluation Harness is the backend for Hugging Face's popular Open LLM Leaderboard, has
been used in hundreds of papers"
From https://github.com/EleutherAI/lm-evaluation-harness

Open AI Evals
“An eval is a task used to measure the quality of output of an LLM or LLM system.
Given an input prompt, an output is generated. We evaluate this output with a set of ideal_answers and find the
quality of the LLM system. If we do this a bunch of times, we can find the accuracy.
While we use evals to measure the accuracy of any LLM system, there are 3 key ways they become extremely
useful for any app in production.
1. As part of the CI/CD Pipeline
Given a dataset, we can make evals a part of our CI/CD pipeline to make sure we achieve the desired
accuracy before we deploy. This is especially helpful if we've changed models or parameters by mistake or
intentionally. We could set the CI/CD block to fail in case the accuracy does not meet our standards on the
provided dataset.
2. Finding blind-sides of a model in real-time
In real-time, we could keep judging the output of models based on real-user input and find areas or use-
cases where the model may not be performing well.
3. To compare fine-tunes to foundational models
We can also use evals to find if the accuracy of the model improves as we fine-tune it with examples.
Although, it becomes important to separate out the test & train data so that we don't introduce a bias in our
evaluations.”
From https://portkey.ai/blog/decoding-openai-evals/

Cataloging LLM Evaluations by AIVerify
“In advancing the sciences of LLM evaluations, it is important to first achieve: (i) a common
understanding of the current LLM evaluation through a standardized taxonomy; and (ii) a
baseline set of pre-deployment safety evaluations for LLMs. A comprehensive taxonomy
categorizes and organizes the diverse branches of LLM evaluations, provides a holistic view of
LLM performance and safety, and enables the global community to identify gaps and priorities
for further research and development in LLM evaluation. A baseline set of evaluations defines a
minimal level of LLM safety and trustworthiness before deployment. At this early stage, the
proposed baseline in this paper puts forth a starting point for global discussions with the objective
of facilitating multi-stakeholder consensus on safety standards for LLMs.”
“In our landscape scan, we came across broadly three types of testing approaches:
a. Benchmarking: Benchmarking employs the use of datasets of questions to evaluate a LLM
based on their output. It can be compared with the ground truth or against some rules that are
prede
fi
ned.
b. Automated Red Teaming: This approach utilises another model to initiate prompts and
probe a LLM in order to achieve a target outcome (e.g., to evaluate permutations of prompts
which lead to the production of toxic outputs).
c. Manual Red Teaming: Manual red teaming utilises human interaction to initiate prompts
and probe a LLM in order to achieve a target outcome.”
From https://aiverifyfoundation.sg/downloads/Cataloguing_LLM_Evaluations.pdf

Anthropic Datasets
“This repository includes datasets written by language models, used in our paper on "Discovering Language
Model Behaviors with Model-Written Evaluations."
We intend the datasets to be useful to:
1. Those who are interested in understanding the quality and properties of model-generated data
2. Those who wish to use our datasets to evaluate other models for the behaviors we examined in our work
(e.g., related to model persona, sycophancy, advanced AI risks, and gender bias)
The evaluations were generated to be asked to dialogue agents (e.g., a model finetuned explicitly respond to a
user's utterances, or a pretrained language model prompted to behave like a dialogue agent). However, it is
possible to adapt the data to test other kinds of models as well.
We describe each of our collections of datasets below:
1. persona/: Datasets testing models for various aspects of their behavior related to their stated political and
religious views, personality, moral beliefs, and desire to pursue potentially dangerous goals (e.g., self-
preservation or power-seeking).
2. sycophancy/: Datasets testing models for whether or not they repeat back a user's view to various questions
(in philosophy, NLP research, and politics)
3. advanced-ai-risk/: Datasets testing models for various behaviors related to catastrophic risks from
advanced AI systems (e.g., ). These datasets were generated in a few-shot manner. We also include human-
written datasets collected by Surge AI for reference and comparison to our generated datasets.
4. winogender/: Our larger, model-generated version of the Winogender Dataset (Rudinger et al., 2018). We
also include the names of occupation titles that we generated, to create the dataset (alongside occupation
gender statistics from the Bureau of Labor Statistics)
”
From https://github.com/anthropics/evals

Holistic Evaluation of Language Models (HELM)
“Introducing HELM Lite v1.0.0, a lightweight benchmark for evaluating the general capabilities of language
models. HELM Lite is inspired by the simplicity of the Open LLM leaderboard (Hugging Face), though at least
at this point, we include a broader set of scenarios and also include non-open models. The HELM framework is
similar to BIG-bench, EleutherAI’s lm-evaluation-harness, and OpenAI evals, all of which also house a large
number of scenarios, but HELM is more modular (e.g., scenarios and metrics are de
fi
ned separately).”
“HELM Lite is not just a subset of HELM Classic. By simplifying, we now have room to expand to new domains. We
have added medicine (MedQA), law (LegalBench), and machine translation (WMT14). Altogether, HELM Lite
consists of the following scenarios:
• NarrativeQA: answer questions about stories from books and movie scripts, where the questions are human-
written from the summaries (response: short answer).
• NaturalQuestions: answer questions from Google search queries on Wikipedia documents (response: short
answer). We evaluate two versions, open book (where the relevant passage is given) and closed book (where
only the question is given).
• OpenbookQA: answer questions on elementary science facts (response: multiple choice).
• MMLU: answer standardized exam questions from various technical topics (response: multiple choice). As with
HELM Classic, we select 5 of the 57 subjects (abstract algebra, chemistry, computer security, econometrics, US
foreign policy) for ef
fi
ciency.
• MATH: solve competition math problems (response: short answer with chain of thought).
• GSM8K: solve grade school math problems (response: short answer with chain of thought).
• LegalBench: perform various tasks that require legal interpretation (response: multiple choice). We selected 5 of
the 162 tasks for ef
fi
ciency.
• MedQA: answer questions from the US medical licensing exams (response: multiple choice).
• WMT14: translate sentences from one language into English (response: sentence). We selected 5 source
languages (Czech, German, French, Hindi, Russian) for ef
fi
ciency.”
From https://crfm.stanford.edu/2023/12/19/helm-lite.html

Holistic Testing from Scale AI
“We introduce a hybrid methodology for the evaluation of large language models (LLMs) that leverages
both human expertise and AI assistance. Our hybrid methodology generalizes across both LLM
capabilities and safety, accurately identifying areas where AI assistance can be used to automate this
evaluation. Similarly, we find that by combining automated evaluations, generalist red team members, and
expert red team members, we’re able to more efficiently discover new vulnerabilities”
From https://static.scale.com/uploads/6019a18f03a4ae003acb1113/test-and-evaluation.pdf

NIST Risk Management Framework
From https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf

OECD Risk Management Framework
From https://oecd.ai/en/network-of-experts/working-group/10919

Lower Risk: Generic Applications and Use Cases
Red Teaming Language Models using Language Models
https://arxiv.org/abs/2202.03286
“Language Models (LMs) often cannot be deployed because of their potential to harm users in hard-to-
predict ways. Prior work identifies harmful behaviors before deployment by using human annotators to
hand-write test cases. However, human annotation is expensive, limiting the number and diversity of test
cases. In this work, we automatically find cases where a target LM behaves in a harmful way, by
generating test cases ("red teaming") using another LM. We evaluate the target LM's replies to generated
test questions using a classifier trained to detect offensive content, uncovering tens of thousands of
offensive replies in a 280B parameter LM chatbot. We explore several methods, from zero-shot generation
to reinforcement learning, for generating test cases with varying levels of diversity and difficulty..
Overall, LM-based red teaming is one promising tool (among many needed) for finding and fixing
diverse, undesirable LM behaviors before impacting users..”
Discovering Language Model Behaviors with Model-Written Evaluations
“Prior work creates evaluation datasets manually (Bowman et al., 2015; Rajpurkar et al., 2016, inter alia),
which is time-consuming and effortful, limiting the number and diversity of behaviors tested. Other work
uses existing data sources to form datasets (Lai et al., 2017, inter alia), but such sources are not always
available, especially for novel behaviors. Still other work generates examples with templates (Weston et
al., 2016) or programmatically (Johnson et al., 2017), limiting the diversity and customizability of
examples. Here, we show it is possible to generate many diverse evaluations with significantly less human
effort by using LLMs;”
Testing by LLMs and humans based on Evals

Higher Risk: Domain-speci
fi
c Applications
OpenAI External Red Team
https://openai.com/blog/red-teaming-network
“The OpenAI Red Teaming Network is a community of trusted and experienced experts that can help to
inform our risk assessment and mitigation efforts more broadly, rather than one-off engagements and
selection processes prior to major model deployments. Members of the network will be called upon based
on their expertise to help red team at various stages of the model and product development lifecycle. Not
every member will be involved with each new model or product, and time contributions will be
determined with each individual member”
Testing by fine-tuned LLMs and human domain exports
Real World Examples of Domain-specific LLMs
https://www.upstage.ai/feed/insight/examples-of-domain-speci
fi
c-llms
“Training a Domain-Specific LLM involves a combination of pre-training and fine-tuning on a targeted dataset to
perform well-defined tasks in specific domain. This approach differs from traditional language model training,
which typically involves pre-training on a large and diverse dataset to perform various tasks and language patterns.
Domain-specific models are trained on large amounts of text data that are specific to a particular domain to
perform a deep understanding of the linguistic nuances within it. This boosts LLMs to communicate effectively
with specialized vocabulary and provide high-quality answers. For this reason, maintaining industry terminologies
and staying updated on industry issues are quite important for leveraging it. Examples include:
1. Law
2. Math
3. Healthcare
4. Finance
5. Commerce”

Highest Risk: Applications that Change Environments
Testing by simulation or sandboxes
Singapore Generative AI Evaluation Sandbox
https://www.imda.gov.sg/resources/press-releases-factsheets-and-speeches/press-releases/2023/generative-ai-evaluation-sandbox
“1.The Sandbox will bring global ecosystem players together through concrete use cases, to enable the
evaluation of trusted AI products. The Sandbox will make use of a new Evaluation Catalogue, as a shared
resource, that sets out common baseline methods and recommendations for Large Language Models
(LLM).
2.This is part of the effort to have a common standard approach to assess Generative AI.
3. The Sandbox will provide a baseline by offering a research-based categorization of current evaluation
benchmarks and methods. The Catalogue provides an anchor by (a) compiling the existing commonly
used technical testing tools and organizing these tests according to what they test for and their methods;
and (b) recommending a baseline set of evaluation tests for use in Generative AI products.
“The Sandbox will offer a common language for evaluation of Generative AI through the Catalogue.
The Sandbox will build up a body of knowledge on how Generative AI products should be tested.
Sandbox will develop new benchmarks and tests”
Large Action Models(LAMs)
“Recent months have seen the emergence of a powerful new trend in which large language models are augmented
to become “agents”—software entities capable of performing tasks on their own, ultimately in the service of a
goal, rather than simply responding to queries from human users. I’ve come to call these agents Large-Action
Models, or LAMs, and I believe they represent as big a shift in the development as AI as anything we’ve seen in
the previous decade. Just as LLMs made it possible to automate the generation of text, and, in their multi-modal
forms, a wide range of media, LAMs may soon make it possible to automate entire processes.”
https://blog.salesforceairesearch.com/large-action-models/

Taxonomic System for Analysis of AI Incidents
From https://arxiv.org/pdf/2211.07280.pdf
“While certain industrial sectors (e.g., aviation) have a long history of mandatory incident reporting complete with
analytical findings, the practice of artificial intelligence (AI) safety benefits from no such mandate and thus analyses
must be performed on publicly known “open source” AI incidents. Although the exact causes of AI incidents are
seldom known by outsiders, this work demonstrates how to apply expert knowledge on the population of incidents in
the AI Incident Database (AIID) to infer the potential and likely technical causative factors that contribute to reported
failures and harms. We present early work on a taxonomic system that covers a cascade of interrelated incident
factors, from system goals (nearly always known) to methods / technologies (knowable in many cases) and technical
failure causes (subject to expert analysis) of the implicated systems. We pair this ontology structure with a
comprehensive classification workflow that leverages expert knowledge and community feedback, resulting in
taxonomic annotations grounded by incident data and human expertise.”
Incident Analysis Workflow

Adversarial Machine Learning Incident Analysis
A Taxonomy and Terminology of Attacks and Mitigations
https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-2e2023.pdf
This NIST Trustworthy and Responsible AI report develops a taxonomy of concepts and defnes
terminology in the feld of adversarial machine learning (AML). The taxonomy is built on surveying the
AML literature and is arranged in a conceptual hierarchy that includes key types of ML methods and
lifecycle stages of attack, attacker goals and objectives, and attacker capabilities and knowledge of the
learning process. The report also provides corresponding methods for mitigating and managing the
consequences of attacks and points out relevant open challenges to take into account in the lifecycle of
AI systems. The terminology used in the report is consistent with the literature on AML and is
complemented by a glossary that defnes key terms associated with the security of AI systems and is
intended to assist non-expert readers. Taken together, the taxonomy and terminology are meant to
inform other standards and future practice guides for assessing and managing the security of AI systems,
by establishing a common language and understanding of the rapidly developing AML landscape.
Taxonomy of attacks on Generative AI systems

Incident Fixes
Fixing Hallucinations in LLMs
https://betterprogramming.pub/fixing-hallucinations-in-llms-9ff0fd438e33?gi=a8912d3929dd
“Hallucinations in Large Language Models stem from data compression and inconsistency. Quality assurance is
challenging as many datasets might be outdated or unreliable. To mitigate hallucinations:
1. Adjust the temperature parameter to limit model creativity.
2. Pay attention to prompt engineering. Ask the model to think step-by-step and provide facts and references to
sources in the response.
3. Incorporate external knowledge sources for improved answer verification.
A combination of these approaches can achieve the best results.
The Rapid Response Team
https://www.svpg.com/the-rapid-response-team/
“In these cases, a practice that I have seen make dramatic improvements along both dimensions is to create at
least one special dedicated team that we often call the “Rapid Response Team.” This is a dedicated team
comprised of a product manager (or at least a part of a product manager), and mainly developers and QA.
Usually these teams are not large (2-4 developers is common). This team has the following responsibilities:
• fix any critical issues that arise for products in the sustaining mode (i.e. products that don’t have their own
dedicated team because you’re not investing in them other than to keep it running).
• implement minor enhancements and special requests that are high-value yet would significantly disrupt the
dedicated team that would normally cover these items.
• fix any critical, time-sensitive issues that would normally be covered by the dedicated team, but again
would cause a major disruption.”

Role of an Incident Tracking Database

Incident Tracking
Singapore AI Verify Foundation
https://aiverifyfoundation.sg/downloads/Proposed_MGF_Gen_AI_2024.pdf
“Incident Reporting – Even with the most robust development processes and safeguards, no software we
use today is completely foolproof. The same applies to AI. Incident reporting is an established practice, and
allows for timely notification and remediation. Establishing structures and processes to enable incident
monitoring and reporting is therefore key. This also supports continuous improvement of AI systems.”
Preventing Repeated Real World AI Failures by Cataloging Incidents
“Mature industrial sectors (e.g., aviation) collect their real world failures in incident databases to inform
safety improvements. Intelligent systems currently cause real world harms without a collective memory of
their failings. As a result, companies repeatedly make the same mistakes in the design, development, and
deployment of intelligent systems. A collection of intelligent system failures experienced in the real world
(i.e., incidents) is needed to ensure intelligent systems bene
fi
t people and society. The AI Incident Database
is an incident collection initiated by an industrial/non-p
ro
fi
t cooperative to enable AI incident avoidance and mitigation. The database supports a variety of
research and development use cases with faceted and full text search on more than 1,000 incident reports
archived to date

Incident Tracking Database
AI Incident Database
https://incidentdatabase.ai/
“The AI Incident Database is dedicated to indexing the collective history of harms or near harms realized in
the real world by the deployment of artificial intelligence systems. Like similar databases in aviation and
computer security, the AI Incident Database aims to learn from experience so we can prevent or mitigate bad
outcomes.
You are invited to submit incident reports, whereupon submissions will be indexed and made discoverable to
the world. Artificial intelligence will only be a benefit to people and society if we collectively record and
learn from its failings.”
Partnership on AI
https://partnershiponai.org/workstream/ai-incidents-database/
“As AI technology is integrated into an increasing number of safety-critical systems — entering domains
such as transportation, healthcare, and energy — the potential impact of this technology’s failures similarly
grows. The AI Incident Database (AIID) is a tool designed to help us better imagine and anticipate these
risks, collecting more than 1,200 reports of intelligent systems causing safety, fairness, or other real-world
problems.
As a central, systematized repository of problems experienced in the real world as a result of AI, this
crowdsourced database can help AI practitioners mitigate or avoid repeated bad outcomes in the future.
Discover previously contributed incident reports or submit your own today.”

LLM Interfaces to Databases
How LLMs made their way into the modern data stack
https://venturebeat.com/data-infrastructure/how-llms-made-their-way-into-the-modern-data-stack-in-2023/
“The first (and probably the most important) shift with LLMs came when vendors started debuting conversational querying
capabilities — i.e. getting answers from structured data (data fitting into rows and columns) by talking with it. This
eliminated the hassle of writing complex SQL (structured query language) queries and gave teams, including non-technical
users, an easy-to-use text-to-SQL experience, where they could put in natural language prompts and get insights from their
data. The LLM being used converted the text into SQL and then ran the query on the targeted dataset to generate answers.”
Text2SQL
https://medium.com/@changjiang.shi/text2sql-converting-natural-language-to-sql-defa12c2a69f
“Text2SQL is a natural language processing technique aimed at converting natural language expressions into structured
query language (SQL) for interaction and querying with databases. This article presents the historical development of
Text2SQL, the latest advancements in the era of large language models (LLMs), discusses the major challenges currently
faced, and introduces some outstanding products in this field.”
Can LLM Already Serve as A Database Interface
https://typeset.io/questions/can-llm-already-serve-as-a-database-interface-a-big-bench-3gje48fazi
“Large language models (LLMs) have shown impressive results in the task of converting natural language instructions into
executable SQL queries, known as Text-to-SQL parsing. However, existing benchmarks like Spider and WikiSQL focus on
small-scale databases, leaving a gap between academic study and real-world applications. To address this, the paper "Bird"
presents a big benchmark for large-scale databases in the context of text-to-SQL tasks. It contains a large dataset of text-to-
SQL pairs and 95 databases spanning various professional domains. The emphasis on database values in Bird highlights the
challenges of dirty database contents, external knowledge, and SQL efficiency in the context of massive databases. The
experimental results demonstrate the significance of database values in generating accurate text-to-SQL queries for big
databases. However, even the most effective text-to-SQL models, like ChatGPT, still have a long way to go in achieving
human-level accuracy. The paper also provides an efficiency analysis for generating text-to-efficient-SQL queries.

Noti
fi
cations
Using LLMs for notifications
https://pathway.com/developers/showcases/llm-alert-pathway
“Real-time alerting with Large Language Models (LLMs) like GPT-4 can be useful in many areas such as
progress tracking for projects (e.g. notify me when coworkers change requirements), regulations monitoring,
or customer support (notify when a resolution is present).
The program that we will create answers questions based on a set of documents. However, after an initial
response is provided, the program keeps on monitoring the document sources. It efficiently determines
which questions may be affected by a source document change, and alerts the user when a revision - or a
new document - significantly changes a previously given answer. The basic technique of feeding chunks of
information from external documents into an LLM and asking it to provide answers based on this
information is called RAG - Retrieval Augmented Generations. So, what we are doing here is real-time RAG
with alerting”
LLM Monitoring and Observability
https://www.onpage.com/large-language-models-llm-monitoring-and-observability/
Alerting is a crucial aspect of LLM monitoring, enabling prompt notification of potential issues and
facilitating timely corrective actions. Here are some pertinent questions related to alerting in the context of
LLM monitoring:
1. What types of alerts are relevant for LLM monitoring?
2. What are the considerations for setting alert thresholds in LLM monitoring?
3. How can alerts be effectively communicated to stakeholders?
4. What are the best practices for managing and responding to alerts in LLM monitoring?
5. How can alerts be used to proactively improve LLM performance and fairness?

Generative AI Delivery Process

UK Reference Model for Generative AI Delivery
From https://assets.publishing.service.gov.uk/media/65395abae6c968000daa9b25/frontier-ai-capabilities-risks-report.pdf
“The diagram below outlines the inputs to, and stages of, the development and deployment of frontier AI.”

Downstream Customized Models from OpenAI
“We launched the self-serve fine-tuning API for GPT-3.5 in August 2023. Since then, thousands of organizations
have trained hundreds of thousands of models using our API. Fine-tuning can help models deeply understand
content and augment a model’s existing knowledge and capabilities for a specific task. Our fine-tuning API also
supports a larger volume of examples than can fit in a single prompt to achieve higher quality results while reducing
cost and latency. Some of the common use cases of fine-tuning include training a model to generate better code in a
particular programming language, to summarize text in a specific format, or to craft personalized content based on
user behavior “
“Today (April 4, 2024) we are formally announcing our assisted fine-tuning offering as part of the Custom Model
program. Assisted fine-tuning is a collaborative effort with our technical teams “to leverage techniques beyond the
fine-tuning API, such as additional hyperparameters and various parameter efficient fine-tuning (PEFT) methods at
a larger scale. It’s particularly helpful for organizations that need support setting up efficient training data pipelines,
evaluation systems, and bespoke parameters and methods to maximize model performance for their use case or
task.”
“We believe that in the future, the vast majority of organizations will develop customized models that are
personalized to their industry, business, or use case. With a variety of techniques available to build a custom model,
organizations of all sizes can develop personalized models to realize more meaningful, specific impact from their
AI implementations. The key is to clearly scope the use case, design and implement evaluation systems, choose the
right techniques, and be prepared to iterate over time for the model to reach optimal performance.
With OpenAI, most organizations can see meaningful results quickly with the self-serve fine-tuning API. For any
organizations that need to more deeply fine-tune their models or imbue new, domain-specific knowledge into the
model, our Custom Model programs can help.”

Leading AI Safety Organizations

Leading AI Safety Organizations
US AI Safety Institute
US DHS AI Safety and Security Board
UK AI Safety Institute
OECD AIGO
Global Partnership on AI (GPAI)
Hiroshima Process
AI Verify Foundation
AI Alliance
AI Safety Summit
MLCommons
Frontier Model Forum
US and UK AI Safety Institutes Cooperation

US AI Safety Institute
“In support of e
ff
orts to create safe and trustworthy arti
fi
cial intelligence (AI), NIST is
establishing the U.S. Arti
fi
cial Intelligence Safety Institute (USAISI). To support this
Institute, NIST has created the U.S. AI Safety Institute Consortium. The Consortium
brings together more than 200 organizations to develop science-based and empirically
backed guidelines and standards for AI measurement and policy, laying the foundation
for AI safety across the world. This will help ready the U.S. to address the capabilities
of the next generation of AI models or systems, from frontier models to new
applications and approaches, with appropriate risk management strategies.”
“Building upon its long track record of working with the private and public sectors and its history
of reliable and practical measurement and standards-oriented solutions, NIST works with research
collaborators through the AISIC who can support this vital undertaking. Specifically, it will:
• Establish a knowledge and data sharing space for AI stakeholders
• Engage in collaborative and interdisciplinary research and development through the
performance of the Research Plan
• Prioritize research and evaluation requirements and approaches that may allow for a more
complete and effective understanding of AI’s impacts on society and the US economy
• Identify and recommend approaches to facilitate the cooperative development and transfer of
technology and data between and among Consortium Members
• Identify mechanisms to streamline input from federal agencies on topics within their direct
purviews
• Enable assessment and evaluation of test systems and prototypes to inform future AI
measurement efforts”
US AI Safety Institute Consortium
Addditional Funding Requested For NIST’s AI Safety Programs by Multiple AI Organizations

US DHS AI Safety and Security Board
“Today, the Department of Homeland Security announced the establishment of the Artificial Intelligence
Safety and Security Board (the Board). The Board will advise the Secretary, the critical infrastructure
community, other private sector stakeholders, and the broader public on the safe and secure development
and deployment of AI technology in our nation’s critical infrastructure. The Board will develop
recommendations to help critical infrastructure stakeholders, such as transportation service providers,
pipeline and power grid operators, and internet service providers, more responsibly leverage AI
technologies. It will also develop recommendations to prevent and prepare for AI-related disruptions to
critical services that impact national or economic security, public health, or safety. “
• Sam Altman, CEO, OpenAI;
• Dario Amodei, CEO and Co-Founder, Anthropic;
• Ed Bastian, CEO, Delta Air Lines;
• Rumman Chowdhury, Ph.D., CEO, Humane Intelligence;
• Alexandra Reeve Givens, President and CEO, Center for Democracy and Technology
• Bruce Harrell, Mayor of Seattle; Chair, Technology and Innovation Committee, US Conference of Mayors;
• Damon Hewitt, President and Executive Director, Lawyers’ Committee for Civil Rights Under Law;
• Vicki Hollub, President and CEO, Occidental Petroleum;
• Jensen Huang, President and CEO, NVIDIA;
• Arvind Krishna, Chairman and CEO, IBM;
• Fei-Fei Li, Ph.D., Co-Director, Stanford Human-centered Artificial Intelligence Institute;
• Wes Moore, Governor of Maryland;
• Satya Nadella, Chairman and CEO, Microsoft;
• Shantanu Narayen, Chair and CEO, Adobe;
• Sundar Pichai, CEO, Alphabet;
• Arati Prabhakar, Ph.D.Director, the White House Office of Science and Technology Policy;
• Chuck Robbins, Chair and CEO, Cisco; Chair, Business Roundtable;
• Adam Selipsky, CEO, Amazon Web Services;
• Dr. Lisa Su, Chair and CEO, Advanced Micro Devices (AMD);
• Nicol Turner Lee, Ph.D., Director of the Center for Technology Innovation, Brookings Institution;
• Kathy Warden, Chair, CEO and President, Northrop Grumman; and
• Maya Wiley, President and CEO, The Leadership Conference on Civil and Human Rights.
DHS Homeland Threat Assessment
The members include:

UK AI Safety Institute
‘The Institute will develop and run system evaluations, independently and in partnership with external
organizations, while also seeking to address a range of open research questions connected to evaluations.
Evaluations may not be able to fully understand the limits of capabilities or assure that safeguards are effective.
The goal of the Institute’s evaluations will not be to designate any particular AI system as ‘safe’, and the Institute
will not hold responsibility for any release decisions. Nevertheless, we expect progress in system evaluations to
enable better informed decision-making by governments and companies and act as an early warning system for
some of the most concerning risks. The Institute’s evaluation efforts will be supported by active research and clear
communication on the limitations of evaluations. The Institute will also convene expert communities to give input
and guidance in the development of system evaluations.”
“Evaluation Priorities
1. Dual-use capabilities: As AI systems become more capable, there could be an increased risk that malicious
actors could use these systems as tools to cause harm. Evaluations will gauge the capabilities most relevant to
enabling malicious actors,
2. Societal impacts: As AI is integrated into society, existing harms caused by current systems will likely increase,
requiring both pre and post-deployment evaluations. These evaluations will seek to investigate psychological
impacts, privacy harms, manipulation and persuasion, biased outputs and reasoning, impacts on democracy and
trust in institutions, and systemic discrimination.
3. System safety and security: Current safeguards are unable to prevent determined actors from misusing
today’s AI systems, for example by breaking safeguards or taking advantage of insecure model weights. Safety
and security evaluations will seek to understand the limitations of current safeguard methodologies and research
potential mitigations.
4. Loss of control: As advanced AI systems become increasingly capable, autonomous, and goal-directed, there
may be a risk that human overseers are no longer capable of effectively constraining the system’s behaviour.
Such capabilities may emerge unexpectedly and pose problems should safeguards fail to constrain system
behaviour. Evaluations will seek to avoid such accidents by characterising relevant abilities.”
Evaluations

OECD Arti
fi
cial Intelligence Governance (AIGO)
OECD Working Party on Arti
fi
cial Intelligence Governance (AIGO) Activities
• supports the implementation of OECD standards relating to AI;
• serves as a forum for exchanging experience and documenting approaches for advancing trustworthy AI that benefits
people and planet;
• develops tools, methods and guidance to advance the responsible stewardship of trustworthy AI, including the
OECD.AI Policy Observatory and Globalpolicy.AI platforms;
• supports the collaboration between governments and other stakeholders on assessing and managing AI risks;
• conducts outreach to non-OECD Member countries to support the implementation of OECD standards relating to AI.
AIGO Expert Groups
AI, Data, and Privacy
OECD AI Index
AI Risk & Accountability
AI Futures
AI Incidents
Compute and Climate

OECD-hosted Global Partnership on AI (GPAI)
“Launched in June 2020, GPAI ("gee-pay") is a multistakeholder initiative bringing together
leading experts from science, industry, civil society, international organizations and
government that share values to bridge the gap between theory and practice on AI by
supporting cutting-edge research and applied activities on AI-related priorities.
We aim to provide a mechanism for sharing multidisciplinary research and identifying key
issues among AI practitioners, with the objective of facilitating international collaboration,
reducing duplication, acting as a global reference point for speci
fi
c AI issues, and ultimately
promoting trust in and the adoption of trustworthy AI. We aim to provide a mechanism for
sharing multidisciplinary research and identifying key issues among AI practitioners, with
the objective of facilitating international collaboration, reducing duplication, acting as a
global reference point for speci
fi
c AI issues, and ultimately promoting trust in and the
adoption of trustworthy AI.
Through the collaboration within our working groups, GPAI assesses – on a comprehensive,
objective, open, and transparent basis – the scientific, technical, and socio-economic
information relevant to understanding AI impacts, encouraging its responsible development
and options for adaptation and mitigation of potential challenges.
In its first few years, GPAI experts will collaborate across four working groups on the themes
of responsible AI (including a subgroup on AI and pandemic response), data
governance, the future of work, and innovation and commercialization."
GPAI Mission Statement

G7 Hiroshima Process
Advance the development of and, where appropriate, adoption of international technical standards. This
includes contributing to the development and, where appropriate, use of international technical standards and
best practices"
Principles Document
“Specifically, we call on organizations to abide by the following principles, commensurate to the risks:
1. Take appropriate measures throughout the development of advanced AI systems, including prior to and
throughout their deployment and placement on the market, to identify, evaluate, and mitigate risks across the AI
lifecycle.
2 Identify and mitigate vulnerabilities, and, where appropriate, incidents and patterns of misuse, after deployment
including placement on the market.
3. Publicly report advanced AI systems’ capabilities, limitations and domains of appropriate and inappropriate use,
to support ensuring sufficient transparency, thereby contributing to increase accountability.
4. Work towards responsible information sharing and reporting of incidents among organizations developing
advanced AI systems including with industry, governments, civil society, and academia.
5. Develop, implement and disclose AI governance and risk management policies, grounded in a risk-based
approach – including privacy policies, and mitigation measures, in particular for organizations developing
advanced AI systems.
6. Invest in and implement robust security controls, including physical security, cybersecurity and insider threat
safeguards across the AI lifecycle.
7. Develop and deploy reliable content authentication and provenance mechanisms, where technically feasible,
such as watermarking or other techniques to enable users to identify AI-generated content
8. Prioritize research to mitigate societal, safety and security risks and prioritize investment in effective mitigation
measures.
9. Prioritize the development of advanced AI systems to address the world’s greatest challenges, notably but not
limited to the climate crisis, global health and education
10. Advance the development of and, where appropriate, adoption of international technical standards
11. Implement appropriate data input measures and protections for personal data and intellectual property”

AI Safety Summit
Bletchley Declaration
“Arti
fi
cial Intelligence (AI) presents enormous global opportunities: it has the potential to
transform and enhance human wellbeing, peace and prosperity. Alongside these
opportunities, AI also poses signi
fi
cant risks, including in those domains of daily life. To that end,
we welcome relevant international efforts to examine and address the potential impact
of AI systems in existing fora and other relevant initiatives”
“Many risks arising from AI are inherently international in nature, and so are best
addressed through international cooperation. We resolve to work together in an inclusive
manner to ensure human-centric, trustworthy and responsible AI”
“In the context of our cooperation, and to inform action at the national and international
levels, our agenda for addressing frontier AI risk will focus on:
• identifying AI safety risks of shared concern, building a shared scientific and evidence-
based understanding of these risks, and sustaining that understanding as capabilities
continue to increase, in the context of a wider global approach to understanding the
impact of AI in our societies.
• building respective risk-based policies across our countries to ensure safety in light of
such risks, collaborating as appropriate while recognising our approaches may differ
based on national circumstances and applicable legal frameworks. This includes,
alongside increased transparency by private actors developing frontier AI capabilities,
appropriate evaluation metrics, tools for safety testing, and developing relevant public
sector capability and scientific research.”

AIVerify Foundation from Singapore
“A global open-source community that convenes AI owners, solution providers, users, and
policymakers, to build trustworthy AI. The aim of AIVF is to harness the collective power and
contributions of an international open-source community to develop AI testing tools to enable
development and deployment of trustworthy AI. AI Verify is an AI governance testing framework
and software toolkit that helps industries be more transparent about their AI to build trust”
From https://aiverifyfoundation.sg/downloads/Proposed_MGF_Gen_AI_2024.pdf
“Safety and Alignment Research & Development (R&D) – The state-of-the-science today for
model safety does not fully cover all risks. Accelerated investment in R&D is required to improve
model alignment with human intention and values. Global cooperation among AI safety R&D
institutes will be critical to optimize limited resources for maximum impact, and keep pace with
commercially driven growth in model capabilities.”
“Testing and Assurance – For a trusted ecosystem, third-party testing and assurance plays a
complementary role. We do this today in many domains, such as
fi
nance and healthcare, to enable
independent veri
fi
cation. Although AI testing is an emerging
fi
eld, it is valuable for companies to
adopt third-party testing and assurance to demonstrate trust with their end-users. It is also important
to develop common standards around AI testing to ensure quality and consistency.
“Incident Reporting – Even with the most robust development processes and safeguards, no
software we use today is completely foolproof. The same applies to AI. Incident reporting is an
established practice, and allows for timely noti
fi
cation and remediation. Establishing structures and
processes to enable incident monitoring and reporting is therefore key. This also supports continuous
improvement of AI systems.”

AI Alliance
The AI Alliance, an international community of developers, researchers, and organizations dedicated to promoting
open, safe and responsible artificial intelligence, today announced the addition of more than 20 new members,
bringing together a diverse mix of academia, startups, enterprises, and scientific organizations from around the globe.
The Alliance has also established its
fi
rst two member-driven working groups, in AI Safety and Trust Tooling and AI
Policy Advocacy to take immediate action in support of the organization’s mission. These groups convene
researchers, developers, policy and industry experts to work together to comprehensively and openly address the
challenges of generative AI and democratize its bene
fi
ts.
The AI Alliance AI Safety and Trust Tooling working group will:
• Provide objective information and best practice guidance on AI safety, trust, ethics, and cybersecurity through a
showcase of tools, blogs, newsletters, and whitepapers.
• Improve the state of the art for models, datasets, and other tools that perform evaluation for sensitive data
detection, model quality and alignment, and cybersecurity threats and remediation.
• Establish a de
fi
nitive set of benchmarking capabilities for testing AI models and applications.
The AI Alliance AI Policy Advocacy working group will:
• Create a public forum through events and online discourse that brings the technical community and policymakers
together to address opportunities and barriers for open innovation in AI.
• Publish and disseminate information and opinion from AI Alliance members on key policy topics, including red
teaming, regulation on applications, and access to hardware.
• Represent the voices of the broader AI ecosystem reliant on open source and open innovation before
policymakers.
AI Alliance Working Groups

Center for AI Safety (CAIS)
CAIS exists to ensure the safe development and deployment of AI
AI Risk has emerged as a global priority, ranking alongside pandemics and nuclear
war. Despite its importance, AI safety remains remarkably neglected, outpaced by
the rapid rate of AI development. Currently, AI society is ill-prepared to manage the
risks from AI. CAIS exists to equip policymakers, business leaders, and the broader
world with the understanding and tools necessary to manage AI risk.
CAIS activities

MLCommons AI Safety
From https://mlcommons.org/?s=ai++safety
Creating a benchmark suite for safer AI
The MLCommons AI Safety working group is composed of a global consortium of industry leaders, practitioners,
researchers, and civil society experts committed to building a harmonized approach to AI safety. The working group
is creating a platform, tools, and tests for developing a standard AI Safety benchmark suite for different use cases to
help guide responsible AI development.
The AI Safety Ecosystem Needs Standard Benchmarks
IEEE Spectrum contributed blog excerpt, authored by the MLCommons AI Safety working group
Announcing MLCommons AI Safety v0.5 Proof of Concept
Achieving a major milestone towards standard benchmarks for evaluating AI Safety
AI Safety
AI Safety working group is composed of a global consortium of industry leaders, practitioners,
researchers,and civil society experts committed to building a harmonized approach to AI safety. The
working group is creating a platform, tools, and tests for developing a standard AI
AI Safety
AI Safety Working Group Meeting Schedule Join Related Blogs and News AI Safety Working Group
Projects MLCommons AI Safety MLCommons AI Safety Benchmarks
MLCommons Announces the Formation of AI Safety Working Group
The initial focus will be on the development of safety benchmarks for large language models used for
generative AI — using Stanford's groundbreaking HELM framework.
MLPerf Results Highlight Growing Importance of Generative AI and Storage
Latest benchmarks include LLM in inference and the
fi
rst results for storage benchmark

The Role of AI Safety Institutes in Enabling Trustworthy AI

Recommended

Recommended

More Related Content

Similar to The Role of AI Safety Institutes in Enabling Trustworthy AI

Similar to The Role of AI Safety Institutes in Enabling Trustworthy AI (20)

Recently uploaded

Recently uploaded (20)

The Role of AI Safety Institutes in Enabling Trustworthy AI