At Conjecture, we believe that transformative artificial intelligence may be deployed within the next decade. While timelines are up for debate, there is consensus in the AI Safety community that there is currently no known solution that ensures that transformative AI will be safe.
Conjecture’s mission is to solve this problem, which is called the “AI Alignment Problem.”
Our current approach is to develop “Cognitive Emulation,” an AI architecture that bounds systems' capabilities and makes them reason in ways that humans can understand. This paper aims to provide some background context on what Cognitive Emulation is and why we believe it is a path towards building safe, human-level AI.
Beyond this paper, Connor’s FLI Podcast contains many intuitions for how Cognitive Emulation Systems work and what it means to “emulate human reasoning”. For those curious, we can also provide a transcript and further notes.
What is Cognitive Emulation?
Cognitive Emulation is a research agenda about constraining what AI systems can do to a regime that can be reasoned through by humans.
For now, its main constraints are:
- Avoiding autonomous agents or swarms. Instead, we aim to build a meta-system with a human in the loop that, for any human-level task, can reliably construct a submodule to solve it.
- Avoiding big black boxes. Instead, we aim to factor human reasoning, or “System 2 thinking,” explicitly, and build the most specific modules or models that can solve a given task. The aim is to build explainability and robustness into the architecture from the ground up.
- Avoiding unsupervised optimisation over end-to-end policies. While SGD is a great way to learn about the world, using it to train an end-to-end policy means that we create systems that act like agents, whose goals and behaviors are driven by a process we do not understand.
With those constraints, we believe we can still build useful general systems, comprising humans, AIs, and regular software. These systems will not be as powerful as RL-GPT5 recursively teaching itself. And this is the point: we believe that there is no safe way to build and scale autonomous, black-box systems.
We need an alternative paradigm.
What does it mean to “emulate human reasoning”?
When we describe CoEms (shorthand for “Cognitive Emulation Systems”) as “emulating human reasoning,” we are not referring to simulating neuroscience and biochemical processes. Instead, we are talking about explicit logical processes that can be reasoned through step-by-step.
With a tip of the hat to Daniel Kahneman’s “Thinking Fast and Slow,” we break human cognition into two parts:
- System 1 Thinking: the intuitions, snap judgements, and priors that shape much of human behavior and ideas.
- System 2 Thinking: explicit, deliberate reasoning that we use when we think “how would I go about solving this problem.” This tends to be slower and more verbal than System 1 thinking. (Note that Daniel Kahneman nowadays calls this split “Type 1 Processing” and “Type 2 Processing”, where Type 2 Processing is essentially many Type 1 Processing steps strung together, plus memory.)
In addition to these types of reasoning, humans rely on many cognitive processes which outsource cognition to something external to the human brain. From a birds-eye view, this externalized cognition is most of what has made humanity a successful intelligence. We use tools like calculators and scratchpads; we develop psychotechnologies like language; we invent processes like the scientific method; we group minds together in companies and other social units to increase collective cognition or power; we build systems like markets to solve collective resource allocation issues, and more.
We believe that modern ML systems have started to replicate some of these cognitive processes more than others:
- System 1 Thinking is covered by neural networks. NNs are great at taking fuzzy abstractions and heuristics and turning them into language, the building block of System 2 thinking. This idea is supported by the fact that we observe multimodality everywhere in NNs.
- System 2 Thinking is the “human reasoning” piece that is heavily lacking in modern AI systems, and is where the CoEm agenda tries to make the biggest gains. Making this explicit and forcing problems to be solved via reasoning we can understand is critical for CoEms to work.
- Externalized cognition can be built from auto-GPT-like systems, tool use, cascade models, etc.
Cognitive Emulation systems leverage all three of these reasoning layers. We are building LLMs in-house, which cover System 1 thinking. Most of the R&D and engineering that is explicitly “CoEm” work will focus on System 2 and Externalized Cognition.
What does Cognitive Emulation look like in practice?
CoEm systems are an alternative to the status quo of training larger-and-larger black-box language models. Instead, CoEms are systems, not a single model. Rather than rely on one LLM, CoEms integrate software architecture, distributed systems, and traditional computer science thinking into AI design. While LLMs may be among the subcomponents of such a system, many parts of the system will not involve LLMs at all.
A sketch for the build path of AI systems that fit the design constraints of the CoEm agenda:
- Elementary CoEms are simple computation graphs, with each node being a call to a small domain specific AI model, in the spirit of Cascade.
- Basic CoEms integrate more complex constructs, such as loops, scratchpads, human-in-the-loop or python nodes, in the spirit of LangChain.
- Complex CoEms involve meta constructs, such as picking what to learn, what to finetune a model on, structured search, problem solving algorithms, and heuristics for how to deal with out-of-distribution situations.
- Complete CoEms might simply unfold what a human would have done, in the way it was programmed, while poking at humans whenever it faces an ambiguous or unexpected situation.
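The build path above can be made concrete with a toy sketch. The following is our own illustration, not Conjecture's actual architecture: an "elementary CoEm" as an explicit computation graph whose nodes would each call a small, domain-specific model. The model calls here are stubbed out with plain Python functions so the example is runnable; the node names and pipeline are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Node:
    name: str
    fn: Callable[..., str]                       # stand-in for a small domain-specific model
    inputs: list = field(default_factory=list)   # names of upstream nodes

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)    # the causal story of a run

    def add(self, node: Node):
        self.nodes[node.name] = node

    def run(self, name: str) -> str:
        node = self.nodes[name]
        args = [self.run(dep) for dep in node.inputs]
        out = node.fn(*args)
        self.trace.append((node.name, args, out))  # every step is recorded and auditable
        return out

# Hypothetical pipeline: extract a claim, verify it, then report it.
g = Graph()
g.add(Node("extract", lambda: "water boils at 100 C at sea level"))
g.add(Node("verify", lambda claim: f"VERIFIED: {claim}", inputs=["extract"]))
g.add(Node("report", lambda v: f"Report: {v}", inputs=["verify"]))

result = g.run("report")
```

The point of the sketch is that the graph itself, plus the recorded trace, is the explanation of the system's behavior; there is no opaque end-to-end policy to interpret after the fact.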
Conjecture has already developed a number of critical components of this build path. We have trained LLMs, created an alternative to LangChain with superior graph notation, taught models to use tools and ask humans for clarifications in ambiguous situations, connected LLMs, TTS, and STT into multimodal AI systems, developed infrastructure to finetune models on minimal datasets, and built smaller task-specific models that outperform larger models in narrow domains. All of this we have done in-house, which is a necessary prerequisite for ensuring the overall system we develop is free of safety and security loopholes.
The future build path includes engineering challenges such as longer context windows, retrieval, reflection, training code models, stronger multimodality, enhanced tool use, and more. The build path also includes novel R&D challenges, such as ensuring robustness when combining distinct CoEm modules into more complex AI systems, finding generalized reasoning strategies that combine smaller LLMs to match the performance of larger ones, and finding normative ways to decompose tasks.
Due to our Infohazard Policy we do not share publicly all of the details about the technical specifications of Cognitive Emulation Systems, but we are happy to discuss more in private conversation with any audience who would like additional information on the underlying architecture.
What is “safe” about Cognitive Emulation?
The above design sketches are imprecise, but they provide high-level intuitions for why we expect CoEms to be safer than the current paradigm. For example, by building in strong System 2 thinking, we aim to enforce design constraints that make systems…
- Explainable by creating "causal reasoning processes" that we can audit and understand;
- Bounded by limiting the power of any one instance of black-box LLMs in the system to solving a particular step of System 2 thinking;
- Reliable by necessitating that capability gains are built on top of predictable and robust submodules that can handle LLMs’ inherent unpredictability.
Done well, CoEm systems shift us from a misalignment paradigm to a misuse paradigm. This does not mean that CoEm systems are benign. CoEm systems are as unsafe as human-level AGI if used by a malign operator, should the operator choose to teach the CoEm system all general tasks and deploy it end-to-end as an agent in the world.
Explainability is about building systems whose behavior can be reasoned through by humans. Any task a CoEm system performs will have a “causal story” such that a human could understand why it made the decisions it did, where the blueprint for its reasoning came from, and why we should trust that the blueprint does what it says it does. Note that this causal story is not produced in a post-hoc step, but is built into the architecture itself (i.e., a human could understand the causal story of an inert CoEm by inspecting it).
We consider it a major risk that we cannot do this for modern ML systems. Safety first and foremost must come from understanding our systems.
With virtually every system besides modern ML, we have guarantees of causal stories. For example, imagine that you want to find out why your headphones produce sound. Most people just trust and expect their headphones to work, but if you wanted to, you could find the blueprints, the company that made them, and the engineer who designed them. You could check the calculations, study the design, and educate yourself on the more complicated bits. At the end of the day, you could reverse engineer your headphones, and replace your trust in the system with a causal explanation of exactly how it came together.
NNs do not naturally have causal stories that make sense to humans, so we will need to build these into the CoEms by factoring the steps of how our systems solve problems. This is the foundation on which we can start to assume boundedness of models.
When theorists prove that a computational program is “correct,” they work within the bounds of an assumed model of the program and an assumed system running that model. They can then derive that under those assumed conditions the program will follow a formal specification.
Boundedness is the equivalent for a CoEm, and can be assessed given three things: a model of a CoEm (the causal story); a model of the execution environment (assumptions about what the system can and can't do, such as “without access to infinite compute, this hash cannot be decoded”); and a specification (e.g., the CoEm system will always do X). From these, boundedness then relies on an argument that the system will follow the specification when run with the assumptions. As with any modeling, the degree of certainty in boundedness relies on the accuracy of the assumptions used to formalize the bound.
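As a toy formulation of the three ingredients above (our own illustration, not Conjecture's specification language), boundedness can be expressed as a data structure holding the causal story, the environment assumptions, and a checkable specification, with a runner that refuses any output violating the spec. The component, names, and limits here are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Bound:
    description: str                  # the causal story, in prose
    assumptions: list                 # model of the execution environment
    spec: Callable[[str, str], bool]  # predicate over (input, output)

def run_bounded(component: Callable[[str], str], bound: Bound, x: str) -> str:
    """Run a component and enforce its specification at the boundary."""
    y = component(x)
    if not bound.spec(x, y):
        raise RuntimeError(f"spec violated: {bound.description}")
    return y

# Hypothetical component: a summarizer that must never exceed 80 characters.
summarize = lambda text: text[:77] + "..." if len(text) > 80 else text

bound = Bound(
    description="summary is at most 80 characters",
    assumptions=["component is deterministic", "no external tool calls"],
    spec=lambda x, y: len(y) <= 80,
)

out = run_bounded(summarize, bound, "some long document " * 20)
```

Note that the guarantee is only as good as the listed assumptions: if the environment model is wrong (say, the component is not actually deterministic), the bound degrades accordingly, which mirrors the point about modeling accuracy above.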
Some examples of types of bounding that may be possible:
- Limiting model inputs (prompts) or outputs (type signature, number of characters, etc.)
- Limiting the number of inference steps of a model
- Limiting the size of a model
- Using a tool to guarantee that computation is performed a certain way
- Factoring the cognition of a specific reasoning step into smaller steps
- Using selection inference to causally rely on previous outputs for a new LLM call
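Two of the bounds above (limiting outputs, and limiting the number of inference steps) can be sketched as simple wrappers. This is an illustration with a stubbed model call standing in for an LLM; the function names and budgets are our own, not from any particular library.

```python
def bounded_output(model, prompt: str, max_chars: int = 200) -> str:
    """Limit model outputs: reject anything over the character budget."""
    out = model(prompt)
    if len(out) > max_chars:
        raise ValueError("output exceeds character bound")
    return out

def bounded_steps(step_fn, state: str, max_steps: int = 5) -> str:
    """Limit inference steps: a refinement loop that cannot run forever."""
    for _ in range(max_steps):
        state, done = step_fn(state)
        if done:
            return state
    raise RuntimeError("step budget exhausted without termination")

# Stub model and step function for demonstration.
echo_model = lambda p: p.upper()
refine = lambda s: (s + "!", len(s) >= 5)  # append until length 5, then stop

print(bounded_output(echo_model, "hello"))  # HELLO
print(bounded_steps(refine, "hi"))
```

Each wrapper turns an implicit hope ("the model probably won't run away") into an explicit, checkable constraint that either holds or fails loudly.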
Boundedness is not a binary property. Rather, it is a property of a semi-formal specification such that you can be comfortable that a system will act within some specified parameters with some level of confidence. This is still meaningful: these bounds are the kind of guarantees that allow humans to launch rockets into space.
The Cognitive Emulation build path is designed to be safer than the dominant ML paradigm because it is built on modular capabilities.
In the dominant ML paradigm, larger models are developed that advance capabilities in unpredictable ways. For example, GPT-4 represented a significant advance over GPT-3 such that autonomous agents became viable, while similar systems built with GPT-3 could not function well. Inevitably, these systems are then deployed with significant “capability overhangs” that are revealed over time as people interact with them in ways that weren’t tested before deployment.
The alternative presented by Cognitive Emulation is that we have a much stronger ability to evaluate new capabilities as we assemble these systems piece by piece. Causal stories and boundedness mean that we can make assumptions about each of the models and modules within the system. As we advance from elementary → basic → complex → complete CoEms, the capability advances of the system will be known and incrementally added through new modules that perform specific, robust actions.