Research

Ensuring a good future with advanced AI systems requires such systems to be interpretable, controllable, and aligned with human values. Our research agenda focuses on building Cognitive Emulation (CoEm), an AI architecture that bounds a system's capabilities and makes it reason in ways that humans can understand and control.

As well as being progress towards solving the full Alignment Problem, CoEm systems have applications in critical infrastructure and in any AI use case where the end user needs a system that is explainable, bounded, and more reliable than traditional LLMs.

The challenge of ensuring that transformative AI systems are safe is called the alignment problem, and it remains unsolved. Companies are racing to the bottom, competing to build absurdly expensive LLMs without control over those models or the systems built on top of them.

Cognitive Emulation

At Conjecture, we embrace the alignment problem. It cannot be solved as an afterthought. Explainability, controllability, and reliability need to be built from the ground up. Success in building Cognitive Emulation shifts us from an AI misalignment paradigm to one where humans are in control.

Alignment
Priorities for the UK Foundation Models Taskforce

The UK government recently established the Foundation Models Taskforce, focused on AI safety and backed by £100M in funding. Founder, investor, and AI expert Ian Hogarth leads the new organization. The establishment of the Taskforce shows the UK’s intention to be a leading player in the greatest governance challenge…

Cognitive Emulation: A Naive AI Safety Proposal

This is part of the work done at Conjecture. This post has been reviewed before publication as per our infohazard policy. We thank our external reviewers for their comments and feedback. This post serves as a signpost for Conjecture’s new primary safety proposal and research direction, which we call Cognitive Emulation…

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes

The following are the summary and transcript of a discussion between Paul Christiano (ARC) and Gabriel Alfour, hereafter GA (Conjecture), which took place on December 11, 2022 on Slack. It was held as part of a series of discussions between Conjecture and people from other organizations in the AGI and…

Basic facts about language models during training

In this post, we continue the work done in our last post on language model internals, but this time we analyze the same phenomena occurring during training. This is extremely important for understanding how language model training works at a macro scale and sheds light on potentially new behaviours or specific…

AGI in sight: our look at the game board

From our point of view, we are now in the end-game for AGI, and we (humans) are losing. When we share this with other people, they reliably get surprised. That’s why we believe it is worth writing down our beliefs on this. 1. AGI is happening soon. Significant probability…

Human decision processes are not well factored

A classic example of human bias is when our political values interfere with our ability to accept data or policies from people we perceive as opponents. When most people feel that new evidence threatens their values, their first instinct is often to deny this evidence or subject it to more scrutiny…

Don't accelerate problems you're trying to solve

If one believes that unaligned AGI is a significant problem (>10% chance of leading to catastrophe), speeding up public progress towards AGI is obviously bad. Though it is obviously bad, there may be circumstances which require it. However, accelerating AGI should require a much higher bar of evidence and much…

Why almost every RL agent does learned optimization

Or "Why RL≈RL2 (And why that matters)" TL;DR: This post discusses the blurred conceptual boundary between RL and RL2 (also known as meta-RL). RL2 is an instance of learned optimization. Far from being a special case, I point out that the conditions under which RL2 emerges are actually

Come work with us!

Check out our current open positions!