The UK government recently established the Foundation Models Taskforce, focused on AI safety and backed by £100M in funding. Founder, investor and AI expert Ian Hogarth leads the new organization. The establishment of the Taskforce shows the UK’s intention to be a leading player in the greatest governance challenge
Ensuring a good future with advanced AI systems requires such systems to be interpretable, controllable and aligned with human values. Our research agenda focuses on building Cognitive Emulation - an AI architecture that bounds systems' capabilities and makes them reason in ways that humans can understand and control.
As well as being progress towards solving the full Alignment Problem, CoEm systems have applications in critical infrastructure and in any AI use case where the end user needs their system to be explainable, bounded, and more reliable than traditional LLMs.
Ensuring that transformative AI systems are safe is called the alignment problem, and it is completely unsolved. Companies are racing to the bottom, competing to build absurdly expensive LLMs without control over them or the systems built upon them.
At Conjecture, we embrace the alignment problem. It cannot be solved as an afterthought. Explainability, controllability, and reliability need to be built from the ground up. Success in building Cognitive Emulation shifts us from an AI misalignment paradigm to one where humans are in control.
This survey was conducted and analysed by Maris Sala. We put together a survey to study Conjecture employees' opinions on AI timelines and the probability of human extinction. The questions were based on previous public surveys and prediction markets, to ensure that the results are comparable with people’
This post was written as part of the work done at Conjecture. You can try input swap graphs in a Colab notebook or explore the library to replicate the results. Thanks to Beren Millidge and Eric Winsor for useful discussions throughout this project. Thanks to Beren for feedback on a
This is part of the work done at Conjecture. This post has been reviewed before publication as per our infohazard policy. We thank our external reviewers for their comments and feedback. This post serves as a signpost for Conjecture’s new primary safety proposal and research direction, which we call
The following are the summary and transcript of a discussion between Paul Christiano (ARC) and Gabriel Alfour, hereafter GA (Conjecture), which took place on December 11, 2022 on Slack. It was held as part of a series of discussions between Conjecture and people from other organizations in the AGI and
In this post, we continue the work done in our last post on language model internals, but this time we analyze the same phenomena occurring during training. This is extremely important for understanding how language model training works at a macro scale, and it sheds light on potentially new behaviours or specific
From our point of view, we are now in the end-game for AGI, and we (humans) are losing. When we share this with other people, they reliably get surprised. That’s why we believe it is worth writing down our beliefs on this. 1. AGI is happening soon. Significant probability
A classic example of human bias is when our political values interfere with our ability to accept data or policies from people we perceive as opponents. When most people feel that new evidence threatens their values, their first instinct is often to deny it or to subject it to extra scrutiny
If one believes that unaligned AGI is a significant problem (>10% chance of leading to catastrophe), speeding up public progress towards AGI is obviously bad. Even so, there may be circumstances that require it. However, accelerating AGI should require a much higher bar of evidence and much
Or "Why RL≈RL2 (And why that matters)" TL;DR: This post discusses the blurred conceptual boundary between RL and RL2 (also known as meta-RL). RL2 is an instance of learned optimization. Far from being a special case, I point out that the conditions under which RL2 emerges are actually