Our Research Agenda

Making sure future AI systems are interpretable, controllable, and produce good outcomes in the real world is a fundamental part of the alignment problem. Our R&D aims directly at gaining a better understanding of, and ability to control, current AI models.

We aim to conduct both conceptual and applied research that addresses the prosaic alignment problem. This will involve training state-of-the-art models that we will use to study applied research problems like model interpretability, value alignment, and steerability.

On the conceptual side, we aim to build new frames for reasoning about large language models, and investigate meta-level strategies for making good research bets. While we aim to match the state of the art in NLP and surrounding areas, we are committed to avoiding dangerous AI race dynamics.

Scalable Interpretability
We believe interpretability research that elucidates the internal workings of black-box models can catalyse other safety research, and we aim to produce high-quality interpretability research whose insights can scale to models with many billions of parameters or more.
Theory Of Language Models
We aim to communicate, explore, and exploit new frames for understanding GPT-like models as simulators of text-processes called simulacra, a theory which highlights novel alignment strategies and illustrates how LLMs will scale and influence AGI development.
Interpretability And Visualisation Tools
We want to build tools and frameworks that make neural network interpretability more accessible, and that help reframe conceptual problems in concrete terms.
History And Philosophy Of Alignment
We want to propose, argue for, and structure a pluralistic approach to alignment that draws on an even wider variety of proposals. This involves mapping the current approaches to alignment, translating between them, finding promising directions that have been dropped too soon, and proposing new ones.

Special Programs


Refine is a 3-month incubator for conceptual AI alignment research in London, hosted by Conjecture.

Alignment research aims to ensure that AI systems are aligned with human interests and values, a difficult problem for which no one currently has a general solution. We expect that AI systems are by default misaligned with human values and that deploying sufficiently powerful systems without a solution to the problem would be catastrophic. Refine is a fully paid program that helps aspiring independent researchers find, formulate, and get funding for new research bets: ideas that are promising enough to try out for a few months to see whether they have more potential. Refine was developed to assist relentlessly resourceful individuals with diverse research backgrounds who want the support and resources to drive their own ideas forward.

You can learn more about the program here. Applications for the current cohort are closed at the moment. You can still apply for the incubator (we encourage you to do so, so that we have a record of your interest), but we are not accepting any new applications for the first cohort, and we will not give feedback on your application.