We are a team of researchers dedicated to applied, scalable AI alignment research.
We believe we will see superhuman artificial intelligence within our lifetime. In light of AI’s recent progress, we also believe that this AI is likely to derive from modern machine learning architectures and techniques like gradient descent.
But today’s AI models are black boxes, optimized for mathematical objectives only tenuously related to what we actually care about as humans. Powerful language models such as GPT-3 cannot currently be prevented from producing undesired outputs or complete fabrications in response to factual questions. Because we lack a fundamental understanding of the internal mechanisms of current models, we have few guarantees about what our models might do when encountering situations outside their training data, with potentially catastrophic results on a global scale.
Making sure future AI systems are interpretable, controllable, and produce good outcomes in the real world is a fundamental part of the alignment problem. Our R&D aims directly at gaining a better understanding of, and ability to control, current AI models.
We aim to conduct both conceptual and applied research that addresses the prosaic alignment problem. This will involve training state-of-the-art models that we will use to study applied research problems like model interpretability, value alignment, and steerability. On the conceptual side, we aim to build new frames for reasoning about large language models and to investigate meta-level strategies for making good research bets. While we aim to match the state of the art in NLP and surrounding areas, we are committed to avoiding dangerous AI race dynamics.
We will be opening an incubator for new conceptual alignment researchers. Details to be announced in April 2022.