Research

Making sure future AI systems are interpretable and controllable, and that they produce good outcomes in the real world, is a fundamental part of the alignment problem. Our R&D aims directly at gaining a better understanding of, and ability to control, current AI models.

Mosaic and Palimpsests: Two Shapes of Research

By Adam Shimi. In Old Masters and Young Geniuses, economist-turned-art-data-analyst David Galenson investigates a striking regularity in the careers of painters: art history and markets favor either their early pieces or the complete opposite — their last ones. From this pattern and additional data, Galenson extracts and defends a separation

Epistemological Vigilance for Alignment

By Adam Shimi. Nothing hampers Science and Engineering like unchecked assumptions. As a concrete example of a field ridden with hidden premises, let's look at sociology. Sociologists must deal with the feedback of their object of study (people in social situations), their own social background, as well as the myriad

Productive Mistakes, Not Perfect Answers

By Adam Shimi. I wouldn’t bet on any current alignment proposal. Yet I think that the field is making progress and abounds with interesting opportunities to do even more, giving us a shot. Isn’t there a contradiction? No, because research progress so rarely looks like having a clearly

Cognitive Emulation: A Naive AI Safety Proposal

This is part of the work done at Conjecture. This post has been reviewed before publication as per our infohazard policy. We thank our external reviewers for their comments and feedback. This post serves as a signpost for Conjecture’s new primary safety proposal and research direction, which we call

Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes

The following are the summary and transcript of a discussion between Paul Christiano (ARC) and Gabriel Alfour, hereafter GA (Conjecture), which took place on December 11, 2022, on Slack. It was held as part of a series of discussions between Conjecture and people from other organizations in the AGI and

AGI in sight: our look at the game board

From our point of view, we are now in the end-game for AGI, and we (humans) are losing. When we share this with other people, they reliably get surprised. That’s why we believe it is worth writing down our beliefs on this. 1. AGI is happening soon. Significant probability

Human decision processes are not well factored

A classic example of human bias is when our political values interfere with our ability to accept data or policies from people we perceive as opponents. When most people feel like new evidence threatens their values, their first instinct is often to deny the evidence or subject it to more scrutiny

Empathy as a natural consequence of learnt reward models

Epistemic Status: Pretty speculative, but built on scientific literature. This post builds off my previous post on learnt reward models. Crossposted from my personal blog. Empathy, the ability to feel another's pain or to 'put yourself in their shoes', is often considered to be a fundamental human cognitive ability, and

Come work with us!

Check out our current open positions!