This work benefited from feedback and comments from the whole Conjecture team, as well as others including Steve Byrnes, Paul Christiano, Leo Gao, Evan Hubinger, Daniel Kokotajlo, Vanessa Kosoy, John Wentworth, and Eliezer Yudkowsky. Many others also kindly shared their feedback and thoughts on it formally or informally, and we are thankful for everyone's help.
This policy applies to everyone at Conjecture, including employees, interns, and collaborators. Note that this policy is not retroactive; any past discussions on this subject have been informal.
This policy applies only to direct infohazards related to AGI Capabilities. To be completely clear: this is about infohazards, not PR hazards, reputational hazards, etc.; and this is about AGI capabilities.
Examples of presumptive infohazards:
- Leaking code that trains networks faster
- Leaking a new technique that trains networks faster
- Leaking a new specific theory that leads to techniques that train networks faster
- Letting it be known outside of Conjecture that we have used/built/deployed a technique that already exists in the literature to train networks faster
- Letting it be known outside of Conjecture that we are interested in using/building/deploying a technique that already exists in the literature in order to train networks faster
The first three examples are obvious. The last two are dangerous because they attract more attention to ideas that increase average negative externality. If in the future we want to hide more types of information that are not covered by the current policy, we should explicitly extend the scope of what is hidden.
Siloing of information and projects is important even within Conjecture. Generally any individual team member working on secret projects may disclose to others that they are working on secret projects, but nothing more.
The default mantra is “need to know”. Does this person need to know X? If not, don’t say anything. Ideally, no one who does not need to know should know how many secret projects exist, which projects people work on, or what any of those projects are about.
While one should not proactively offer that they are keeping a secret, we should strive for meta-honesty. This means that when asked directly we should be transparent that we are observing an infohazard policy that hides things, and explain why we are doing so.
There are three levels of disclosure that we will apply.
- Secret: Information only shareable with specific individuals.
- Private: Information only shareable with a fuzzy group.
- Public: Information shareable with everyone.
We will consider these levels of disclosure for the following types of information:
- Repositories: Entire repositories are easier to box than individual files and folders.
- Projects: Specific projects or sub-projects are easier to box than issues within a project.
- Ideas: Ideas are very hard to keep track of and make secret, so when possible, we will try to box them in repositories or projects. But there may be ideas that are exceptions.
Each project that is secret or private must have an access document associated with it that lists who knows about the secret and any whitelisted information. This document is a minor infosecurity hazard, but is important for coordination.
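As a rough illustration of what an access document for a secret project has to record, here is a minimal Python sketch. The class, field, and method names (`SecretAccessDocument`, `members`, `whitelist`, `is_member`, `level_of`) are hypothetical and are not part of the policy; the real access document is a written list, not code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    SECRET = "secret"    # shareable only with specific individuals
    PRIVATE = "private"  # shareable only with a fuzzy group
    PUBLIC = "public"    # shareable with everyone


@dataclass
class SecretAccessDocument:
    """Hypothetical record for one secret project (names are illustrative)."""

    project: str
    # Named individuals who know the secret, including the infohazard coordinator.
    members: set[str] = field(default_factory=set)
    # Items that everyone who knows the secret has agreed, in writing, may be shared more widely.
    whitelist: set[str] = field(default_factory=set)

    def is_member(self, person: str) -> bool:
        """True if `person` is explicitly listed as knowing the secret."""
        return person in self.members

    def level_of(self, item: str) -> Level:
        """Whitelisted items are treated as private; everything else stays secret."""
        return Level.PRIVATE if item in self.whitelist else Level.SECRET
```

The sketch mirrors two properties of the policy: membership is an explicit list rather than an inferred group, and anything not written on the whitelist defaults to secret.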
An appointed infohazard coordinator has access to all secrets and private projects. For Conjecture, this person is Connor, and the succession chain goes CEO → CSO → CTO → Research Lead(s). When collaborating with other organizations on a secret or private project, each organization’s appointed coordinator has access to the project. This clause ensures there is a dedicated person to discuss infohazards with, help set standards, and resolve ambiguity when questions arise. A second benefit of the coordinator is strategy: whoever is driving Conjecture should have a map of what we are working on and what we are intentionally not working on.
Leaking infohazardous information is a major breach of trust not just at Conjecture but in the alignment community as a whole. Intentional violation of the policy will result in immediate dismissal from the company. This applies to senior leadership as well. Mistakes are different from intentional leaking of infohazards.
More details on the levels of disclosure are below. Additional detail on consequences, and on the process for discerning whether leaked information was shared intentionally, is given in “Processes”.
Secret
- Who: For an item to be a secret, it needs to have a precise list of people who know it. This list can include people outside of Conjecture. This list of people must be written down explicitly in the access document, and must include the appointed infohazard coordinator.
- What is covered: All information related to secret projects must be treated as secret unless it is whitelisted. It may only be discussed with members of the secret project. The policy prohibits leaking secret information to anyone not on the project list. When in doubt as to whether information is “related,” talk to the project lead or appointed infohazard coordinator.
- Whitelisting: For any given secret, there may be some information that is okay to share with more people than just the secret group. All whitelisted information must be explicitly written in the access document and agreed on by those who know the secret. Without this written and acknowledged agreement, sharing any information related to the secret will be considered a breach of policy. Whitelisted information is considered private and must follow the rules of private information below. The appointed infohazard coordinator must abide by the same whitelisting rules as all other members of a secret group.
Private
- Who: For a thing to be private, it needs to have a group or groups that are privy to it. In practice, this could mean Conjecture, or groups from the Alignment community. The document for each private project must explicitly state which groups know the information. Information related to private projects may only be shared with members of these groups. See “Sharing Information” in processes below for more details.
- Try to define fuzzy boundaries: Private things can be much less specific than secret things, and the “groups” shared with are sometimes undefined. This is a necessary evil because we do not have the bandwidth to keep formal track of all interactions and idea-sharing; these broader categories are necessary to facilitate collaboration.
- Limit publicity: No private information may be published to forums, or in a paper. Please refrain from posting about private information on Discord, Slack or other messaging platforms used in the future by Conjecture, unless in relatively small and protected environments where the “group” or audience is clear.
- Use sparingly: Avoid relying on private as a category; it is clearer and preferable to make information either public or secret.
Public
- Who: Literally everyone. This is information you could tell your mother, or all your friends, or post on Twitter. That does not mean that you should.
- Discretion: It is still important to apply discretion to sharing public information. Posts that we write on the AlignmentForum are public, but they are not press releases. Information we discuss at EAG is public, but it is not something we would email everyone about.
- Forums are public: LessWrong, AlignmentForum, EA Forum, EleutherAI Discord (public channels), and similar sites are considered public outlets. Any information shared there must be deemed public.
- No take-back-sies: Once public information has been shared, we cannot take it back. We may revise our discretion level and move information to secret, but we should assume anything shared is permanently, irreversibly, out there.
Processes
1. Assigning Disclosure Levels
For new projects: Whenever a new project is spun up, the appointed infohazard coordinator and the project lead will work together to assess whether the content of the project is infohazardous and whether it should be assigned as secret, private, or public. Each conversation will include:
(1) what information the project covers
(2) in what forms the information about the project already exists, e.g., written, repo, AF post, etc.
(3) who knows about the project, and who should know about the project
(4) proposed disclosure level
If the project is determined to be secret or private, an access document must be created that lists who knows about the project and any whitelisted information. Any information about the project that currently exists in written form must be moved to and saved in a repository or project folder with permissions limited to those on the access document list.
Anyone can ask the appointed infohazard coordinator to start a project as a secret. The default is to accept. At Conjecture, the burden of proof is on Connor if he wants to refuse, and he must raise an objection that proves that the matter is complicated enough to not accept immediately, and might change in the future. In general, any new technical or conceptual project that seems like it could conceivably lead to capabilities progress should be created as secret by default.
For current projects (changing disclosure levels): Anyone can propose changing the disclosure level of a project.
- Secret → Private: To move a project from secret to private, all members of the project and the appointed infohazard coordinator must agree.
- Private → Public: Before making public any information, all members of the project must agree. Also, members must consult external trusted sources and get a strong majority of approval.
When collaborating with another organization, there should be one or more individuals that both parties agree is trusted to adjudicate on the matter.
- Public → Private: Avoid this. Redacting previously public information is difficult, and in the rare circumstance that this should be done it is presumably because the information is infohazardous enough that it should be made secret.
- Public or Private → Secret: This should only be considered in situations where infohazardous information is determined to be particularly sensitive. Furthermore, this should be done with care, in order to avoid attracting more attention from the Streisand effect.
Here, the burden of proof is on the individual proposing the change, and they should discuss the matter directly with the project leader or the appointed infohazard coordinator. If the coordinator (and in most cases the project lead) agrees, follow the process in “for new projects” above.
Additionally, if the project was private and if this is feasible, check in with everyone that currently has access to the information to inform them that the disclosure level is changing to secret, and have them read the infohazard policy. Each person must be added to the list of people who know about the project. If these individuals will no longer be working on the project, they should still be noted as knowing about the project, but in a separate list.
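The agreement requirements for changing a disclosure level, as described above, can be summarized in a small lookup table. This is only a sketch: the role labels are informal shorthand, the function name `required_agreement` is hypothetical, and the choice to raise an error for transitions the policy discourages or does not describe is an assumption made for illustration.

```python
from enum import Enum


class Level(Enum):
    SECRET = "secret"
    PRIVATE = "private"
    PUBLIC = "public"


# Who must agree before a change of disclosure level (labels are informal shorthand).
_TO_SECRET = frozenset({"infohazard coordinator", "project lead (in most cases)"})
_AGREEMENT = {
    (Level.SECRET, Level.PRIVATE): frozenset(
        {"all project members", "infohazard coordinator"}
    ),
    (Level.PRIVATE, Level.PUBLIC): frozenset(
        {"all project members", "strong majority of external trusted sources"}
    ),
    (Level.PRIVATE, Level.SECRET): _TO_SECRET,
    (Level.PUBLIC, Level.SECRET): _TO_SECRET,
}


def required_agreement(old: Level, new: Level) -> frozenset:
    """Return who must agree to move a project from `old` to `new`."""
    if (old, new) == (Level.PUBLIC, Level.PRIVATE):
        # The policy says to avoid this; if redaction is warranted, consider secret.
        raise ValueError("public -> private is discouraged; consider secret instead")
    try:
        return _AGREEMENT[(old, new)]
    except KeyError:
        # Assumption: anything else (e.g. secret -> public in one step) is not
        # described directly by the policy, so the sketch refuses to answer.
        raise ValueError(f"{old.value} -> {new.value} is not described by the policy")
```

For example, `required_agreement(Level.SECRET, Level.PRIVATE)` returns the set containing all project members and the infohazard coordinator.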
2. Sharing Information
Each project must have an access document associated with it that lists who knows about the information and what information is whitelisted to discuss more freely. This list will be kept in a folder or git repository that only members of the secret or private project have access to.
Secret information can only be shared with the individuals who are written on the access list. Anyone in a secret project may propose adding someone new to the secret. First discuss adding the individual with the project leader, and then inform all current members and give them a chance to object. If someone within the team objects, the issue is escalated to the appointed infohazard coordinator, who has the final word. If the team is in unanimous agreement, the coordinator gets a final veto (it is understood that the coordinator is supposed to only use this veto if they have private information as to why adding this person would be a bad idea).
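The procedure for adding someone to a secret can be read as a small decision rule. The sketch below assumes the proposal has already been discussed with the project leader and that all current members have been informed; the names `Outcome` and `decide_addition` are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class Outcome(Enum):
    ADDED = "added by unanimous agreement"
    VETOED = "unanimous agreement, vetoed by coordinator"
    ADDED_AFTER_ESCALATION = "objection raised, coordinator approved"
    REJECTED_AFTER_ESCALATION = "objection raised, coordinator rejected"


def decide_addition(objections: list[str], coordinator_approves: bool) -> Outcome:
    """Resolve a proposal to add a person to a secret project's access list.

    objections: names of current members who objected (empty if none did).
    coordinator_approves: the coordinator's decision; False means a veto when
        the team is unanimous, or a rejection when the issue was escalated.
    """
    if objections:
        # Any objection escalates the issue; the coordinator has the final word.
        if coordinator_approves:
            return Outcome.ADDED_AFTER_ESCALATION
        return Outcome.REJECTED_AFTER_ESCALATION
    # Unanimous team: the person is added unless the coordinator uses the veto.
    return Outcome.ADDED if coordinator_approves else Outcome.VETOED
```

In both branches the coordinator's decision settles the outcome. The difference is one of expectation rather than mechanism: with a unanimous team, the veto is meant to be used only when the coordinator has private information that argues against adding the person.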
Private information can only be shared with members of groups who are written on the access list. Before sharing private information with person X, first check whether the private piece of information has already been shared with someone from the same group as X. Then, discuss general infohazard considerations with X and acknowledge which select groups have access to this information. Then, notify others at Conjecture that you have shared the information with X. In case of doubt, ask first.
Public information can be talked about with anyone freely, though please be reasonable.
For all secret and private projects, by default information sharing should happen verbally and should be kept out of writing (in messages or documents) when possible.
3. Policy Violation Process
We ask present and future employees and interns to sign nondisclosure agreements that reiterate this infohazard policy. Intentional violation of the policy will result in immediate dismissal from the company. The verdict of whether the sharing was intentional or not will be determined by the appointed infohazard coordinator but be transparent to all members privy to the secret (i.e., at Conjecture, Connor may unilaterally decide, but has his reputation and trust at stake in the process).
C-suite members of Conjecture are not above this policy. This is imperative because so much of this policy relies on the trust of senior leaders. As mentioned above, the chain of succession on who knows infohazards goes Connor → Gabe → Sid → Adam; though actual succession planning is outside the scope of this document. If it is Connor who is in question for intentionally leaking an infohazard, Gabe will adjudicate the process with transparency available to members of the group privy to the secret. Because of the severity of this kind of decision, we may opt to bring in external review to the process and lean on the list of “Trusted Sources” above.
Mistakes are different from intentional sharing of infohazards. We will have particular lenience during the first few months that this policy is active as we explore what it is like to live with. We want to ensure that we create as robust a policy as possible, and encourage employees to share mistakes as quickly as possible so that we can revise this policy to be more watertight. Therefore, unless the sharing of infohazardous information is particularly egregious, nobody will be fired for raising a concern in good faith.
4. Information Security and Storage
[Details of Conjecture’s infosecurity processes are - for infosecurity reasons - excluded here.]
5. Quarterly Policy Review
We will review this policy as part of our quarterly review cycle. The policy will be discussed by all of Conjecture in a team meeting, and employees will be given the opportunity to talk about what has gone well and what has not gone well. In particular, the emphasis will be on clarifying places where the policy is not clear or introduces contradictions, and adding additional rules that promote safety.
The quarterly review will also be an opportunity for Project Leaders to review access documents to ensure lists of individuals and whitelisted information for each project are up-to-date and useful.
This policy will always be available for employees at Conjecture to view and make suggestions on, and the quarterly review cycle will be an opportunity to review all of these comments and make changes as needed.