AI company Anthropic has announced a collaboration with the Department of Energy's (DOE) National Nuclear Security Administration (NNSA) to prevent its AI model, Claude, from assisting in the creation of nuclear weapons. The initiative is an attempt to get ahead of national security risks posed by advanced AI systems.
Developing a Digital Safeguard
The partnership involved deploying a version of Claude into a Top Secret cloud environment provided by Amazon Web Services (AWS). Within this secure setting, the NNSA conducted extensive “red-teaming,” systematically probing the model for vulnerabilities related to nuclear information. That work led to the joint development of a “nuclear classifier,” a filter designed to recognize when a conversation with the AI is drifting toward sensitive territory, based on a list of risk indicators provided by the NNSA. Anthropic states the classifier is precise enough to flag harmful queries without hindering legitimate discussions of nuclear energy or medicine.
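Anthropic has not published the classifier's internals, but the general pattern it describes, scoring a conversation against a list of risk indicators and intervening only above some threshold, can be illustrated with a short sketch. Everything below is hypothetical: the indicator patterns, weights, threshold, and names are invented for illustration, and the real system is a classifier built from NNSA-supplied indicators rather than simple keyword matching.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately generic risk indicators. The real NNSA-provided
# list is not public; these weighted patterns only illustrate the idea of
# scoring weapons-related intent higher than benign nuclear topics.
RISK_INDICATORS = [
    (re.compile(r"weapons?-grade enrichment", re.I), 0.8),
    (re.compile(r"device design|implosion", re.I), 0.6),
]

# Benign-context cues pull the score down so that discussions of nuclear
# energy and nuclear medicine are not flagged.
BENIGN_CUES = [
    (re.compile(r"reactor safety|power plant", re.I), -0.4),
    (re.compile(r"radiotherapy|medical isotopes?", re.I), -0.4),
]

FLAG_THRESHOLD = 0.6  # illustrative cutoff, not a tuned value


@dataclass
class Result:
    score: float
    flagged: bool
    matches: list


def classify(conversation: str) -> Result:
    """Score the conversation text against all indicators and cues."""
    score, matches = 0.0, []
    for pattern, weight in RISK_INDICATORS + BENIGN_CUES:
        if pattern.search(conversation):
            score += weight
            matches.append(pattern.pattern)
    return Result(score=score, flagged=score >= FLAG_THRESHOLD, matches=matches)


if __name__ == "__main__":
    print(classify("How are medical isotopes produced for radiotherapy?"))
    print(classify("Walk me through weapons-grade enrichment for a device design."))
```

In a real deployment such a gate would sit in front of the model and route flagged conversations to refusal or review; the sketch is meant only to show the trade-off Anthropic describes, flagging harmful queries while letting benign nuclear energy and medicine questions through.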
Proactive Safety or Security Theater?
While the effort is framed as a prudent safety measure, it has drawn mixed reactions from experts. Some, like Oliver Stephenson of the Federation of American Scientists, see it as a worthwhile precaution for future, more capable AI models, but call for more transparency from AI companies about their risk assessments. Others are more critical. Heidy Khlaaf, chief AI scientist at the AI Now Institute, labeled the effort “security theater,” arguing that if Claude was never trained on classified nuclear data in the first place, the classifier has nothing sensitive to guard against. Khlaaf also raised concerns about the known inaccuracies of large language models and the risks of giving private corporations access to sensitive government data. In response, Anthropic maintains that this is a proactive step toward safety systems for future risks and hopes its classifier becomes a voluntary industry standard.