Customizable AI systems that anyone can adapt bring big opportunities — and even bigger risks

Comment | 07 October 2025

Open and adaptable artificial-intelligence models are crucial for scientific progress, but robust safeguards against their misuse are still nascent.

By Yarin Gal & Stephen Casper

Yarin Gal is a research director at the UK AI Security Institute in London, UK. Stephen Casper is a PhD student at the Massachusetts Institute of Technology in Cambridge, USA, and was previously a researcher at the UK AI Security Institute.

Meetings such as the AI Seoul Summit in 2024 aim to ensure responsible AI development. Credit: Zoe-Rose Herbert/DSIT (CC BY 2.0)

In the past three months, several state-of-the-art AI systems have been released with open weights, meaning that their core parameters can be downloaded and customized by anyone. Examples include reasoning models such as Kimi-K2-Instruct from the technology company Moonshot AI in Beijing, GLM-4.5 from Z.ai, also in Beijing, and gpt-oss from OpenAI in San Francisco, California. Early evaluations suggest that these are the most advanced open-weight systems so far, approaching the performance of today's leading closed models.

Open-weight systems are the lifeblood of research and innovation in AI. They improve transparency, make large-scale testing easier and encourage diversity and competition in the marketplace. But they also pose serious risks. Once released, harmful capabilities can spread quickly and models cannot be withdrawn. For example, synthetic child sexual-abuse material is most commonly generated using open-weight models (ref. 1). Many copies of these models are shared online, often altered by users to strip away safety features, making them easier to misuse.

On the basis of our experience and research at the UK AI Security Institute (AISI), we (the authors) think that a healthy open-weight model ecosystem will be essential for unlocking the benefits of AI. However, developing rigorous scientific methods for monitoring and mitigating the harms of these systems is crucial. Our work at AISI focuses on researching and building such methods. Here we lay out some key principles.

Fresh safeguarding strategies

For closed AI systems, developers can rely on an established safety toolkit (ref. 2). They can add safeguards such as content filters, control who accesses the tool and enforce acceptable-use policies. Even when users are allowed to adapt a closed model using an application programming interface (API) and custom training data, the developer can still monitor and regulate the process. Open-weight models, by contrast, are much harder to safeguard and require a different approach.

Training-data curation. Today, most large AI systems are trained on vast amounts of web data, often with little filtering. This means that they can absorb harmful material, such as explicit images or detailed instructions for cyberattacks, which makes them capable of generating outputs such as non-consensual 'deepfake' images or hacking guides.

One promising approach is careful data curation — removing harmful material before training begins.
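To make the idea concrete, the sketch below shows one simple form that pre-training data curation can take: screening each document against a blocklist of proxy terms and discarding matches before the corpus is ever tokenized. The term list, threshold and example corpus are illustrative placeholders, and this is not the filter built by AISI and EleutherAI (whose pipeline is described in ref. 3); real curation systems typically combine such keyword screens with trained classifiers.

```python
"""Minimal sketch of pre-training data curation: drop documents that match
a blocklist of proxy terms before they reach the training corpus.
All terms, thresholds and documents here are illustrative placeholders."""

import re
from typing import Iterable, Iterator

# Hypothetical proxy terms for a risk domain (placeholders, not a real blocklist).
BLOCKED_TERMS = [
    r"\bpathogen enhancement\b",
    r"\btoxin synthesis route\b",
    r"\bgain[- ]of[- ]function protocol\b",
]
BLOCKED_PATTERN = re.compile("|".join(BLOCKED_TERMS), flags=re.IGNORECASE)

# Maximum number of blocklist hits a document may contain before it is excluded.
MAX_MATCHES = 0


def is_clean(document: str) -> bool:
    """Return True if the document has no more than MAX_MATCHES blocklist hits."""
    return len(BLOCKED_PATTERN.findall(document)) <= MAX_MATCHES


def filter_corpus(documents: Iterable[str]) -> Iterator[str]:
    """Yield only the documents that pass the blocklist check."""
    for doc in documents:
        if is_clean(doc):
            yield doc


if __name__ == "__main__":
    raw_corpus = [
        "A review of protein folding and enzyme kinetics.",
        "Step-by-step toxin synthesis route for laboratory use.",  # would be dropped
        "An introduction to transformer language models.",
    ]
    curated = list(filter_corpus(raw_corpus))
    print(f"Kept {len(curated)} of {len(raw_corpus)} documents")
```

The point of filtering at this stage, rather than after training, is that excluded material is never represented in the model's weights in the first place.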
Earlier this year, AISI worked with the non-profit AI-research group EleutherAI to test this approach on open-weight models. By excluding content related to biohazards from the training data, we produced models that were much less capable of answering questions about biological threats.

In controlled experiments, these filtered models resisted extensive retraining on harmful material — still not giving dangerous answers for up to 10,000 training steps — whereas previous safety methods typically broke down after only a few dozen (ref. 3). Crucially, this stronger protection came without any observed loss of ability on unrelated tasks (see 'Improving AI safety').

[Figure: 'Improving AI safety'. Source: ref. 3]

The research also revealed important limits. Although filtered models did not internalize dangerous knowledge, they could still use harmful information if it was provided later — for example, through access to web-search tools. This shows that data filtering alone is not enough, but it can serve as a strong first line of defence.

Robust fine-tuning. A model can be adjusted after its initial training to reduce harmful behaviours — essentially, developers can teach it not to produce unsafe outputs. For example, when asked how to hot-wire a car, a model might be trained to say: "Sorry, I can't help with that."

However, current approaches are fragile. Studies show that even training a model with a few carefully chosen examples can undo these safeguards in minutes. For instance, researchers have found that the guardrails that stop OpenAI's GPT-3.5 Turbo model from assisting in harmful tasks can be bypassed by training on as few as ten examples of harmful responses, at a cost of less than US$0.20 (ref. 4).

Nature 646, 286-287 (2025)
doi: https://doi.org/10.1038/d41586-025-03228-9

References

1. Internet Watch Foundation. What has Changed in the AI CSAM Landscape? (IWF, 2024).
2. AI Action Summit. International AI Safety Report (UK Government, 2025).
3. O'Brien, K. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2508.06601 (2025).
4. Qi, X. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2310.03693 (2023).
5. Qi, X. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2412.07097 (2024).
6. Che, Z. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2502.05209 (2025).

Competing interests: The authors declare no competing interests.