‘GiveWell for AI Safety’: Lessons learned in a week

Published on May 30, 2025 6:38 PM GMT

On prioritizing orgs by theory of change, identifying effective giving opportunities, and how Manifund can help.

Epistemic status: I spent ~20h thinking about this. If I were to spend 100+ h thinking about this, I expect I'd write quite different things. I was surprised to find that early GiveWell 'learned in public': perhaps this is worth trying.

The premise: EA was founded on cost-effectiveness analysis—why not try this for AI safety, aside from all the obvious reasons¹? A good thing about early GiveWell was its transparency. Some wish OpenPhil were more transparent today. That sometimes seems hard, due to strategic or personnel constraints. Can Manifund play GiveWell's role for AI safety—publishing rigorous, evidence-backed evaluations?

With that in mind, I set out to evaluate the cost-effectiveness of marginal donations to AI safety orgs². Since I was evaluating effective giving opportunities, I only looked at nonprofits³.

I couldn't evaluate all 50+ orgs in one go. An initial thought was to pick a category like 'technical' or 'governance' and narrow down from there. This didn't feel like the most natural division. What's going on here?

I found it more meaningful to distinguish between 'guarding' and 'robustness'⁴ work.

| Org type | Guarding | Robustness |
|---|---|---|
| What they're trying to do | Develop checks to ensure AI models can be developed and/or deployed only when it is safe to do so. Includes both technical evals/audits and policy (e.g. pause, standards) advocacy. | Develop the alignment techniques and safety infrastructure (e.g. control, formal verification) that will help models pass such checks, or operate safely even in the absence of such checks. |

Some reasons you might boost 'guarding':

- You think it can reliably get AI developers to handle 'robustness', and you think they can absorb this responsibility well
- You think 'robustness' work is intractable, slow, or unlikely to be effective outside large AI companies
- You prioritize introducing disinterested third-party audits
- You think 'guarding' buys time for 'robustness' work

Some reasons you might boost 'robustness':

- You want more groups working on 'robustness' than solely AI developers
- You think 'guarding' work is unlikely to succeed, or be effective against advanced models, or is fragile to sociopolitical shifts
- You prioritize accelerating alignment / differential technological progress
- You think 'robustness' work makes 'guarding' efforts more effective

Finally, a note on 'robustness'. I don't expect safety protocols that work on current models to generalize to more capable models without justification and concerted effort. Accordingly, I think it makes sense to separately classify orgs whose theory of change (ToC) focuses on superintelligent systems.⁵

Doing so—and categorizing other helpful work as 'facilitating'—we get a typology roughly as follows:

[Typology diagram]

As you can see, 'technical' and 'governance' orgs fall all over the map. Some orgs are particularly hard to categorize—e.g. CHAI has outputs that plausibly fall in all four quadrants—so I've tried to focus on orgs with narrower remits. Redwood, GovAI, and RAND are placed solely on the basis of their stated agendas. Some work (like METR's) facilitates work east or north of it. In this diagram, which quadrant or quadrant boundary an org belongs to carries meaning, but its position within that quadrant does not. The horizontal axis may collapse depending on your beliefs.

I think this typology is very much open to contention.
Its main point is to convey how I arrived at exploring the funding landscape for 'robustness' orgs. I'm interested in the effective giving prescription for these orgs, particularly those tackling superintelligence risks, because (in roughly decreasing order):

- I think their work is less legible to outsiders than 'guarding' or 'facilitating' work.
- I think evaluation of their work is comparatively neglected, given other funders' current foci.
- I endorse differential technological development, with safety outpacing capabilities.
- I'm concerned about the fragility of a guarding-only strategy, particularly in fast takeoff scenarios.
- I think having concrete plans for a 'pause' may strengthen the case for one.

When I searched for nonprofits working on superintelligence (ASI) safety 'robustness', I got a list as follows:

| Org | Can donate online? | Publicly seeking donations? | On Manifund? |
|---|---|---|---|