#1 AI models, power, politics, and performance

I simultaneously feel like I’m talking to an extremely brilliant PhD student who’s been a systems programmer their entire life, and a 10-year-old.
Andrej Karpathy, one of the world’s leading AI researchers, explaining the ‘jaggedness’ of AI performance, March 2026

My information consumption is now 1/4 X, 1/4 podcast interviews of the smartest practitioners, 1/4 talking to the leading AI models, and 1/4 reading old books. The opportunity cost of anything else is far too high, and rising daily… Honestly, I know I should only be doing one of these at this point, but I can’t quite bring myself to shed the other three.
Marc Andreessen, co-creator of Mosaic, the first widely used web browser, and co-founder of the VC firm a16z

All AI amounts to is plausible nonsense.
Former senior GCHQ official, Ciaran Martin, 2025

Images: Bismarck walking on his estate pondering politics; Peter Steinberger creating Clawdbot/OpenClaw, the fastest-growing GitHub repo in history (NB. for the EU optimists/nationalists among you, note how as soon as his triumph went public he moved to the Bay Area and slammed EU regulations on his way)

Introduction

I put Volume I of the Chronology of Bismarck, from 1815 to August 1867, on here in 2023.

I’m skipping finishing Vol. II for now and will do Vol. III next (1871-1879). Vol. II is more useful today if you’re looking at the origins of war but Vol. III will be more useful for those considering how to build a new political regime. After peace with France in 1871, Bismarck had to make many crucial decisions which shaped the Reich’s institutions across the economy, law, finance, military, state bureaucracy, as well as international relations. And then around 1877-79 he changed direction in many crucial ways in foreign and domestic policy, and switched from supporting the liberals on free trade to protection.
As I do this I’m running experiments with AI models and because of other projects I’m spending more time generally thinking about the models.

How might the rapidly improving AI tools help us (1) explore and learn from history and (2) improve performance in politics?

I started doing this Chronology 15-20 years ago after seeing so many discrepancies between all the major books. I wanted to track the different claims about what happened when so I could figure out a more accurate picture. As I went along, it turned into a fascinating story of an extraordinary political development which shaped world history: if these events 1862-6 had worked out differently, maybe no Britain v Germany world wars. Having a chronological record helped me understand the twists and turns and consider classic general questions about history and politics. (I’ve done the same for 1914 for the same reason.)

I thought it would be useful to explore the models in an area I’ve been interested in for decades, spent years reading about, and in which I have a lot of background knowledge and context. This allows me to get a sense of the models’ capabilities for research tasks in history and politics such as collecting facts, assessing evidence, analysing decisions, and exploring fundamental questions like agency vs system, ideas vs material forces, causation etc. If I get a sense of what they’re like in an area where I have a lot of context and don’t have to exhaustively check everything, I can get a sense of how useful they’d be in things where I don’t have much context.

I also explored a fascinating question: if you try to make the models think hard about Bismarck’s diplomacy and extract lessons from it, how do they apply what they learn to current problems like western policy in Ukraine, strategy over Iran/Hormuz or how Xi might think about an invasion of Taiwan? How interesting is it? How does it compare with bog-standard people in politics and the best people?
How useful could they be now to someone going into crisis meetings, for example in providing a ‘red team’ steel-man of counter-arguments?

I am not an expert in using AI nor an expert prompter. That’s part of the experiment: I want to help show what you can do with these models if you are a) very curious about a political subject and want to research it, but b) *not* an AI expert. Although some of what I do below is affected by discussions with experts/lab insiders, I have not involved experts in it. After it’s published, I’ll ask some experts what they think and build this into the follow-up. I have used the Bismarck prompts with the models to write documents for a couple of people working at the frontier of AI and their responses have been ‘that’s a very good strategy!’ I will try more such experiments.

If you’re an MP, an official, a researcher, a journalist, and you are not an expert in use of these models, I hope you will find this useful and interesting.

A weird fact about the world is that political research is amazingly underrated as a force which can change history. Put another billion or ten into a normal company and little really changes in terms of world history. But just thousands wisely deployed on political research can change history. I’ve explained this at length (e.g here) and won’t rehash. Politics does not focus on the most high value tokens. People repeatedly communicate without figuring out if what they’re doing is counterproductive. They fail to do the most basic research on opponents. People fight entire election campaigns without understanding what dominates the thinking of crucial voters. People with money rarely understand politics well and don’t realise politics does not focus on the most high value tokens. So vast amounts of money are wasted on ‘campaigns’ and ‘think tanks’ while the search for the most high value tokens is unfunded.
The models will affect politics partly because they will radically reduce the cost of finding high value tokens, so people with little cash won’t have to find 500k plus to do a project. The potential leverage of political teams with a very small number of able relentless people will grow enormously. This isn’t speculation, I can see it on projects I’m working on / helping with.

A conclusion from my experiments: you’re better off having the paid versions of Opus or GPT work for you than ~99% of MPs.

Question to ponder as reading: how many MPs working 12 hours per day for you would be roughly as valuable in political research as paid-Opus or GPT working for you, with token limits set to the cost of the MP salaries pro rata? Would 5 MPs working full time be better than Opus with a token budget of 5 MP salaries (i.e ~£500k p/a or ~£40k per month)? What if you could hire just five extremely able people and give them the MP salaries for token budgets: could all the MPs combined, without model help, compete?

Some questions about using models to explore history and politics

Model performance is, per Karpathy above and discussed below, very ‘jagged’: their performance and usefulness are simultaneously very high in some areas and awful in others, and the patterns are not the same as with humans.

How useful/reliable are models for detailed historical/political work? What are they best/worst at? What are their strengths and weaknesses?

What level of human performance do they approximate in fact checking?

What level of human performance do they approximate in weighing evidence?

What level of human performance do they approximate in reasoning about causation and counterfactuals? Can they apply ideas from the likes of Judea Pearl about causation and counterfactuals?

What level of human performance do they approximate in reasoning about historical explanations? Can they find and curate evidence and use it to analyse competing explanations?
For example:
The causes of war.
Diplomatic contests.
The success or failure of political campaigns.
The relative weight in different case studies of agents (e.g Bismarck, Franz Joseph) and systems (e.g the Great Power competition), and of ideas (e.g nationalism), material forces (e.g automation) and institutions (e.g an intelligence agency).

What level of human performance do they approximate in discovering hidden but valuable signals? E.g if you ask a model to consider hundreds of pages of evidence, can it extract the sort of signals that an intelligence agency would find valuable to know, something hidden but valuable? Can it pick out evidence relevant to a problem like ‘to what extent did false intelligence contribute to mobilisation decisions which in turn contributed to a war starting’? Can models detect conspiracies?

What level of human performance do they approximate in analysing war and operations like war, such as insurgencies and coups? Can they analyse ends, ways, and means? Can they analyse strategy? Can they assess the risks of war? Can they assess capabilities? To what extent are errors in such analysis due to a) irreducible complexity/nonlinearity which makes errors inevitable for highly competent entities versus b) repeated standard bureaucratic errors among normally competent entities? What’s the frontier of analytical performance? E.g In 1866 European experts almost all expected Austria to win. In 1870 they expected France to win. Why were most wrong in 1866, and why did they make similar errors in 1870? What lessons does this have for us? (In 2022, most western experts including the CIA and MI6 thought ‘Russia will quickly win’ then flipped to ‘Russia is collapsing’. Both were wrong. This is a perennial problem.)

What level of human performance do they approximate in sensing changes in compressed foundational ideas over time?
If you look at any year you will see a collection of critical players who believe certain things with little or no reflection: these are the accepted ideas of their time. E.g In 1850, it was thought among British elites that obviously we should maintain the two-power standard for the Royal Navy. In 1950, this idea was dead. In 1750, European elites saw written constitutions as an idea of the devil. In 1850, European elites felt that written constitutions could hardly be resisted. In the mid-19th century, European elites had come to see free trade as mutually beneficial and the future. By 1900, protection had spread and free trade ideas were under attack everywhere. Can we a) identify deep critical ideas among ruling elites which are influential because hardly questioned, b) trace how and why they change, c) identify transitions from slow changes in the background to sudden changes in crises, d) see how such a shift affects critical decisions such that history goes down one path instead of another? This is very hard and controversial. It’s not a science. In practical politics it’s barely discussed other than in periods of revolution and chaos. (This paragraph is turned into a prompt below.)

What level of human performance do they approximate in applying historical analogies to contemporary problems? E.g Many discuss the US vs PRC competition in comparison with Britain vs Germany and Athens vs Sparta. How good are the models at doing this?

What sort of prompts work well/badly? People discovered ad hoc improvements in prompting such as, with maths, telling models ‘think step by step’. Recently it was reported that just repeating instructions to models (equivalent to telling your child ‘stop doing X now, stop doing X’) works.

What do experiments suggest about possibilities for new tools?
E.g Can models extract information which states think is secret?

If you put the Chronology into the model’s context window, with all the twists and turns of Schleswig-Holstein and the Austrian diplomacy 1863-6 (i.e a lot of detail about the most detailed, extremely high-performance diplomatic case study in history), and ask it to apply what it’s learned to problems like, say, the West’s approach to Ukraine or how Xi might plan an invasion of Taiwan, does it generate interesting ideas? Can models extract principles of use to humans from detailed information?

Why do humans find it so hard to learn from historical examples of brilliant (rare) and relatively terrible (common) performance in diplomacy, war, government planning and execution, political campaigns and so on?

Can models spot emerging crises? Can models identify some highly non-linear and consequential decisions amid the vast majority of irrelevant actions in ways that we can use to make predictions? Can we apply these lessons to other crises?

How can models be used to improve training?

Can models make useful predictions about future news? (Yes.)

Many experts say that as models improve and more tasks done by humans today become automated, two things will remain for humans longer than most: taste and long-term planning. To what extent do the models demonstrate or learn either now? Bismarck said to Busch in 1870 that the hard thing about the 1863-66 struggle was how so many different things connected over years: how many somewhat discrete (but subtly connected) delicate operations over different timescales, any of which might go kaput for reasons outside his control, connected to his priorities. Repeatedly when asked about things, he’d reply along the lines of ‘it depends how the cards fall’, but this was in the context of very clear (and much clearer than all his opponents’) crucial priorities.
A great art you see in very rare people brings together taste and long-term planning, but the word ‘planning’ is tricky because there are a few things that don’t/hardly change (e.g another independent state under Augustenburg is bad, we should try to grab Schleswig-Holstein and use it to change the power balance in Germany) and many things that do change, including always-shifting internal and external hostile coalitions. He left many deep comments about these fundamental uncertainties and ‘planning’ and ‘decisions’, and many rebukes to ‘professors’ and rejections of ‘political science’. Part of my interest is trying to see what happens if you make models focus on such comments and weight them over other tokens they learn.

It was impossible during the animated and sometimes stormy development of our politics always to foresee with certainty whether the road which I took was the right one, and yet I was obliged to act as though I could predict with absolute clearness both coming events and the effect which my decisions would have upon them… It is just as impossible to foresee with any certainty the political results at the time when a measure has to be carried, as it would be in our climate to predict the weather of the next few days. Yet we have to make decisions as though we can do so, often enough fighting against all the influences to which we are accustomed to attach weight… The consideration of the question whether a decision is right, and whether it is right to hold fast and carry through what, though upon a weak premise, has been recognised as right, has an agitating effect on every conscientious and honourable man. This is strengthened by the circumstance that often many years must elapse before we are able in political matters to convince ourselves whether our wishes and actions were right or wrong…

Politics is a thankless job because everything depends on chance and conjecture.
One has to reckon with a series of probabilities and improbabilities and base one’s plans upon this reckoning… As long as he lives the statesman is always unprepared. In the attainment of that for which he strives he is too dependent on the participation of others, a fluctuating and incalculable factor… [One] has to expect random disturbances like the farmer does with weather conditions. Even after the greatest success he cannot say with certainty, ‘Now it is achieved; I am done with it,’ and look back at what has been accomplished with complacency… One can bring individual matters to a conclusion, but even then there is no way of knowing what the consequences will be… In politics there is no such thing as complete certainty and definitive results… Everything goes continually uphill, downhill…

A real responsibility in high politics can only be undertaken by one single directing minister, never by a numerous board with majority voting. The decision as to paths and bypaths often depends on slight but decisive changes, sometimes even on the tone or choice of expressions in an international document. Even the slightest departure from the right line often causes the distance from it to increase so rapidly that the abandoned clue cannot be recovered and the return to the bifurcation, where it was left behind, becomes impossible.
The customary official secrecy conceals for whole generations the circumstances under which the track was left, and the result of the uncertainty in which the operative connection of things remains, produces in leading ministers … an indifference to the material side of business as soon as the formal side has been settled by a royal signature or parliamentary votes.

Experiments below, conducted with a variety of free and paid models

#1 Fact checking my ~400 page Chronology.

#2 Analysing Bismarck’s diplomacy 1862-6, looking at it from the perspective of Austrian intelligence.

#3 Austria’s biggest errors, Pearl counterfactuals.

#4 What happens if you ask the models to study the Chronology, extract lessons, then apply those lessons to analysing the West’s policy on Ukraine?

#5 Investigating the widespread confusion over dates and decisions across secondary sources regarding Prussian/Austrian mobilisation spring-summer 1866. Can the models find primary sources, translate them, and resolve disputes between professional historians?

#6 Analysing military experts’ predictions. An interesting aspect of 1866 is that the vast majority of military experts predicted Austria would win. Then in 1870 few updated and most predicted France would win. The improvements made in the Prussian General Staff, training, planning, decentralised infantry etc have been deeply studied post-1870 but at the time were largely overlooked or misinterpreted. Can models read the Chronology, pick out references to this issue, and report back? This is the sort of task that is done a million times a day in SW1. Can models do such a thing with a 400 page report at ‘roughly the normal performance level in politics’ but in much less time than a human?

#7 Analysing the Austrian intelligence coup of 22/5/1866. Can models explore an obscure reference to a potentially history-changing intelligence coup, research it in primary sources, and do a useful report including counterfactuals?
#8 How good are models at detecting evidence of and analysing conspiracies? Much of politics, and arguably the most important bits, intrinsically involves things which can hardly be said, outside very tightly held circles. The longest chapter in Machiavelli’s writing is Book 3, Chapter 6 of Discourses on Livy, a chapter on conspiracies. A highly useful application would be detecting signals of conspiracies amid vast noise.

#9 Can models identify changes over time in basic ideas believed by elites? I asked the models to read the Chronology and extract deep foundational ideas barely questioned in the era of ~1815, how they shift over decades, slow then fast, and how this shift suddenly shows up in decisions. This is something I think more and more about regarding politics. In 2017-19 I and a few others built something we never published to see if we could measure the attention of the SW1 bubble and make predictions about the focus of this attention and how it would change over time. Advances often come from measuring things previously considered only qualitatively. We showed to our own satisfaction that yes, we can. A few similar projects have published similar results. Westminster, and as far as I can see politics generally in the west, has ignored such experiments (because of the ‘demand problem’ below). But there’s a connected question about the long-term. I’ll return to this.

#10 Bismarck’s career as analogy for humans maintaining control of AI. Bismarck optimised for maintaining wide future options and avoiding constraints and control, in a style analogous to how the best computer chess programs quantifiably choose moves which widen their options and close opponents’ options. Many supposed ‘safety features’ of politics were tried and failed, including ‘switch him off’.
What do the models think of this analogy in the context of discussion about how to control models/agents as they surpass human performance in more and more ways?

I give Verdicts as I go along then at the end I give some overall impressions.

If you’re particularly interested in Bismarck/19th century history and AI research, you can read the whole thing but you can also skim and read the summary at the end.

This blog does not use fancy agents but I’m experimenting with fancy agents and will report soon. There’s no hideous gimmick where at the end I reveal the AI wrote this; everything is clearly separated between me/AI.

A few points on AI & politics/government, ten years after the referendum