I have spent the last decade teaching math, from kindergarten classrooms in Hong Kong to doctoral seminars, training Mathematics Olympiad coaches across Asia, and writing fourteen books that are now read in more than sixty countries. None of that prepared me for the part of my work that has consumed the most engineering hours: building Mathewmatician's Dictionary, an automated mathematics learning system that decides, in real time, whether a student has actually understood the topic in front of them.

Spend long enough on that problem and you stop thinking of it as an education problem. You start thinking of it as a systems design problem. And once you cross that line, you notice something uncomfortable: most of the bad math classrooms I have walked into have the same architectural flaws as most of the bad software I have used.

This is what building a self-paced math system taught me about software design.

The Bug Is Almost Never Where You Think It Is

In a traditional math classroom, when a student fails a chapter on quadratic equations, the obvious move is to spend more time on quadratics. Tutor them. Reassign the worksheet. Drill the problem set.

Almost every time, that diagnosis is wrong.

The student is not failing at quadratics. They are failing at something three chapters back that the curriculum quietly assumed they had mastered. Linear equations. Manipulating exponents. Working with negative numbers under a square root. The visible failure is downstream of an invisible gap.

If you have ever spent a week debugging a UI rendering issue only to discover the actual bug was in your data layer, you know this feeling. The symptom and the cause live in different parts of the system. Treating the symptom does nothing.

When I designed Mathewmatician's Dictionary, the first thing I had to throw out was the chapter. The unit of progression in most math curricula is the chapter, the same way the unit of progression in a lot of legacy software is the screen.
Both are organizational conveniences, not learning units. The actual unit is the concept, and concepts have dependencies that look exactly like a build graph. You cannot compile chapter 7 until chapters 2, 4, and 5 have linked cleanly. If they have not, every error message after that is misleading.

Mastery Before Movement Is Just Test-Driven Development for Humans

Here is the principle I built the entire system around: a student does not advance to the next topic until they have demonstrated mastery of the current one.

Most teachers will tell you this is unrealistic. The school year is fixed. The curriculum has to be covered. If you wait for mastery, you will never finish.

The same argument gets made in software all the time. We do not have time to write tests. We do not have time to refactor. We have to ship.

And the same thing happens in both cases. You ship something that compiles but does not work. You move a student forward who passes the test but cannot actually do math. The cost shows up later, with interest.

Mastery before movement is test-driven development for learners. Every concept is a function. Before you can call it from a higher-level concept, it has to pass its own tests cleanly. Not "the student got 70 percent on the chapter test." Cleanly: every problem completed independently, with the student demonstrating they can apply the concept without scaffolding.

This sounds slow. In the short term, it is. In the long term, it is the only thing that is fast, because you stop accumulating the technical debt of half-understood prerequisites.

Adaptive Pacing Is Just a Job Queue

Most "personalized learning" products treat adaptive pacing as a marketing claim. The student moves a little faster or a little slower based on a quiz score, and that is called adaptive.

That is not adaptive. That is a difficulty slider.

Real adaptive pacing is a job queue. Each student has a backlog of concepts they need to master.
The system watches their performance, identifies which prerequisite gap is blocking the most downstream work, and surfaces that as the next task. When that task is complete, the system reprioritizes. A student who is strong in geometry but weak in algebra gets a different queue than a student who is the reverse, even if they are the same age in the same grade.

The interesting part, as anyone who has built a real task scheduler knows, is that the hard problem is not the scheduling algorithm. The hard problem is accurate signal. If your input is noisy, scheduling is useless.

In a math classroom, the noisiest input on earth is the school report card. Internal school tests are written by the same teachers who teach the course, which creates exactly the conflict of interest you would expect. Pass rates drift upward. A grade tells you almost nothing about whether the student can actually do the math.

This is why I built a separate assessment layer, the Global Mathematics and Mathematics Olympiad Graded Assessment Test with Competition. It exists for one reason: to produce a clean signal so the adaptive system has something real to schedule against. In software terms, it is the observability layer. You cannot optimize what you cannot measure honestly.

Where AI Actually Helps, and Where It Does Not

Every EdTech pitch deck in 2026 has the same slide about AI. Personalized tutors. Infinite practice problems. Conversational explanations.

Some of that is real. Some of it is not.

What AI is genuinely good at in a math learning system is generating practice problems at a precise difficulty level, explaining a single concept in three different ways for students for whom the first explanation did not click, and flagging patterns in errors that a human teacher would not have the bandwidth to notice across thousands of students. These are bounded, well-specified tasks.
AI does them at scale, cheaply, and well.

What AI is bad at, today, is judging whether a student is genuinely ready to advance. Its pattern recognition is impressive enough to look like understanding, and impressive enough to fool the student into thinking they understand. Both are dangerous.

The mistake I see most often in EdTech right now is the same mistake I see in a lot of LLM tooling: treating the model's confidence as a signal of correctness. It is not. A student can produce a correct answer for the wrong reason. A model can produce a confident explanation that is subtly wrong. In both cases, you need a separate verification layer that does not share the same failure mode.

So in my system, AI handles explanation and problem generation. A different, deterministic layer handles the mastery decision. They are deliberately not the same component, for the same reason you do not let the service that writes the data also decide whether the data is valid.

What I Would Tell a Developer Building an EdTech Product

If you are a developer thinking about building anything in education, here are three things to take seriously.

First, the unit of progress is not the chapter, the module, or the screen. It is the concept, and concepts have dependencies. Map them. Treat them like a build graph. Refuse to let students compile their way past unresolved imports.

Second, your signal is everything. If your assessment layer is dishonest, your adaptive layer is decorative. Build the measurement system first and the personalization system second, not the other way around.

Third, separate the components that generate from the components that verify. The same instinct that stops you from letting one class be both the writer and the auditor in your codebase applies here. Generative models are excellent generators. They are terrible auditors of their own output.

The future of math education, in my view, is not more teachers.
It is fewer teachers, doing higher-leverage work, because the basics have been offloaded to systems that genuinely understand mastery rather than performing it.

When that happens, the people we will need most in math classrooms are the ones who can do what no system can: notice the student who is technically passing every check and is quietly checked out. That is the part of teaching no AI is taking over anytime soon. The rest of it, frankly, is overdue for a rewrite.
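For developers who want to see the build-graph, mastery-gate, and job-queue ideas above as code, here is a minimal Python sketch (Python 3.9 or later). It is a hypothetical illustration, not the implementation behind Mathewmatician's Dictionary. The concept names, the `PREREQS` map, the five-attempt `WINDOW`, and every function name are assumptions invented for this example.

The sketch works in three steps:

- Concepts form a dependency graph, which the standard library's `graphlib.TopologicalSorter` validates. It raises `CycleError` on circular prerequisites.
- A deterministic gate demands a run of correct, unscaffolded attempts.
- The scheduler surfaces the ready concept that blocks the most unmastered downstream work.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Optional

# Hypothetical prerequisite map: each concept lists the concepts it depends on,
# the way a build target lists the targets it needs.
PREREQS: dict[str, set[str]] = {
    "negative_numbers": set(),
    "exponents": set(),
    "linear_equations": {"negative_numbers"},
    "square_roots": {"negative_numbers", "exponents"},
    "quadratics": {"linear_equations", "exponents", "square_roots"},
}

# static_order() raises graphlib.CycleError if prerequisites are circular,
# so a broken curriculum map fails loudly at import time.
CONCEPT_ORDER = list(TopologicalSorter(PREREQS).static_order())

WINDOW = 5  # assumed length of the mastery check, chosen for illustration


@dataclass
class Attempt:
    correct: bool
    used_hint: bool  # True if the student needed scaffolding


def is_mastered(attempts: list[Attempt]) -> bool:
    """Deterministic gate: the last WINDOW attempts must all be correct and
    completed without hints. Partial credit never passes."""
    recent = attempts[-WINDOW:]
    return len(recent) == WINDOW and all(
        a.correct and not a.used_hint for a in recent
    )


def dependents(concept: str) -> set[str]:
    """Every concept that directly or transitively depends on `concept`."""
    found: set[str] = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for other, reqs in PREREQS.items():
            if current in reqs and other not in found:
                found.add(other)
                frontier.append(other)
    return found


def next_task(history: dict[str, list[Attempt]]) -> Optional[str]:
    """Return the ready concept that blocks the most unmastered downstream
    concepts, or None when everything is mastered."""
    mastered = {c for c in PREREQS if is_mastered(history.get(c, []))}
    # "Ready" means not yet mastered, with every prerequisite already mastered.
    ready = [c for c in CONCEPT_ORDER if c not in mastered and PREREQS[c] <= mastered]
    if not ready:
        return None
    # max() returns the first maximal item, so ties fall back to topological order.
    return max(ready, key=lambda c: len(dependents(c) - mastered))
```

With an empty history, `next_task({})` picks `"negative_numbers"`, because three other concepts sit downstream of it. Once that concept passes the gate, `"exponents"` moves to the front of the queue. No model appears anywhere in `is_mastered` or `next_task`. In this sketch, any generative component would sit outside the code, producing problems and explanations, while the pass/fail decision stays in plain, auditable rules.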