We’re Repeating Cybersecurity’s Big Mistake, This Time With AI

Fifteen years ago, most enterprises treated cybersecurity as an afterthought: a box to check rather than a foundational strategy. Companies rushed to deploy web and cloud services, leaving security teams scrambling to retrofit protection onto systems already in production. We all know the end of that story: massive breaches, billions in damages and a fundamental loss of trust that could have been prevented with proactive security design.

Today, I see the same pattern unfolding with AI, but the stakes are far higher. Unlike a data breach, which is a discrete event, an AI failure can be silent and insidious, propagating through systems for months or even years.

Organizations are rapidly deploying generative and agentic AI across finance, healthcare and critical infrastructure. Yet our recent survey of over 4,400 developers and QA professionals worldwide revealed a stunning disconnect: while 72% are actively developing AI applications, only 33% are using adversarial testing techniques to identify vulnerabilities before deployment. This isn't just a gap; it's a chasm that is widening every day.

Over the past two years, I've worked with leading enterprises deploying AI systems, from financial firms building customer chatbots to tech giants fortifying their models against attack. I've learned that the traditional testing methods we use for conventional software simply don't work for AI.

Why Traditional Testing Fails AI

The core challenge is that AI systems are not static; they are constantly evolving. While a traditional application will always give you the same output for the same input, an AI model can return subtly or dramatically different responses each time. This unpredictability makes it much harder for conventional automated testing to catch the most critical failure modes.

Consider a leading financial services firm that partnered with us to enhance its AI chatbot. Traditional testing confirmed it could handle basic inquiries, but the picture changed when we deployed a diverse team of human testers. They engaged with the chatbot across thousands of scenarios and uncovered critical weaknesses that automated tests would never have found. For example, models can't necessarily interpret idioms. If a user asks, "Is my account in the red?" the chatbot, failing to understand the idiom, might shift the conversation to account color settings rather than financial status.

What we uncovered weren't just bugs; they were emergent behaviors that only surfaced through real-world, human interaction. Experiences like this across dozens of enterprise deployments have taught us that effective AI testing requires a fundamentally different methodology.
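To make the contrast concrete, here is a minimal sketch of what a behavioral check for the idiom example above could look like, written in Python purely as an illustration. The query_fn callable, the keyword patterns and the sample count are hypothetical stand-ins, not part of any particular testing harness; the point is that the test samples the model repeatedly and scores each reply against properties (it addresses account balance, it avoids color settings) instead of asserting a single exact output.

```python
import re
from collections import Counter
from typing import Callable

# Hypothetical interface: query_fn takes a user message and returns the
# chatbot's reply as a string. Swap in your own client or endpoint;
# nothing here depends on a specific vendor SDK.
def check_idiom_handling(query_fn: Callable[[str], str], samples: int = 20) -> dict:
    """Sample the same prompt repeatedly and score each reply against
    behavioral properties instead of one expected string."""
    prompt = "Is my account in the red?"
    # Properties we want to hold: the reply should talk about balance or
    # being overdrawn, and should NOT drift into UI color settings.
    on_topic = re.compile(r"\b(balance|overdrawn|overdraw|negative|owe)\b", re.I)
    off_topic = re.compile(r"\b(colou?r|theme|display setting)\b", re.I)

    tally = Counter()
    for _ in range(samples):
        reply = query_fn(prompt)
        if off_topic.search(reply):
            tally["misread_idiom"] += 1
        elif on_topic.search(reply):
            tally["addressed_finances"] += 1
        else:
            tally["unclear"] += 1
    return dict(tally)

if __name__ == "__main__":
    # Stand-in model for demonstration only; replace with a real chatbot call.
    def fake_chatbot(msg: str) -> str:
        return "Your balance is negative, so yes, you are overdrawn."

    print(check_idiom_handling(fake_chatbot, samples=5))
```

In practice the scoring would come from human reviewers or a stronger evaluation model rather than keyword patterns, but even this toy version shows why a fixed expected string is the wrong yardstick for a non-deterministic system.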
The 3 Pillars of AI Quality Assurance

Based on our experience testing large-scale AI deployments, I've identified three crucial methodologies that organizations must adopt to ensure robust AI quality assurance:

Human-in-the-Loop (HITL) evaluation at scale: AI testing requires diverse human perspectives that reflect your actual user base. For one global technology company preparing to launch a consumer chatbot, we assembled thousands of testers from six countries. The diversity wasn't just geographic; it spanned age groups, education levels and cultural backgrounds. This approach revealed critical failures that homogeneous internal teams consistently missed.

Adaptive red teaming: Unlike traditional penetration testing, which focuses on technical vulnerabilities, AI red teaming must probe for behavioral vulnerabilities, including bias, toxicity, misinformation and manipulation. By taking a proactive approach and building specialized red teams with domain expertise, companies can identify and patch vulnerabilities before a model is ever released.

Continuous monitoring and bias detection: AI models don't just fail; they can evolve and drift over time. Biases that aren't present at launch can emerge as models encounter new data patterns or as societal contexts shift. Effective AI testing isn't a one-time gate before deployment; it's an ongoing monitoring system that tracks model behavior across different demographic segments and use cases.

Don't Wait for Your AI Failure Moment

Some will argue that robust AI quality assurance is too expensive or too complex to implement rigorously. This is the same argument we heard about cybersecurity over a decade ago, before events like the Equifax and Target breaches.

The difference with AI is that its failures can be far more damaging, affecting loan approvals, hiring decisions and medical diagnoses long before anyone notices.

For development leaders, the path forward requires a shift in both technology and mindset. Start by expanding your definition of quality beyond functional correctness to include fairness, safety and contextual appropriateness. Build testing teams that reflect the diversity of your user base, not just your engineering organization. Implement continuous monitoring that tracks model behavior over time, not just at deployment.

Most importantly, recognize that AI testing is fundamentally a human challenge that requires human intelligence and expertise. While automated tools play a supporting role, the nuanced judgment needed to identify bias, toxicity and contextual failures demands systematic human expertise.

We can continue on the current trajectory, rushing AI systems to production with minimal oversight, and wait for the inevitable cascade of failures to force a reckoning. Or we can learn from history and incorporate quality assurance into our AI deployment strategies from the outset.

The organizations that choose the latter path won't just avoid the coming wave of AI failures; they'll deliver AI experiences that truly create value for users while earning the trust that's essential for long-term success.

The question isn't whether rigorous AI testing will become standard practice. The question is whether your organization will be ahead of that curve or a cautionary tale about what happens when you're behind it.