As enterprises embrace AI agents, observability shouldn’t be left behind: a Splunk executive explains why

“Pre-AI observability focused heavily on the performance and reliability of systems and applications but in a manual context. However, in an AI-driven world, all of that can not only be remediated, but also addressed in a way that helps predict issues before they occur,” explains Mimi Shalash, Observability Advisor at Splunk, a Cisco company, as she outlines the difference between the pre-AI and post-AI worlds and how observability is evolving.

Shalash cites an example of a mobile application to explain why observability in multi-agent systems cannot be ignored. “As a consumer, you might experience resource spikes or start rage-clicking because the website isn’t behaving as expected. Behind the scenes, someone would be looking at the monitoring dashboard, seeing those spikes, and then manually deciding what to investigate, whether it’s a user session or a specific session ID. They would need to determine: is this a network issue? An infrastructure problem? Something wrong with how the payment service is configured? An unoptimised image? There are so many possible scenarios, each requiring significant time to diagnose – maybe hours, or 45 minutes if they knew the systems really well.”

She added: “Even before customers start rage-clicking or experiencing poor satisfaction, AI, especially with the rise of agentic capabilities, can help developers, SREs, and teams detect and address issues before they become impactful. That’s the real opportunity.”

As organisations embrace artificial intelligence and generative AI tools, AI agents are seen as the next step, offering functional efficiency and adding value across the enterprise. While foundational models allow users to “chat” with data to generate content, code, or basic insights, the evolution of large language models (LLMs) is giving rise to agentic AI.
These “AI agents”, or digital helpers, capable of performing tasks on your computer, can manage entire workflows and make decisions independently, but only with the right strategy. However, what makes AI agents so powerful can also make them difficult to monitor, understand, and control.

This is where Shalash says AI agent observability comes in: a process of monitoring and understanding the end-to-end behavior of an agentic ecosystem, including any interactions the AI agent may have with large language models and external tools. It builds upon the existing practice of observability, which is the ability to understand a system’s internal state by analysing its telemetry data – namely, its external outputs such as metrics, events, logs, and traces, commonly known as “MELT” data.

Attendees gather at the Splunk conference in Boston, where the company unveiled new AI-powered observability tools to help enterprises monitor and manage AI agents. (Image: Anuj Bhatia/The Indian Express)

In recent years, applications across hybrid and multi-cloud environments have expanded, and businesses have become increasingly dependent on AI agents and large language models (LLMs), giving rise to digital-first customer interactions. As a result, simple dashboards may not be enough. This week, at the Splunk conference in Boston, the company announced a new approach that embeds agentic AI directly into Splunk Observability Cloud and Splunk AppDynamics. This integration continuously analyses telemetry, flags anomalies, diagnoses root causes, and recommends fixes.

“Five years ago, when someone, typically at the director level, went to buy a product, they had one specific problem or use case in mind. They would purchase a tool to solve that issue.
Over time, you would end up with five or more different tools across areas like security, DevOps, reliability, and front-end – each team wanting their own tool, which they loved, protected, and built confidence around.

“Fast forward to where we are now: maintaining all those tools is not only financially irresponsible, but it also leads to serious tool sprawl,” she explains. “The challenge is that adopting a unified platform to solve it all isn’t as simple as it sounds – it requires cognitive effort, time, and investment.”

For Shalash, the opportunity in a digital-first world is leveraging AI to simplify onboarding and help customers and teams make the right decisions effortlessly. “That’s what many executives tell us when we ask, ‘How are you building a center of excellence?’ or ‘How are you prioritising customer experience?’ Their answer is clear: We need to make the right thing the easy thing to do. And that’s exactly what’s exciting about AI.”

But Shalash explains that observability should not be an afterthought – it should be an integral part of the organisation’s digital-first strategy. She cites the example of a financial institution to illustrate how the lack of observability can lead to costly missteps.

“They wanted to automate reporting capabilities. While there were immediate ROI benefits, over time, because it became so popular and there was such a high demand, it was using so many resources and the queries were becoming out of control. This led to a seven-figure overage. In fact, the ROI associated with that AI workflow was 10 times more expensive than the human doing it. If the organisation had observability for AI in place, they would have been able to see the degradation and the CPU associated and potentially remediated that, ultimately saving a ton of money and then, of course, embedding the right price process and operationalising that at scale.”

Shalash believes that while some organisations are still building observability centers of excellence and working to operationalise them, others have already adopted agentic AI and are now focused on demonstrating ROI to the board.

“Most organisations have a great understanding of their backend database visibility, and maybe infrastructure visibility, as well as a great representation of their front end – their website and high-level synthetic monitoring. But very few organisations actually have this visibility contextualised across both front end and back end,” she said.

At this week’s Splunk conference in Boston, there was a strong focus on observability for AI integrations. Splunk’s new capabilities allow organisations to measure whether AI agents are performing as expected, delivering quality outputs, and doing so within budget. Shalash emphasised that in an AI-driven, digital-first world, downtime can damage trust, increase costs, and hurt long-term competitiveness.

This also signals a shift for Splunk, from being seen primarily as a data-crunching company to one that unifies infrastructure, AI, and business outcomes on a single platform. Networking giant Cisco agreed to acquire Splunk for $28 billion in 2023, making it one of the biggest acquisitions announced that year.

“Observability and AI are going to be rooted in business problems, and together they can help the company generate revenue, deliver applications, and improve the customer experience. That’s a shift from the mindset of ‘I only care about my infrastructure’ or ‘I only care about my application’ to a big picture: how everything comes together to solve the problems the organisation’s mission is committed to addressing.”