How AI resurrected an unsolved security problem — data sprawl


The rush to adopt generative AI has reignited an old security problem that has long plagued enterprises: data sprawl. As organizations race to implement AI solutions, they're confronting the same challenges the industry faced a decade or so ago, but with significantly higher stakes.

Back then, faced with an explosion of data from new mobile and IoT devices and from people conducting more of their activities online, security leaders acknowledged that they were overwhelmed trying to manage it all. A few years later, most of them threw up their hands in defeat. The problem seemed insurmountable: data was everywhere and multiplying faster than security teams could keep up.

Fast forward to 2025, and that unsolved problem stares us in the face again, with renewed urgency. For organizations to get value out of generative AI solutions, they must feed them corporate data, and that data could contain (and most likely does contain) highly sensitive information. This is forcing organizations to finally confront their data sprawl problem.

The organizational challenge behind data sprawl

At the heart of the historic data sprawl problem is an organizational breakdown. Organizations recognized the need for chief data officers and comprehensive data governance, but most initiatives never materialized or quickly stalled. Promised data categorization systems were never implemented, and strategic schedules were never established.

The failure wasn't due to a lack of intent or motivation. Leadership understood the problem and allocated resources, but the manual effort required for data management proved overwhelming.
Organizations typically assigned just a handful of people to categorize and manage data volumes that were growing exponentially. These small teams faced an impossible task: manually processing and organizing data that was being created faster than any human team could manage.

AI as a data creation engine

The challenge now goes beyond data consumption. AI systems themselves generate vast new data streams that require management and protection. AI creates new versions of documents, reports, and analyses, and each interaction with a generative AI tool produces logs and artifacts that accumulate rapidly.

Model training and fine-tuning processes generate metadata that often goes unmanaged. AI-augmented systems produce far more analytical outputs, which flow into storage without proper classification.

IT departments now struggle to manage this explosion of new data across every corporate touchpoint, from employee devices and disparate systems to cloud and hybrid environments. Without proper governance, companies can find themselves drowning in a sea of new and legacy data without knowing what's valuable and what's not.

Unmanaged data is a security blind spot

This growing data sprawl has become an increasingly attractive target for cybercriminals, because poorly managed data poses critical security risks. Legacy data often contains sensitive information but lacks modern security controls, which makes it easy for attackers to reach. Unknown and unmanaged data repositories are blind spots in an organization's security architecture that prevent comprehensive protection.

AI systems may also expose sensitive data in their outputs to people who would not normally have access to it, creating new vectors for data leakage. Security teams fundamentally cannot protect what they don't know exists, which makes invisible data stores particularly vulnerable.

Recent survey data illustrates how widespread the problem is.
A 2025 report we commissioned found that 74% of surveyed IT and security decision-makers said attackers had successfully accessed and harmed their data, while 86% paid ransoms. Meanwhile, 68% of security decision-makers surveyed for the CyberArk 2025 Identity Security Landscape Report acknowledged that they lack security controls for their AI implementations.

Real-world consequences

Data sprawl within organizations has allowed one particularly malicious form of cyberattack to flourish. Ransomware has evolved into a sophisticated criminal industry that specifically targets organizations with poor security and data hygiene. Intellectual property in long-forgotten databases often represents an organization's crown jewels, making it a primary target.

Personal and financial information in abandoned data stores gives attackers valuable resources for identity theft and fraud. Email archives containing sensitive communications offer insight into business operations and potential leverage for extortion. Backup systems that haven't been properly secured let attackers destroy recovery capabilities, increasing their ransom leverage.

The reality is stark: if IT teams lack visibility into all organizational data (what's in it, where it's stored, and who's using it), they can neither properly understand its value as an asset nor secure it effectively.

How to mitigate data sprawl

The fundamental challenge that derailed previous data governance efforts remains: traditional manual approaches cannot keep pace with modern data creation. Today's solutions must address this resource mismatch through automation and strategic organizational commitment. Forward-thinking security teams are implementing these practical solutions:

Conduct data discovery and classification: Use automated tools to find, categorize, and tag data based on sensitivity and business value across all environments.
This includes structured, unstructured, and semi-structured data, and even shadow IT data.

Deploy robust backup and recovery systems: Ensure critical data is protected while maintaining visibility into what's being preserved and why. Implement immutable backups and perform regular recovery drills.

Establish zero trust data access controls: Implement least-privilege principles for data access, ensuring only authorized users can retrieve specific information. This extends beyond the network perimeter to individual data objects.

Create data minimization practices: Regularly review and purge unnecessary data to reduce attack surface and compliance risks. This includes personal data, outdated records, and redundant copies.

Implement strategic data retention policies: When I worked at the State Department, we implemented a retention schedule to help reduce the risk associated with legacy data holdings that no longer had business value. Old data represents a significant liability, and our historical analysis showed that it became increasingly obsolete and useless with each passing year.

Breaking the cycle

The storage costs alone should motivate action. Every piece of unused data represents an ongoing financial burden in storage infrastructure, backup systems, and administrative overhead. Yet organizations keep accumulating digital debt because they've never successfully implemented the systematic approaches needed to break this cycle.

Success requires acknowledging that data governance cannot be delegated to a small team working with manual processes. The scale of data creation today demands commitment to automated discovery, classification, and retention enforcement. Without a systematic approach, organizations will find themselves having the same conversation in another decade, facing even more complex challenges with higher stakes.

The era of ignoring data sprawl is over.
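Addressing sprawl in practice starts with the automated discovery, classification, and retention enforcement called for above. As a rough illustration only, the Python sketch below tags each record in a small in-memory inventory with a sensitivity level using simple regular expressions, and flags records untouched for longer than a retention window as candidates for purge review. Everything in it is a hypothetical assumption: the `Record` type, the two detection patterns, the three sensitivity levels, and the roughly seven-year `retention_days` default. Real discovery tools scan actual storage and use far richer detection rules.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical detection rules; real classifiers use far richer patterns and validation.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class Record:
    """One item in a hypothetical data inventory."""
    path: str
    content: str
    last_accessed: date


def classify(text: str) -> set[str]:
    """Return the names of all sensitive patterns found in the text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


def sensitivity(tags: set[str]) -> str:
    """Map detected tags to an illustrative three-level sensitivity scale."""
    if "us_ssn" in tags:
        return "restricted"
    if tags:
        return "confidential"
    return "internal"


def review(records: list[Record], today: date, retention_days: int = 7 * 365) -> list[dict]:
    """Tag every record and flag those untouched for longer than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    report = []
    for record in records:
        tags = classify(record.content)
        report.append({
            "path": record.path,
            "tags": sorted(tags),
            "level": sensitivity(tags),
            # Flag for human review rather than deleting automatically.
            "purge_candidate": record.last_accessed < cutoff,
        })
    return report


if __name__ == "__main__":
    inventory = [
        Record("hr/payroll_2012.csv", "Jane Doe, 123-45-6789", date(2012, 3, 1)),
        Record("sales/q1_leads.txt", "contact: bob@example.com", date(2025, 1, 15)),
        Record("wiki/lunch.txt", "Team lunch on Friday", date(2024, 6, 2)),
    ]
    for row in review(inventory, today=date(2025, 6, 1)):
        print(row)
```

Flagging old records for review instead of deleting them outright mirrors the "regularly review and purge" practice in the data minimization step; in a real deployment, the retention period would come from the organization's own retention schedule rather than a hard-coded default.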
With AI accelerating both data creation and consumption, organizations must implement comprehensive data governance or face increasingly severe consequences. The companies that will thrive are those that treat data as both a strategic asset and a potential liability requiring careful management.

This article was produced as part of TechRadarPro's Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing, find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro