2 June 2025
The Greeks were mighty. They were smart, too. They had vast numbers of troops, a fleet to envy, a king who spoke directly to Zeus (albeit in his dreams!), and a hero who appeared to be unbeatable. The Trojans were tenacious. They knew they couldn’t match the strength of the Greeks in battle; they simply didn’t have the capability. What they did have, however, was solid defence. The siege of Troy lasted for 10 years, and in all that time, the Trojans’ walls held firm.
But then one day…
It’s funny, we don’t like to admit that we would be the ones to fall for that level of deception. It’s not a human failing, but rather a natural product of our evolved psychology: we want to trust, because trusting when it’s warranted brings benefits. In every mandatory cybersecurity training you’ve done, you will have come across the concept of ‘social engineering’. In short, it means the manipulation of trust - the exploitation of human psychology through impersonation and misdirection.
So the story goes, the Trojans succumbed to the Greeks’ trick, allowing the giant horse inside the city walls without inspecting it thoroughly or raising suspicion. At night, while the Trojans celebrated their victory and then fell asleep, the Greeks hidden inside the giant horse slipped out and opened the gates…
What happened in Troy?
The ruse had all the hallmarks of deceit: the appealing appearance of retreat, the Trojans’ desire for victory, the acceptance of what they thought was a legitimate gift. The Greeks pulled off the social engineering feat to end all social engineering feats - they got the Trojans to voluntarily breach their own security measures. The Trojans were human. They wanted to be trusting. Instead, they became their own worst enemies - the weakest link in their own security system. By underestimating the hidden threat, they did the Greeks’ job for them.
So what does that mean for AI?
The walls of Troy found their weakest point not in brute force, but in trust and manipulation. Today, we are rapidly developing and deploying AI systems that learn from and act upon huge amounts of data, and we face similar vulnerabilities. Many of our AI models are exceptionally powerful (just like the walls of Troy), but are susceptible to manipulation and to threats hidden within the data they rely upon. Just as the Trojans unknowingly welcomed their own downfall, so too can our AI systems be compromised if we fail to recognise that malicious or flawed data (especially if subtly introduced) can corrupt outcomes, create bias, and ultimately destroy trust. We are well aware of how to guard against attacks coming from the outside - pen testing, cyber training, system and infrastructure design best practices. But we must also recognise that there are other, insidious threats on the inside, within the data itself: flawed or poisoned inputs, supply chain issues, shifts in data patterns. If we are too trusting, we will fail to address these data issues appropriately, and they may well become our modern Achilles’ heel.
Global Resolve: Forging a United Front on AI Data Security
Recently, in a significant international collaboration on AI regulation and governance, cybersecurity agencies from the US, UK, Australia, New Zealand and South Korea jointly published a Cybersecurity Information Sheet on AI data security. This is an important step towards demonstrating alignment on best practices for data security, in order to protect (arguably) the most important piece of the whole AI puzzle: the data.
What is even more interesting is that each of the countries involved in creating the Information Sheet takes a different approach to AI regulation and governance domestically.
The sceptic in me says that actions speak much louder than words, and that looking good and doing good are not the same thing. But the idealist in me says that the information is good, that data security in AI is critical, and that if the core principles in the publication are actually observed, it can only be a good thing.
Taking it for exactly what it is, let’s explore how this publication can help us collectively resist the AI Trojan Horse.
What does the publication talk about?
The document addresses AI data security across the entire AI lifecycle, and closely aligns with NIST’s AI Risk Management Framework in relation to ‘secure’ AI. Broadly, this means a strong cybersecurity framework to ensure systems are safe, data is stored securely, access is limited, and measures are in place to prevent unauthorised use. At the very start, the publication provides three main goals for readers:

1. Raise awareness of the risks to data security that arise throughout the development, testing and deployment of AI systems.
2. Provide guidance for securing AI data across the stages of the AI lifecycle.
3. Set out best practices that organisations can adopt to protect the data their AI systems rely upon.
What are NIST’s 6 lifecycle stages of AI?
NIST’s AI Risk Management Framework sets out the following 6 lifecycle stages:

1. Plan and Design
2. Collect and Process Data
3. Build and Use Model
4. Verify and Validate
5. Deploy and Use
6. Operate and Monitor
What are the 3 major categories of risk described in the Information Sheet?
There are numerous (technical) threats identified in the paper. Broadly speaking, the AI data security Trojan Horse can sneak in through our gates in the following ways:

1. Data supply chain risks - compromised, unreliable or poorly documented third-party data finding its way into training pipelines.
2. Maliciously modified (‘poisoned’) data - adversaries deliberately tampering with data to corrupt outcomes or introduce bias.
3. Data drift - the gradual shift in real-world data patterns over time, quietly degrading model behaviour if left unmonitored.
What best practices does the publication describe?
The collaborative paper sets out 10 best practices for protecting the data used to build and operate AI systems. This is how we stop the Trojan Horse:
1️⃣ Source reliable data and track provenance
This is about knowing where your data comes from, its history, and how it was obtained. Provenance is how we trace data lineage. If reliable data is the foundation of trustworthy AI, then it stands to reason that stopping the introduction of bad elements from the start is critical.
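As a minimal sketch of what provenance tracking can look like in practice (the file name, source and licence values below are hypothetical), each ingested dataset gets an audit record capturing who supplied it, under what terms, and a fingerprint of the exact bytes received:

```python
import hashlib
import json
from datetime import datetime, timezone

def provenance_record(path: str, source: str, licence: str) -> dict:
    """Capture where a dataset came from and what it looked like on arrival."""
    with open(path, "rb") as f:
        content_hash = hashlib.sha256(f.read()).hexdigest()
    return {
        "file": path,
        "source": source,        # who supplied the data
        "licence": licence,      # terms under which it was obtained
        "sha256": content_hash,  # fingerprint of the exact bytes received
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

# Append one record per dataset to an append-only audit log.
record = provenance_record("reviews.csv", "vendor-api", "CC-BY-4.0")
print(json.dumps(record, indent=2))
```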
2️⃣ Verify and maintain data integrity during storage and transport
Data can be at rest in a database or in transit across networks. Either way, data integrity must be protected. Techniques like checksums and hashing help ensure that data hasn't been corrupted or deliberately altered en route.
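To illustrate, here is a minimal integrity check using Python’s standard hashlib (the file name and expected value are placeholders): compute a checksum when data leaves the source, then recompute and compare it after transport.

```python
import hashlib

def sha256_of_file(path: str) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Recorded at the source and published alongside the dataset.
expected = "9f2c..."  # placeholder for the real checksum

# Recomputed by the recipient after transport.
actual = sha256_of_file("training_data.csv")
if actual != expected:
    raise ValueError("Data was corrupted or altered in transit")
```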
3️⃣ Employ digital signatures to authenticate trusted data revisions
Digital signatures (like ink signatures on paper) confirm the origin of data. You can be confident that updates or changes to data come from a verified, trusted source. Your digital "seal of authenticity" should be on every piece of data to prevent unauthorised additions or alterations.
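As a sketch of the idea using the widely used Python cryptography library (the dataset contents here are made up), the publisher signs each data revision with a private key, and consumers verify it against the matching public key before trusting it:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The publisher generates a key pair and distributes the public key.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# The publisher signs a new data revision.
revision = b"dataset v2: 10,000 newly labelled records"
signature = private_key.sign(revision)

# A consumer verifies the revision before trusting it.
try:
    public_key.verify(signature, revision)
    print("Revision authenticated: it came from the trusted source")
except InvalidSignature:
    print("Reject: signature mismatch, the data may have been tampered with")
```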
4️⃣ Leverage trusted infrastructure
The infrastructure used to store, process, and transmit data must be secure. There are many ways to achieve this, including vetted cloud providers, robust hardware, and well-maintained network components.
5️⃣ Classify data and use access controls
Not all data is equally sensitive. Classifying data (e.g., public, internal, confidential, restricted) allows you to implement appropriate access controls. Only authorised personnel or systems should be able to view, modify, or use specific datasets, limiting the exposure of sensitive information and ensuring only trusted "guards" have access to critical data.
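A toy sketch of what classification-aware access control can look like (the principals and clearance mapping are illustrative, not taken from the publication):

```python
from enum import IntEnum

class Classification(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Each user or service holds a maximum clearance level.
CLEARANCES = {
    "analytics-job": Classification.INTERNAL,
    "training-pipeline": Classification.CONFIDENTIAL,
}

def can_read(principal: str, data_level: Classification) -> bool:
    """Allow access only when clearance meets or exceeds the data's level."""
    return CLEARANCES.get(principal, Classification.PUBLIC) >= data_level

assert can_read("training-pipeline", Classification.CONFIDENTIAL)
assert not can_read("analytics-job", Classification.RESTRICTED)
```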
6️⃣ Encrypt data
Encryption scrambles data into an unreadable format, making it unintelligible to unauthorised parties. Whether data is stored or being transmitted, encryption provides a vital layer of protection. Even if a "Trojan Horse" manages to breach your defences, encrypted data makes its payload useless without the decryption key.
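For illustration, here is symmetric encryption with Fernet from the Python cryptography library; in a real system, generating, storing and rotating the key securely is the hard part, which this sketch glosses over:

```python
from cryptography.fernet import Fernet, InvalidToken

# Generate a symmetric key (store it in a secrets manager, never in code).
key = Fernet.generate_key()
fernet = Fernet(key)

# Encrypt before storage or transmission.
token = fernet.encrypt(b"sensitive training record")

# Without the key the ciphertext is useless, and tampering is detected.
try:
    plaintext = fernet.decrypt(token)
except InvalidToken:
    raise RuntimeError("Wrong key or tampered ciphertext")
```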
7️⃣ Store data securely
Beyond encryption, secure storage involves physical security, robust backup strategies, and redundant systems. This helps ensure that data remains available and protected from loss, corruption, or theft, even in the face of disasters or direct attacks on storage infrastructure.
8️⃣ Leverage privacy-preserving techniques
As AI often deals with vast amounts of personal or sensitive information, techniques like differential privacy or federated learning allow models to be trained without directly exposing individual data points. This is crucial for building ethical AI and preventing the "Trojan Horse" from being a privacy breach waiting to happen.
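As a minimal illustration of one such technique, the Laplace mechanism releases an aggregate statistic under differential privacy; the dataset and the epsilon value below are made up, and real deployments need careful sensitivity analysis and privacy budgeting:

```python
import numpy as np

def private_mean(values: np.ndarray, lower: float, upper: float,
                 epsilon: float) -> float:
    """Release a mean under epsilon-differential privacy (Laplace mechanism)."""
    clipped = np.clip(values, lower, upper)
    # Sensitivity of the mean of n values bounded in [lower, upper].
    sensitivity = (upper - lower) / len(values)
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(clipped.mean() + noise)

ages = np.array([23, 35, 41, 29, 52, 38])
print(private_mean(ages, lower=18, upper=90, epsilon=1.0))
```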
9️⃣ Delete data securely
When data is no longer needed, it must be completely and irreversibly removed. Simply hitting 'delete' often isn't enough. Secure deletion methods ensure that sensitive information cannot be recovered later, preventing old data from becoming a future "Trojan Horse" payload if systems are decommissioned or repurposed.
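A sketch of overwrite-before-delete in Python, with an important caveat: on SSDs and copy-on-write or journaling filesystems, overwriting does not guarantee the old blocks are gone, and encrypting data at rest then destroying the key ("crypto-shredding") is often the more dependable route:

```python
import os

def overwrite_and_delete(path: str, passes: int = 3) -> None:
    """Overwrite a file's contents with random bytes before unlinking it.

    Caveat: this is only best effort on modern storage; SSD wear
    levelling and copy-on-write filesystems may keep old copies around.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))
            f.flush()
            os.fsync(f.fileno())
    os.remove(path)
```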
🔟 Conduct ongoing data security risk assessments
The threat landscape is constantly evolving. Regular risk assessments are essential to identify new vulnerabilities, evaluate the effectiveness of existing controls, and adapt to emerging threats. This ensures that your defences are continuously updated, preventing complacency and preparing for the next generation of "Trojan Horses."
Now what? How to protect your Troy!
This international collaborative effort provides clear insight into protecting AI systems, in part, via data security. It’s also clear that data security is not just a technical task - it’s a strategic necessity. The Cybersecurity Information Sheet uncovers some of the complexity of data security and robust protection measures. This is particularly true as we try to balance innovation and speed of deployment with responsible and trustworthy development. Let’s not be like the Trojans, who learned too late that hidden threats can lead to devastation.
There is some good news - you can begin to understand where you stand today. A great first step is to review your current AI governance structure to ensure you can define the scope of your AI initiatives, gain data-specific risk awareness, and create a framework to mitigate those risks in line with your risk tolerance.
Building resilient and trustworthy AI systems, like securing an ancient city, is about consistent, informed action, not a single magic bullet. Download our FREE AI Data Security Governance Maturity Assessment to see where you stand today and what improvements you can make.
This content is informational only and not legal advice. GLF is not a law firm regulated by the SRA.
Get in touch to talk about AI governance, compliance and risk management solutions!