The threat landscape for AI systems has shifted dramatically over the past twelve months. What was once a niche concern for machine learning researchers is now a board-level risk for any organization deploying models in production. Our analysis of incident data, vulnerability disclosures, and adversarial research paints a picture of an attack surface that is expanding faster than most security teams can adapt.

Prompt Injection: The Vulnerability That Won't Go Away

Prompt injection attacks against large language models have increased by 340% year-over-year, according to data aggregated from our platform's detection engine and corroborated by MITRE ATLAS incident reports. This growth is not simply a function of more LLMs being deployed — though that is a factor. The attack techniques themselves have become significantly more sophisticated.

In early 2025, most prompt injection attacks relied on simple override instructions embedded in user inputs: "Ignore previous instructions and..." variants that could be caught with basic input filtering. By late 2025, we began observing multi-stage injection chains where the payload is fragmented across multiple interactions, encoded in non-obvious formats (Base64, Unicode homoglyphs, steganographic embedding in code blocks), or delivered through indirect channels such as retrieved documents and tool outputs.

The most concerning development is what researchers have termed "cross-context injection" — attacks that exploit the boundary between a model's system prompt, retrieved context, and user input. When an LLM retrieves external documents via RAG (Retrieval-Augmented Generation), those documents become a vector for injected instructions that the model may follow with the same authority as its system prompt. We have documented cases where adversaries planted poisoned documents in public data sources specifically to be retrieved by target organizations' RAG pipelines.

Model Extraction Is Industrializing

Model extraction — the practice of reconstructing a proprietary model's behavior through systematic querying — has moved from academic proof-of-concept to practical attack toolkit. The economic incentive is clear: a model that cost millions to train can be functionally replicated for a fraction of the cost if the attacker can issue enough queries to map the decision boundary.

We have observed three distinct extraction methodologies gaining traction. The first is classical query-based extraction, where adversaries use active learning strategies to select the most informative queries, reducing the number of API calls needed by up to 80% compared to naive approaches. The second is side-channel extraction, which exploits metadata in API responses — confidence scores, token probabilities, latency patterns — to infer model architecture and parameters. The third, and most novel, is distillation-via-proxy, where the attacker trains a smaller model to mimic the target's behavior on domain-specific tasks, then uses that proxy model to generate synthetic training data for a more complete clone.

Organizations serving models via API endpoints without rate limiting, output truncation, or response perturbation are particularly exposed. Our recommendation is to implement differential privacy-inspired noise injection in model outputs and to monitor for the systematic query patterns characteristic of extraction campaigns.

Data Poisoning in Training Pipelines

Data poisoning has evolved from a theoretical concern to a documented attack vector with real-world incidents. The fundamental challenge is that modern ML training pipelines ingest data from dozens of sources — web scrapes, public datasets, user-generated content, third-party data vendors — and validating the integrity of every training sample is computationally prohibitive at scale.

Backdoor attacks remain the most studied category: an adversary introduces a small number of carefully crafted samples into the training data that cause the model to exhibit attacker-controlled behavior when a specific trigger pattern is present in the input. Recent research has demonstrated that backdoors can survive fine-tuning, quantization, and even partial retraining, making them exceptionally persistent once introduced.

What has changed in 2026 is the attack surface for poisoning. The proliferation of community-maintained datasets on platforms like Hugging Face, combined with the common practice of fine-tuning foundation models on task-specific data, means that a single poisoned dataset can cascade downstream into hundreds of deployed models. We tracked one incident where a poisoned sentiment analysis dataset was downloaded over 12,000 times before the manipulation was detected.

Supply Chain Risks in ML Dependencies

The ML ecosystem's dependency tree has become a significant attack surface. A typical production ML system depends on dozens of packages — PyTorch, TensorFlow, Hugging Face Transformers, scikit-learn, ONNX Runtime, various CUDA libraries — each of which represents a potential supply chain compromise vector.

We have catalogued three categories of supply chain risk specific to ML workflows. First, compromised model weights: pre-trained models downloaded from public repositories can contain embedded backdoors or trojan behavior that activates only under specific conditions. Second, malicious dependencies in model serving frameworks: several incidents in late 2025 involved typosquatted packages that mimicked legitimate ML libraries but included data exfiltration code. Third, compromised training infrastructure: container images for GPU-accelerated training environments have been found with modified CUDA libraries that subtly alter gradient computations during training.

The standard software supply chain protections — dependency pinning, hash verification, SBOM generation — are necessary but insufficient for ML-specific risks. Model weights and training data require their own integrity verification mechanisms, including cryptographic provenance tracking and behavioral fingerprinting to detect post-deployment modifications.

Recommendations for Security Teams

The convergence of these threats demands a layered defense strategy. First, treat AI systems as critical infrastructure: inventory every model, pipeline, and data source in your organization. Second, implement runtime monitoring that can detect adversarial inputs, anomalous model behavior, and unauthorized access patterns in real time. Third, extend your supply chain security program to cover model weights, training data, and ML-specific dependencies. Fourth, invest in red teaming specifically focused on AI systems — traditional penetration testing methodologies do not cover the attack surface unique to machine learning.

The organizations that will be best positioned in this evolving landscape are those that build AI security into their development lifecycle from the beginning, rather than bolting it on after deployment. The cost of remediation after a model compromise is orders of magnitude higher than the cost of prevention.

Protect Your AI Infrastructure

Our platform provides continuous monitoring, threat detection, and compliance automation for production AI systems.

Request a Demo