Research Findings

Ongoing Study: Autonomous AI Agent Behavior in the Wild

We are currently running a fleet of Sundew honeypot instances in the cloud, each configured with a unique deployment persona to maximize coverage and prevent cross-instance fingerprinting. Each instance mimics a different type of production service, and the persona engine ensures that no two deployments share response structures, endpoint naming, timing characteristics, or error formats. We’re intentionally keeping deployment details vague so that the instances cannot be identified and avoided while the study is running.

What We’re Measuring

Discovery patterns - How do AI agents find and identify target services?
Reconnaissance behavior - What do agents do in the first 30 seconds after connecting?
Exploitation sequences - What attack chains do autonomous agents attempt?
Tool use patterns - Which MCP tools do agents invoke, and in what order?
Evasion techniques - Do agents attempt to detect or avoid honeypots?
Cross-instance correlation - Can agents recognize two Sundew instances as the same software?
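
To make the "first 30 seconds" reconnaissance question concrete, here is a minimal sketch of how logged events could be sliced into a per-session reconnaissance window. The tuple layout `(session_id, timestamp, action)`, the function name `recon_window`, and the sample actions are assumptions made for illustration; they do not describe Sundew's actual data model or analysis code.

```python
from collections import defaultdict

# Hypothetical window length, taken from the "first 30 seconds" question above.
RECON_WINDOW_SECONDS = 30.0


def recon_window(events, window=RECON_WINDOW_SECONDS):
    """Return, per session, the actions seen within `window` seconds of its first event.

    `events` is an iterable of (session_id, timestamp_seconds, action) tuples.
    This layout is an assumption for the sketch, not Sundew's real schema.
    """
    by_session = defaultdict(list)
    for session_id, timestamp, action in events:
        by_session[session_id].append((timestamp, action))

    result = {}
    for session_id, items in by_session.items():
        # Order each session chronologically so the first event defines t=0.
        items.sort(key=lambda item: item[0])
        start = items[0][0]
        result[session_id] = [
            action for timestamp, action in items if timestamp - start <= window
        ]
    return result


# Illustrative sample data only; not real study results.
sample = [
    ("s1", 100.0, "GET /"),
    ("s1", 104.2, "GET /openapi.json"),
    ("s1", 131.0, "POST /login"),  # 31 s after first contact, outside the window
    ("s2", 200.0, "tools/list"),
]
windows = recon_window(sample)
```

In this sketch the third `s1` event falls outside the window, so only the first two actions count as reconnaissance for that session.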

Data Collection Pipeline

All instances stream structured telemetry to a centralized analysis cluster. Every HTTP request, MCP tool invocation, and credential access attempt is logged with full request/response bodies, timing data, and behavioral metadata. The raw dataset will be published alongside the findings for reproducibility.
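
As a rough illustration of what one structured telemetry record of this kind might look like, the sketch below models a single event and serializes it as one JSON line. Every field name, the `TelemetryEvent` class, and the example values are hypothetical; the actual Sundew log schema is not described here and may differ.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class TelemetryEvent:
    # All field names are assumptions for illustration, not Sundew's real schema.
    instance_id: str
    kind: str  # e.g. "http_request", "mcp_tool_call", "credential_access"
    request_body: str
    response_body: str
    duration_ms: float
    metadata: dict = field(default_factory=dict)
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)

    def to_json_line(self) -> str:
        """Serialize the event as a single JSON line suitable for streaming."""
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only; not captured data.
event = TelemetryEvent(
    instance_id="instance-a",
    kind="mcp_tool_call",
    request_body='{"tool": "read_file"}',
    response_body='{"ok": true}',
    duration_ms=12.5,
    metadata={"user_agent": "example-agent/1.0"},
)
line = event.to_json_line()
```

One JSON object per line keeps each event self-describing, which is a common choice when many instances stream to a single collector.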

Status

Data collection is underway. We are allowing enough time to gather a statistically meaningful sample across all persona types and trap configurations before publishing results.

We plan to share the full findings here once the study concludes, including anonymized datasets, behavioral taxonomies, and detection heuristics. Preliminary results will be presented at DEF CON. To get notified when findings drop, star the sundew-sh/sundew repo or join our Discord.
