21 October 2025

Comparison of 4 LLMs and agent-zero in an elementary pentest competition

My goals

  1. Gain hands-on experience with autonomous AI agents → agent-zero
  2. See how a general-purpose (not specialized) AI agent performs in penetration tests
  3. Try several current LLMs on pentest tasks

Attention

  1. This is neither real research nor a guide
  2. agent-zero and the LLMs used are not intended for pentesting
  3. The results below do not indicate that the models are good or bad
  4. The penetration test target is a local copy of OWASP Juice Shop (probably the most modern and sophisticated insecure web application)
How AI sees an AI agent